11-711 Algorithms for NLP Chart Parsing Reading: James Allen, Natural Language Understanding Section 3.4, pp. 53-61
Chart Parsing General Principles: A Bottom-Up parsing method Construct a parse starting from the input symbols Build constituents from sub-constituents When all constituents on the RHS of a rule are matched, create a constituent for the LHS of the rule The Chart allows storing partial analyses, so that they can be shared. Data structures used by the algorithm: The Key: the current constituent we are attempting to match An Active Arc: a grammar rule that has a partially matched RHS The Agenda: Keeps track of newly found unprocessed constituents The Chart: Records processed constituents (non-terminals) that span substrings of the input 1 11-711 Algorithms for NLP
Chart Parsing Steps in the Process: Input is processed left-to-right, one word at a time 1. Find all POS of word (terminal-level) 2. Initialize Agenda with all POS of the word 3. Pick a Key from the Agenda 4. Add all grammar rules that start with the Key as active arcs 5. Extend any existing active arcs with the Key 6. Add LHS constituents of newly completed rules to the Agenda 7. Add the Key to the Chart 8. If Agenda not empty - goto (3), else goto (1) 2 11-711 Algorithms for NLP
The Chart Parsing Algorithm Extending Active Arcs with a Key: Each Active Arc has the form: 1 A Key constituent has the form: When processing the Key 1 2, we search the active arc list for an arc 1 0 1, and then create a new active arc 1 0 2 If the new active arc is a completed rule: 1 0 2, then we add 0 2 to the Agenda After using the key to extend all relevant arcs, it is entered into the Chart 3 11-711 Algorithms for NLP
The Chart Parsing Algorithm The Main Algorithm: parsing input 1 1. 0 2. If Agenda is empty and then set 1, find all POS of and add them as constituents 1 to the Agenda 3. Pick a Key constituent 1 from the Agenda 4. For each grammar rule of form 1, add 1 to the list of active arcs 5. Use Key to extend all relevant active arcs 6. Add LHS of any completed active arcs into the Agenda 7. Insert the Key into the Chart 8. If Key is 1 then Accept the input Else goto (2) 4 11-711 Algorithms for NLP
Chart Parsing - Example The Grammar: 1 2 3 4 5 6 5 11-711 Algorithms for NLP
Chart Parsing - Example The input: The large can can hold the water POS of Input Words: the: ART large: ADJ can: N, AUX, V hold: N, V water: N, V 6 11-711 Algorithms for NLP
Chart Parsing = Example The input: "X = large can can hold the 8 f 1 1-71 1 Algorithms for NLP
Chart Parsing - Example The input: "X = The large can can hold the water" ART 1 ADJ 1 I 2 3 NP + ART 0 ADJ N NP + ART0 N NP + ADJ o N 0 - Figure 3.9 The chart after seeing an ADJ in position 2 11-71 1 Algorithms for NLP
P io -1-5 - P C Z C - - X C > S Z C 8 Z C W W Z -a M Z?I 2 - CD h) w Z 2 2 - CD P b ~
Chart Parsing - Example The input: "X = The large can can hold the water" S I (rule 1 with NPI and VP2) S2 (rule I with NP2 and VP2) VP3 (rule 5 with AUXI and VP2) NP2 (rule 4) VP2 (rule 5) NPI (rule 2) VPI (rule 6) N1 N2 NP3 (rule 3) VI v2 v3 v4 ART I ADJ 1 AUX 1 AUX2 N3 ART2 N4 1 the 2 large 3 can 4 can 5 hold 6 the 7 water Figure 3.15 The final chart 11-71 1 Algorithms for NLP
Time Complexity of Chart Parsing Algorithm iterates for each word of input (i.e. iterations) How many keys can be created in a single iteration? Each key in Agenda has the form 1, 1 Thus keys Each key 1 can extend active arcs of the form 1, for 1 Each key can thus extend at most active arcs Time required for each iteration is thus 2 Time bound on entire algorithm is therefore 3 12 11-711 Algorithms for NLP
Parsing with a Chart Parser We need to keep back-pointers to the constituents that we combine Each constituent added to the chart must have the form 1 1, where the are pointers to the RHS sub-constituents Pointers must thus also be recorded within the active arcs At the end - reconstruct a parse from the back-pointers 13 11-711 Algorithms for NLP
Parsing with a Chart Parser Efficient Representation of Ambiguities a Local Ambiguity - multiple ways to derive the same substring from a non-terminal What do local ambiguities look like with Chart Parsing? Multiple items in the chart of the form 1 1, with the same, and. Local Ambiguity Packing: create a single item in the Chart for, with pointers to the various possible derivations. can then be a sufficient back-pointer in the chart Allows to efficiently represent a very large number of ambiguities (even exponentially many) Unpacking - producing one or more of the packed parse trees by following the back-pointers. 15 11-711 Algorithms for NLP
Improved Chart Parser Increasing the Efficiency of the Algorithm Observation: The algorithm creates constituents that cannot fit into a global parse structure. Example: 2 3 These can be eliminated using predictions Using Top-down predictions: For each non-terminal, define the set of all leftmost non-terminals derivable from Compute in advance for all non-terminals When processing the key 1 we look for grammar rules that form an active arc 2 Add the active arc only if is predicted: there already exists an active arc 1 such that For 1 (arcs of the form 2 1 1 ), add the active arc only if 16 11-711 Algorithms for NLP
Improved Chart Parser Simple Procedure for Computing : (1) For every A, Cur-First[A] = { A } (2) For every grammar rule A --> B..., Cur-First[A] = Union( Cur-First[A], { B }) (3) Updated = T While Updated do Updated = F For every A do Old-First[A] = Cur-First[A] For each B in Old-First[A] do Cur-First[A] = Union( Cur-First[A], Cur-First[B]) If Cur-First[A]!= Old-First[A] then Updated = T (4) For every A do First[A] = Cur-First[A] 17 11-711 Algorithms for NLP
Chart Parsing - Example The input: ".I. = The large can can hold the water"