Deductive Parsing with Sequentially Indexed Grammars


Jan van Eijck

May 25, 2005

Abstract

This paper extends the Earley parsing algorithm for context free languages [3] to the case of sequentially indexed languages. Sequentially indexed languages are related to indexed languages [1, 2]. The difference is that parallel processing of index stacks is replaced by sequential processing [4]. This paper contains the full code of an implementation in Haskell [6], in literate programming style [7], of an algorithm for deductive parsing based on [8], focussing on the case of an Earley style parsing algorithm for sequentially indexed languages.

Keywords: Deductive parsing, context free grammars, indexed languages, nested stack automata, Earley parsing algorithm, Haskell, literate programming.

1 Introduction

Indexed grammars [1] are quadruples $G = (N, T, P, S)$, where $N$ is a finite nonterminal alphabet, $T$ is a finite terminal alphabet with $N \cap T = \emptyset$, $P$ is a finite set of productions of the form $(X, \alpha)$ with $X \in N \cup \hat{N}$ and $\alpha \in (N \cup \hat{N} \cup T)^*$, where $\hat{N}$ is the set of all $X_Y$ with $X, Y \in N$, and $S \in N$ is the start symbol. A production $(X, \alpha)$ is written as $X \to \alpha$.

Let $G = (N, T, P, S)$ be an indexed grammar. A pair $(X, [X_1, \ldots, X_n])$, with $X, X_1, \ldots, X_n \in N$, is called an indexed nonterminal. Indexed nonterminals are written as $X^{X_1 \cdots X_n}$. Let $N^\sigma$ be the set of all $X^{X_1 \cdots X_n}$ with $X, X_1, \ldots, X_n \in N$. Then a sentential form for $G$ is a string $\alpha$ in $(N^\sigma \cup T)^*$.

To define the one step derivation relation, we need a preliminary definition:

Definition 1 If $\delta \in (N \cup \hat{N} \cup T)^*$ and $\zeta \in N^*$, then $\delta^\zeta$ is given by the following recursion:

$\epsilon^\zeta = \epsilon$
$(w : \delta)^\zeta = w : \delta^\zeta$ if $w \in T$,
$(Y : \delta)^\zeta = Y^\zeta : \delta^\zeta$ if $Y \in N$,
$(Y_Z : \delta)^\zeta = Y^{(Z:\zeta)} : \delta^\zeta$ if $Y_Z \in \hat{N}$.
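To make Definition 1 concrete, here is a minimal standalone Haskell sketch. It is not part of the implementation developed below; the types Sym and ISym and the function super are ad hoc names for this illustration, with indices fixed to Char.

-- Ad-hoc symbol types: terminals, plain nonterminals Y, singly indexed
-- nonterminals Y_Z, and stack-indexed nonterminals Y^zeta.
data Sym  = Tm Char | Nt Char | NtI Char Char deriving Show
data ISym = ITm Char | INt Char [Char]        deriving Show

-- super delta zeta computes delta^zeta of Definition 1:
-- the stack zeta is copied to every nonterminal in delta.
super :: [Sym] -> [Char] -> [ISym]
super []              _ = []
super (Tm w    : ds)  z = ITm w       : super ds z
super (Nt y    : ds)  z = INt y z     : super ds z
super (NtI y x : ds)  z = INt y (x:z) : super ds z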

Note that as a special case of Definition 1, we have that $(Y_Z)^\theta = Y^{Z:\theta}$.

Using the definition of $\delta^\zeta$, we can define one step derivations:

Definition 2 Let $\alpha, \beta$ be sentential forms for indexed grammar $G$. Then $\alpha \Rightarrow_G \beta$ iff

1. $\alpha = \gamma_1 X^\zeta \gamma_2$, $X \to \delta$ is a production of the grammar, and $\beta = \gamma_1 \delta^\zeta \gamma_2$, or
2. $\alpha = \gamma_1 X^{(Y:\zeta)} \gamma_2$, $X_Y \to \delta$ is a production of the grammar, and $\beta = \gamma_1 \delta^\zeta \gamma_2$.

In terms of this, $\alpha \Rightarrow^*_G \beta$ is defined in the usual way. This definition is equivalent to the definition in [1].

Sequentially indexed grammars use indices that get pushed to an arbitrary nonterminal in the righthand side of a production. Sequentially indexed grammars look just like indexed grammars, but the definition of derivation is different. The following definition uses list concatenation: if $\zeta$ is the result of concatenating $\zeta_1$ and $\zeta_2$, we denote this as $\zeta = \zeta_1 \mathbin{+\!\!+} \zeta_2$.

Definition 3 If $\delta \in (N \cup \hat{N} \cup T)^*$ and $\zeta \in N^*$, then $(\delta)^\zeta$ is the subset of $(N^\sigma \cup T)^*$ defined recursively as:

$(\epsilon)^{[]} = \{\epsilon\}$
$(\epsilon)^\zeta = \emptyset$ if $\zeta \neq []$,
$(w : \delta)^\zeta = \{w : \delta' \mid \delta' \in (\delta)^\zeta\}$ if $w \in T$,
$(C : \delta)^\zeta = \{C^{\zeta_1} : \delta' \mid \zeta = \zeta_1 \mathbin{+\!\!+} \zeta_2,\ \delta' \in (\delta)^{\zeta_2}\}$ if $C \in N$,
$(C_Y : \delta)^\zeta = \{C^{Y:\zeta_1} : \delta' \mid \zeta = \zeta_1 \mathbin{+\!\!+} \zeta_2,\ \delta' \in (\delta)^{\zeta_2}\}$ if $C_Y \in \hat{N}$.

The relation of one-step derivation is defined in terms of $(\delta)^\zeta$, as follows:

Definition 4 Let $\alpha, \beta$ be sentential forms for indexed grammar $G$. Then $\alpha \Rightarrow_G \beta$ iff

1. $\alpha = \gamma_1 B^\zeta \gamma_2$, $B \to \delta$ is a production of the grammar, and $\beta = \gamma_1 \delta' \gamma_2$, where $\delta' \in (\delta)^\zeta$, or
2. $\alpha = \gamma_1 B^{(Y:\zeta)} \gamma_2$, $B_Y \to \delta$ is a production of the grammar, and $\beta = \gamma_1 \delta' \gamma_2$, where $\delta' \in (\delta)^\zeta$.

In derivations with sequentially indexed grammars, stacks are never allowed to disappear, and stacks are never allowed to get duplicated. In particular, a production $B \to \epsilon$ will not allow a one-step derivation like $B^{YYX} \Rightarrow \epsilon$, and a production $B \to CD$ will not allow a one-step derivation like $B^{YYX} \Rightarrow C^{YYX} D^{YYX}$ (but it will allow $B^{YYX} \Rightarrow C^{YYX} D$, $B^{YYX} \Rightarrow C^{YY} D^{X}$, $B^{YYX} \Rightarrow C^{Y} D^{YX}$, and $B^{YYX} \Rightarrow C D^{YYX}$). A production like $B_X \to \epsilon$ can lead to a one-step derivation $B^X \Rightarrow \epsilon$. This effectively treats $X$ as a trace.

Sequentially indexed grammars are different from an earlier proposal for a restricted form of indexed grammars, in [5]. Gazdar proposed to use index lists that get copied to a single nonterminal in the righthand side of a production, but in such a way that this heir-nonterminal has to be indicated in the rule.
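To make Definition 3 concrete as well, here is a minimal sketch of the set $(\delta)^\zeta$, reusing the ad-hoc Sym and ISym types from the sketch above (again not part of the implementation below; the names splits and dist are ad hoc):

-- All ways of splitting a list in two:
-- splits "XY" = [("","XY"),("X","Y"),("XY","")].
splits :: [a] -> [([a],[a])]
splits []     = [([],[])]
splits (x:xs) = ([],x:xs) : [ (x:us,vs) | (us,vs) <- splits xs ]

-- dist delta zeta enumerates the set (delta)^zeta of Definition 3:
-- the stack zeta is distributed over the nonterminals of delta.
dist :: [Sym] -> [Char] -> [[ISym]]
dist []              []  = [[]]
dist []              _   = []                 -- stacks may not disappear
dist (Tm w    : ds)  z   = [ ITm w : d' | d' <- dist ds z ]
dist (Nt c    : ds)  z   = [ INt c z1     : d' | (z1,z2) <- splits z, d' <- dist ds z2 ]
dist (NtI c y : ds)  z   = [ INt c (y:z1) : d' | (z1,z2) <- splits z, d' <- dist ds z2 ]

For instance, dist [Nt 'C', Nt 'D'] "YYX" yields exactly the four distributions of the stack $YYX$ over $C$ and $D$ listed above.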

2 General Data Structures

module DPS where

import List
import Char
import System.IO.Unsafe (unsafePerformIO)

Terminal and nonterminal symbols:

data Symbol a b = T a | N b | D b | I b b deriving (Eq,Ord,Read)

The T constructor is for terminals and the N constructor for plain nonterminals. The D nonterminal is useful for extending a grammar with a new start symbol. The I constructor indicates a nonterminal indexed with another nonterminal.

Given show functions for the types a and b, we define a show function for Symbol a b as follows:

instance (Show a, Show b) => Show (Symbol a b) where
  show (T x)   = show x
  show (N x)   = show x
  show (D x)   = '#' : show x
  show (I x y) = show x ++ "[" ++ show y ++ "]"

The property of being a nonterminal:

nonterm :: Symbol a b -> Bool
nonterm (T _) = False
nonterm _     = True

Category of a nonterminal:

ntcat :: Symbol a b -> [b]
ntcat (N x)   = [x]
ntcat (I x _) = [x]
ntcat _       = []

Index of a nonterminal:

ntidx :: Symbol a b -> [b]
ntidx (N x)   = []
ntidx (I _ y) = [y]
ntidx _       = []

The property of being a dummy symbol:

dummy :: Symbol a b -> Bool
dummy (D _) = True
dummy _     = False

The property of being an indexed symbol:

indexed :: Symbol a b -> Bool
indexed (I _ _) = True
indexed _       = False

Grammar rules:

data Rule a b = Rule (Symbol a b) [Symbol a b] deriving Eq

A show function for grammar rules:

instance (Show a, Show b) => Show (Rule a b) where
  show (Rule y zs) = show y ++ "-->" ++ show zs
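As a quick check of these definitions, an interpreter session might look as follows (a sketch; the output is what we would expect from the Show instances above):

DPS> show (I 'S' 'X')
"'S'['X']"
DPS> ntcat (I 'S' 'X')
"S"
DPS> show (Rule (N 'S') [T 'a', N 'S', T 'a'])
"'S'-->['a','S','a']"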

Reading a grammar rule:

instance (Read a, Read b) => Read (Rule a b) where
  readsPrec p = \ r ->
    [ (Rule symbol rhs, u) | (symbol, s) <- reads r,
                             ("-->", t)  <- lex s,
                             (rhs, u)    <- reads t ]

Example:

DPS> read "N 'S' --> [T 'a', N 'S', T 'a']" :: Rule Char Char
'S'-->['a','S','a']

Functions for accessing the left- and righthand sides of a rule:

lhs :: Rule a b -> Symbol a b
lhs (Rule x ys) = x

rhs :: Rule a b -> [Symbol a b]
rhs (Rule x ys) = ys

A function for counting the number of nonterminals in the righthand side of a rule:

ntc :: [Symbol a b] -> Int
ntc []             = 0
ntc (N _ : rest)   = 1 + ntc rest
ntc (I _ _ : rest) = 1 + ntc rest
ntc (_ : rest)     = ntc rest

A grammar is a list of rules:

type Grammar a b = [Rule a b]

When specifying a grammar we adopt the convention that the lefthand side symbol of the first grammar rule is the start symbol.

start :: Grammar a b -> Symbol a b
start grammar = lhs (head grammar)

Converting a list of strings into a grammar:

readGrammar :: (Read a, Read b) => [String] -> Grammar a b
readGrammar ls = map read ls'
  where ls'      = filter nonempty ls
        nonempty = \ s -> dropWhile isSpace s /= []

A function for reading a grammar from a file:

getGrammar :: (Read a, Read b) => FilePath -> IO (Grammar a b)
getGrammar filename = do
  str <- readFile filename
  return (readGrammar (lines str))

Same, avoiding the IO monad:

getGr :: (Read a, Read b) => FilePath -> Grammar a b
getGr filename = unsafePerformIO (getGrammar filename)

3 Example Grammars for CF Languages

For concreteness' sake, let us assume that terminal and nonterminal symbols are of type Char. Here is an example grammar, read in from the file grammar0 (it is assumed that the file grammar0 is in the current directory):

DPS> getGr "grammar0" :: Grammar String String
["S"-->["a","S","b"],"S"-->["a","b"]]
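The file grammar0 itself is not reproduced here; given the output above and the Read instance for rules, its contents are presumably something like:

N "S" --> [T "a", N "S", T "b"]
N "S" --> [T "a", T "b"]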

Here is another example grammar:

grammar1 :: Grammar Char Char
grammar1 = [Rule (N 'S') [T 'a', N 'S', T 'a'],
            Rule (N 'S') [T 'b', N 'S', T 'b'],
            Rule (N 'S') [T 'a'],
            Rule (N 'S') [T 'b'] ]

An example of a grammar with epsilon rules:

grammar2 :: Grammar Char Char
grammar2 = [Rule (N 'S') [T 'a', N 'S', T 'a'],
            Rule (N 'S') [T 'b', N 'S', T 'b'],
            Rule (N 'S') [T 'a'],
            Rule (N 'S') [T 'b'],
            Rule (N 'S') [] ]

A grammar for balanced parentheses:

grammar3 :: Grammar Char Char
grammar3 = [Rule (N 'S') [T '(', N 'S', T ')', N 'S'],
            Rule (N 'S') [] ]

4 Grammars for Non-CF Languages

A grammar for the language $\{a^n b^n c^n \mid n \geq 0\}$:

grammar4 :: Grammar Char Char
grammar4 = [Rule (N 'S') [T 'a', I 'S' 'X'],
            Rule (N 'S') [N 'A'],
            Rule (I 'A' 'X') [T 'b', N 'A', T 'c'],
            Rule (N 'A') [] ]
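For example, grammar4 derives the string aabbcc as follows (writing the rules in indexed notation: $S \to a\,S_X$, $S \to A$, $A_X \to b\,A\,c$, $A \to \epsilon$):

$$S \Rightarrow a\,S^X \Rightarrow aa\,S^{XX} \Rightarrow aa\,A^{XX} \Rightarrow aab\,A^X c \Rightarrow aabb\,A\,cc \Rightarrow aabbcc.$$

The index stack built up by the $S$ rules is consumed, one index per $b \ldots c$ pair, by the $A_X$ rule, and the $\epsilon$ rule for $A$ applies only once the stack is empty.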

grammar5 :: Grammar Char Char
grammar5 = [Rule (N 'S') [T 'a', I 'S' 'X'],
            Rule (N 'S') [T 'b', I 'S' 'Y'],
            Rule (N 'S') [N 'A'],
            Rule (I 'A' 'X') [N 'A', T 'a'],
            Rule (I 'A' 'Y') [N 'A', T 'b'],
            Rule (N 'A') [] ]

grammar6 :: Grammar Char Char
grammar6 = [Rule (N 'A') [I 'A' 'X'],
            Rule (N 'A') [N 'B'],
            Rule (I 'B' 'X') [T 'a', N 'B'],
            Rule (N 'B') [] ]

Grammar grammar5 generates the copy language $\{ww \mid w \in \{a,b\}^*\}$. Grammar grammar6 generates $a^*$; note that its rule $A \to A_X$ can push indices without consuming any input, a possibility that the parser will have to guard against (see the side condition on the third prediction rule below).

5 Derivation Trees

Here is a data type for derivation trees:

data Tree a b = Leaf a | Node b [b] [Tree a b] deriving (Eq,Ord,Show)

Here is an example:

tree0 = Node 'S' [] [Leaf 'a', Leaf 'b']

Displaying a tree on the screen:

displayTree :: (Show a, Show b) => Tree a b -> IO ()
displayTree tr = mapM_ putStrLn (showTree 0 tr)
  where
    showTree :: (Show a, Show b) => Int -> Tree a b -> [String]
    showTree i (Leaf x) = [map (\ _ -> ' ') [1..i] ++ show x]
    showTree i (Node x [] ts) =
      (map (\ _ -> ' ') [1..i] ++ show x)
      : concat (map (showTree (i+5)) ts)
    showTree i (Node x xs ts) =
      (map (\ _ -> ' ') [1..i] ++ show x ++ show xs)
      : concat (map (showTree (i+5)) ts)

The example tree gets displayed as follows:

DPS> displayTree tree0
'S'
     'a'
     'b'

Displaying a tree list:

displayTrees :: (Show a, Show b) => [Tree a b] -> IO ()
displayTrees trees = sequence_ (map displayTree trees)

6 Earley Items, Axioms, Goals, Consequences

Earley items

Earley items for context free parsing are of the form $[i, A \to \alpha \bullet \beta, j]$. They consist of a rule $A \to \alpha\beta$ with a dot in its righthand side to indicate the part of the righthand side that was recognized so far, a pointer $i$ to the parent node where the rule was invoked, and a pointer $j$ to the position in the input that recognition has reached. For parsing indexed languages, we will use three extra components:

1. a stack of the indices at the point where the rule was invoked,
2. a stack of indices for the first nonterminal to the right of the dot,
3. a stack of indices for the tail of the nonterminal list to the right of the dot.

We will use Greek letters $\eta$, $\zeta$, $\theta$ for index stacks.

The item format now becomes:

$$[i, \theta, A \to \alpha \bullet \beta, \eta, \zeta, j]$$

where $\theta, \eta, \zeta$ are stacks of indices (nonterminals). The item indicates the following:

- grammar rule $A \to \alpha\beta$ was invoked at point $i$,
- at the point of invocation, the top node $A$ has associated stack $\theta$,
- at point $j$, part $\alpha$ of the righthand side of the rule has been successfully recognized,
- $\eta$ is the stack for the first nonterminal among $\beta$ (if $\beta$ has no nonterminals, then $\eta$ is empty),
- $\zeta$ is the stack for the remainder of the nonterminals in $\beta$ (if $\beta$ has less than two nonterminals, then $\zeta$ is empty).

For good measure, we also include a derivation tree component, by putting a list of derivation trees as the last component of an Earley item.

data Item a b = Item Int [b] (Symbol a b) [Symbol a b] [Symbol a b] [b] [b] Int [Tree a b]
  deriving (Eq,Ord)

A show function for items, using * for the dot, and suppressing the derivation tree component:

instance (Show a, Show b) => Show (Item a b) where
  show (Item i theta b symbols symbols' eta zeta j ts) =
    "(" ++ show i ++ "," ++ show theta ++ "," ++ show b ++ "==>"
        ++ show symbols ++ "*" ++ show symbols' ++ ","
        ++ show eta ++ "," ++ show zeta ++ "," ++ show j ++ ")"

A function for extracting the list of derivation trees from an Earley item:

getTrees :: Item a b -> [Tree a b]
getTrees (Item i theta b symbols symbols' eta zeta j ts) = ts

Axiom

In the case of Earley parsing with CF grammars, there is one axiom. It has the form $[0, S' \to \bullet S, 0]$, where $S$ is the start symbol of the grammar and $S'$ is a new start symbol. Adapting this to the case of parsing with sequentially indexed grammars, the axiom takes the shape $[0, [], S' \to \bullet S, [], [], 0]$, indicating that at the beginning of the parse, there is one pending nonterminal, and all stack components are empty.

axioms :: Grammar a b -> [Item a b]
axioms grammar = [Item 0 [] (D x) [] [N x] [] [] 0 []]
  where (N x) = start grammar

Goal

In the case of Earley parsing with CF grammars, there is one goal. It has the form $[0, S' \to S \bullet, n]$, where $S$ is the start symbol of the grammar, $S'$ is the new start symbol used in the axiom, and $n$ is the length of the input. For the case of Earley style parsing with indexed grammars, we also require that the index stack components are empty at the end of the parse, so the goal shape becomes: $[0, [], S' \to S \bullet, [], [], n]$.
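Concretely, for grammar1 (start symbol 'S') the axioms function yields the single item

Item 0 [] (D 'S') [] [N 'S'] [] [] 0 []

and a goal item for an input of length n must have the dummy symbol on the left, recognized part [N 'S'], an empty remainder, empty stacks, and end position n.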

Here is a function for recognizing goals:

goal :: (Eq a, Eq b) => Grammar a b -> [a] -> Item a b -> Bool
goal grammar tokens (Item i theta symbol symbols symbols' eta zeta k trees) =
  i == 0 && theta == [] && dummy symbol
         && symbols == [start grammar] && symbols' == []
         && eta == [] && zeta == []
         && k == length tokens

Consequences

As in the case of Earley parsing with CF grammars, there are three kinds of consequences, for scanning, prediction and completion.

consequences :: (Eq a, Eq b) => Grammar a b -> [a] -> Item a b -> [Item a b] -> [Item a b]
consequences grammar tokens trigger stored =
  scan tokens trigger
  ++ predict tokens grammar trigger
  ++ complete grammar trigger stored

Scanning

The scanning rule for Earley parsing with CF grammars is the rule that shifts the bullet across a terminal. It has the form (derivation tree component omitted):

$$\frac{[i, A \to \alpha \bullet w\beta, j]}{[i, A \to \alpha w \bullet \beta, j+1]}$$

For parsing sequentially indexed languages, three index stack components are added to this. Scanning does not change the index stacks $\theta, \eta, \zeta$:

$$\frac{[i, \theta, A \to \alpha \bullet w\beta, \eta, \zeta, j]}{[i, \theta, A \to \alpha w \bullet \beta, \eta, \zeta, j+1]}$$

scan :: (Eq a, Eq b) => [a] -> Item a b -> [Item a b]
scan tokens (Item i theta a alpha [] eta zeta j ts) = []
scan tokens (Item i theta a alpha (symbol:beta) eta zeta j ts)
  | j >= length tokens = []
  | otherwise          =
      [ Item i theta a (alpha ++ [symbol]) beta eta zeta (j+1)
             (ts ++ [Leaf (tokens !! j)])
      | symbol == T (tokens !! j) ]

Prediction

The prediction rule for Earley parsing is the rule that initializes a new rule $B \to \gamma$ on the basis of a premise indicating that $B$ is expected at the current point in the input. In the CF grammar case it has the following form (derivation tree component omitted):

$$\frac{[i, A \to \alpha \bullet B\beta, j]}{[j, B \to \bullet\gamma, j]} \quad B \to \gamma$$

In the case of Earley-style parsing with sequentially indexed grammars this splits into four rules. The rules split the first index stack. For this we need some terminology. If $\gamma$ is a list of grammar symbols and $\eta, \eta', \eta''$ are index stacks, then $c(\gamma)$ is the number of nonterminals in $\gamma$, and $C(\eta, \eta', \eta'', \gamma)$ is the following constraint:

$$\eta = \eta' \mathbin{+\!\!+} \eta'' \;\wedge\; (c(\gamma) = 0 \to \eta = []) \;\wedge\; (c(\gamma) = 1 \to \eta'' = []).$$

Splitting a list in two sublists:

split :: [a] -> [([a],[a])]
split []     = [([],[])]
split (x:xs) = ([],x:xs) : map (\ (us,vs) -> (x:us,vs)) (split xs)

For example, split "XY" yields [("","XY"),("X","Y"),("XY","")].

Implementation of the constraint:

constraint :: (Eq a, Eq b) => ([b],[b],[Symbol a b]) -> Bool
constraint (stack1,stack2,symbols) =
  (ntc symbols /= 0 || (stack1 == [] && stack2 == [])) &&
  (ntc symbols /= 1 || stack2 == [])

The first prediction rule covers the case of an expected nonterminal $B$ matched against a rule with head $B$. The rule distributes the appropriate stack over the new item, in accordance with the constraint imposed by the number of nonterminals in the righthand side of the grammar rule used in the prediction.

$$\frac{[i, \theta, A \to \alpha \bullet B\beta, \eta, \zeta, j]}{[j, \eta, B \to \bullet\gamma, \eta', \eta'', j]} \quad B \to \gamma,\; C(\eta, \eta', \eta'', \gamma)$$

The second rule covers the case of an expected nonterminal $B$ matched against a rule with head $B_X$. This rule pops the index stack associated with $B$.

$$\frac{[i, \theta, A \to \alpha \bullet B\beta, (X{:}\eta), \zeta, j]}{[j, \eta, B_X \to \bullet\gamma, \eta', \eta'', j]} \quad B_X \to \gamma,\; C(\eta, \eta', \eta'', \gamma)$$

The third rule covers the case of an expected nonterminal $B_Y$ matched against a rule $B \to \gamma$:

$$\frac{[i, \theta, A \to \alpha \bullet B_Y\beta, \eta, \zeta, j]}{[j, (Y{:}\eta), B \to \bullet\gamma, \eta', \eta'', j]} \quad B \to \gamma,\; C(Y{:}\eta, \eta', \eta'', \gamma),\; n - j > |\eta|$$

Note the side condition on the rule ($n$ is the length of the input). The side condition prevents unlimited growth of the stack. This is needed to prevent a rule like $A \to A_Y$ from causing an unbounded number of pushes.

The fourth rule covers the case of an expected nonterminal $B_Y$ matched against a rule $B_Y \to \gamma$:

$$\frac{[i, \theta, A \to \alpha \bullet B_Y\beta, \eta, \zeta, j]}{[j, \eta, B_Y \to \bullet\gamma, \eta', \eta'', j]} \quad B_Y \to \gamma,\; C(\eta, \eta', \eta'', \gamma)$$

If no further symbols are expected, nothing is predicted:

predict :: (Eq a, Eq b) => [a] -> Grammar a b -> Item a b -> [Item a b]
predict tokens grammar (Item i theta a alpha [] eta zeta j ts) = []

If a nonterminal without index is expected, we get:

predict tokens grammar (Item i theta a alpha (N x:beta) eta zeta j ts) =
  [ Item j eta (N x) [] gamma eta' eta'' j []
  | Rule (N z) gamma <- grammar,
    (eta',eta'') <- split eta,
    x == z,
    constraint (eta',eta'',gamma) ]
  ++
  [ Item j (tail eta) (I x y) [] gamma eta' eta'' j []
  | Rule (I x' y) gamma <- grammar,
    x == x',
    eta /= [], head eta == y,
    (eta',eta'') <- split (tail eta),
    constraint (eta',eta'',gamma) ]

If a nonterminal with an index is expected, we get:

predict tokens grammar (Item i theta a alpha (I x y:beta) eta zeta j ts) =
  [ Item j (y:eta) (N x) [] gamma eta' eta'' j []
  | Rule (N x') gamma <- grammar,
    (eta',eta'') <- split (y:eta),
    x == x',
    constraint (eta',eta'',gamma),
    length tokens - j > length eta ]
  ++
  [ Item j eta (I x y) [] gamma eta' eta'' j []
  | Rule (I x' y') gamma <- grammar,
    x == x', y == y',
    (eta',eta'') <- split eta,
    constraint (eta',eta'',gamma) ]

Finally, we need a catch-all clause to indicate that these are all the predict consequences. This covers the case where the next expected symbol is a terminal.

predict tokens grammar (Item i theta a alpha beta eta zeta j ts) = []

Completion

The completion rule for Earley parsing is the rule that shifts the bullet across a nonterminal. It has two premises, and it is of the following form (derivation tree component

omitted):

$$\frac{[i, A \to \alpha \bullet B\beta, k] \qquad [k, B \to \gamma \bullet, j]}{[i, A \to \alpha B \bullet \beta, j]}$$

For the case of Earley-style parsing with sequentially indexed grammars, this splits into four rules, as follows. The first rule checks that the lefthand tail index stack of the first premise matches the head index stack of the second premise, for the case of a match of expected symbol $B$ against completed rule $B \to \gamma$:

$$\frac{[i, \theta, A \to \alpha \bullet B\beta, \eta, \zeta, k] \qquad [k, \eta, B \to \gamma \bullet, [], [], j]}{[i, \theta, A \to \alpha B \bullet \beta, \zeta', \zeta'', j]} \quad C(\zeta, \zeta', \zeta'', \beta)$$

The second rule covers the case of a match of expected symbol $B$ against completed rule $B_Y \to \gamma$:

$$\frac{[i, \theta, A \to \alpha \bullet B\beta, (Y{:}\eta), \zeta, k] \qquad [k, \eta, B_Y \to \gamma \bullet, [], [], j]}{[i, \theta, A \to \alpha B \bullet \beta, \zeta', \zeta'', j]} \quad C(\zeta, \zeta', \zeta'', \beta)$$

The third rule covers the case of a match of expected symbol $B_Y$ against completed rule $B \to \gamma$:

$$\frac{[i, \theta, A \to \alpha \bullet B_Y\beta, \eta, \zeta, k] \qquad [k, (Y{:}\eta), B \to \gamma \bullet, [], [], j]}{[i, \theta, A \to \alpha B_Y \bullet \beta, \zeta', \zeta'', j]} \quad C(\zeta, \zeta', \zeta'', \beta)$$

The fourth rule covers the case of a match of expected symbol $B_Y$ against completed rule $B_Y \to \gamma$:

$$\frac{[i, \theta, A \to \alpha \bullet B_Y\beta, \eta, \zeta, k] \qquad [k, \eta, B_Y \to \gamma \bullet, [], [], j]}{[i, \theta, A \to \alpha B_Y \bullet \beta, \zeta', \zeta'', j]} \quad C(\zeta, \zeta', \zeta'', \beta)$$

In the implementation this is handled by distinguishing four cases:

- Trigger of the form $[i, \theta, A \to \alpha \bullet B\beta, \eta, \zeta, k]$: look for a completed item with head $B$ or $B_Y$ on the chart.
- Trigger of the form $[i, \theta, A \to \alpha \bullet B_Y\beta, \eta, \zeta, k]$: look for a completed item with head $B$ or $B_Y$ on the chart.
- Trigger of the form $[k, \eta, B \to \gamma \bullet, [], [], j]$: look for an item with expected symbol $B$ or $B_Y$ on the chart.
- Trigger of the form $[k, \eta, B_Y \to \gamma \bullet, [], [], j]$: look for an item with expected symbol $B$ or $B_Y$ on the chart.

complete :: (Eq a, Eq b) => Grammar a b -> Item a b -> [Item a b] -> [Item a b]
complete grammar (Item i theta a alpha (N x:beta) eta zeta k ts) stored =
  -- first completion rule: expected B, completed rule with head B
  [ Item i theta a (alpha ++ [N x]) beta zeta' zeta'' j
         (ts ++ [Node x eta ts'])
  | (Item k' eta' symbol gamma [] [] [] j ts') <- stored,
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta),
    k == k', eta == eta', symbol == N x ]
  ++
  -- second completion rule: expected B, completed rule with head B_Y
  [ Item i theta a (alpha ++ [N x]) beta zeta' zeta'' j
         (ts ++ [Node x eta ts'])
  | (Item k' eta' (I x' y) gamma [] [] [] j ts') <- stored,
    k == k', x == x',
    eta /= [], head eta == y, tail eta == eta',
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta) ]

complete grammar (Item i theta a alpha (I x y:beta) eta zeta k ts) stored =
  -- third completion rule: expected B_Y, completed rule with head B
  [ Item i theta a (alpha ++ [I x y]) beta zeta' zeta'' j
         (ts ++ [Node x eta' ts'])
  | (Item k' eta' symbol gamma [] [] [] j ts') <- stored,
    eta' /= [], head eta' == y, tail eta' == eta,
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta),
    k == k', symbol == N x ]
  ++
  -- fourth completion rule: expected B_Y, completed rule with head B_Y
  [ Item i theta a (alpha ++ [I x y]) beta zeta' zeta'' j
         (ts ++ [Node x (y:eta) ts'])
  | (Item k' eta' symbol gamma [] [] [] j ts') <- stored,
    k == k', symbol == I x y, eta == eta',
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta) ]

complete grammar (Item k eta (N x) gamma [] [] [] j ts) stored =
  -- first completion rule, triggered from the completed item
  [ Item i theta a (alpha ++ [N x]) beta zeta' zeta'' j
         (ts' ++ [Node x eta ts])
  | (Item i theta a alpha (symbol:beta) eta' zeta k' ts') <- stored,
    k == k', eta == eta', symbol == N x,
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta) ]
  ++
  -- third completion rule, triggered from the completed item
  [ Item i theta a (alpha ++ [I x' y]) beta zeta' zeta'' j
         (ts' ++ [Node x eta ts])
  | (Item i theta a alpha (I x' y:beta) eta' zeta k' ts') <- stored,
    k == k',
    eta /= [], head eta == y, tail eta == eta',
    x == x',
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta) ]

complete grammar (Item k eta (I x y) gamma [] [] [] j ts) stored =
  -- second completion rule, triggered from the completed item
  [ Item i theta a (alpha ++ [N x]) beta zeta' zeta'' j
         (ts' ++ [Node x (y:eta) ts])
  | (Item i theta a alpha (symbol:beta) (y':eta') zeta k' ts') <- stored,
    k == k', eta == eta', y == y', symbol == N x,
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta) ]
  ++
  -- fourth completion rule, triggered from the completed item
  [ Item i theta a (alpha ++ [I x y]) beta zeta' zeta'' j
         (ts' ++ [Node x (y:eta) ts])
  | (Item i theta a alpha (I x' y':beta) eta' zeta k' ts') <- stored,
    k == k', x == x', y == y', eta == eta',
    (zeta',zeta'') <- split zeta,
    constraint (zeta',zeta'',beta) ]

In the implementation, we also have to specify what happens to premises of the form $[i, \theta, A \to \alpha \bullet w\beta, \eta, \zeta, k]$. This is the final case of the catch-all pattern:

complete grammar item stored = []

This completes the Earley-specific part of the story.

7 Chart and Agenda

A chart plus agenda is a pair of item lists. Call this datatype a store.

type Store a b = ([Item a b],[Item a b])

The idea is to use the agenda for those items that have been proved, but whose direct consequences have not yet been derived, and the chart for the proved items the consequences of which have also been computed. We start out with an empty chart and with a list of all axioms on the agenda.

initStore :: (Eq a, Eq b) => Grammar a b -> [a] -> Store a b
initStore grammar tokens = ([], axioms grammar)

Next, we tackle the items on the agenda one by one:

- add their consequences to the agenda,
- move them from the agenda to the chart (as their consequences have been computed).

exhaustAgenda :: (Eq a, Eq b) => Grammar a b -> [a] -> Store a b -> Store a b
exhaustAgenda grammar tokens (chart,[]) = (chart,[])
exhaustAgenda grammar tokens (chart,agenda@(trigger:rest)) =
  exhaustAgenda grammar tokens (newchart,newagenda)
  where newchart  = chart ++ [trigger]
        store     = chart ++ agenda
        conseq    = consequences grammar tokens trigger chart
        new       = conseq \\ store
        newagenda = rest ++ new

Check whether a goal item has been found, and return the list of goal items:

goalFound :: (Eq a, Eq b) => Grammar a b -> [a] -> [Item a b] -> [Item a b]
goalFound grammar tokens store = filter gl store
  where gl = goal grammar tokens

If a parse is successful, it is nice to display the chart:

display :: Show a => [a] -> IO ()
display []     = return ()
display (x:xs) = do print x
                    display xs

Rather than displaying the whole chart, we will display only the records of the nodes that have been successfully created. To that end, we prune the chart using the following filter:

pruned :: (Eq a, Eq b) => [Item a b] -> [Item a b]
pruned = filter (\ (Item i theta s symbols symbols' eta zeta j ts) -> symbols' == [])

As output of a parse we allow either a parse tree or a chart, depending on an output flag.

data OutputKind = Tree | Chart deriving Eq

Parsing is now a matter of initializing the store, exhausting the agenda, and checking whether a goal item has been found in the chart.

parse :: (Eq a, Show a, Eq b, Show b) => Grammar a b -> [a] -> OutputKind -> IO ()
parse grammar tokens output =
  if goals /= []
    then if output == Tree
           then displayTrees ptrees
           else display (pruned chart)
    else putStrLn "no parse"
  where goals  = goalFound grammar tokens chart
        ptrees = getTrees (head goals)
        init   = initStore grammar tokens
        result = exhaustAgenda grammar tokens init
        chart  = fst result

Incomplete parses (for debugging):

iparse :: (Eq a, Show a, Eq b, Show b) => Grammar a b -> [a] -> IO ()
iparse grammar tokens = display chart
  where init   = initStore grammar tokens
        result = exhaustAgenda grammar tokens init
        chart  = fst result

Parsing with a grammar read from a file:

prs :: String -> [String] -> OutputKind -> IO ()
prs string tokens output = do
  grammar <- getGrammar string :: IO (Grammar String String)
  parse grammar tokens output
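For example, with the grammars defined above, a session might look like this (a sketch; output elided, as the exact rendering depends on the Show instances):

DPS> parse grammar4 "aabbcc" Tree    -- displays a derivation tree
DPS> parse grammar4 "aabbc" Tree
no parse
DPS> prs "grammar0" ["a","a","b","b"] Chart   -- displays the pruned chart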

8 Testing

parseTest :: (Eq a, Eq b) => Grammar a b -> [a] -> Bool
parseTest grammar tokens = goals /= []
  where goals  = goalFound grammar tokens chart
        init   = initStore grammar tokens
        result = exhaustAgenda grammar tokens init
        chart  = fst result

test :: (Eq a, Show a, Eq b, Show b) => (Grammar a b, [a]) -> String
test (grammar, tokens) =
  if parseTest grammar tokens
    then show grammar ++ " " ++ show tokens ++ " succeeds"
    else show grammar ++ " " ++ show tokens ++ " fails"

suite1 :: [(Grammar Char Char, [Char])]
suite1 = [ (grammar1, ""),
           (grammar1, "abba"),
           (grammar1, "aba"),
           (grammar2, ""),
           (grammar2, "aba"),
           (grammar2, "abba"),
           (grammar2, "aaabbaaa"),
           (grammar3, ""),
           (grammar3, "(()())"),
           (grammar3, "(()()"),
           (grammar3, "((((())))()"),
           (grammar3, "((((())))())"),
           (grammar4, ""),
           (grammar4, "aabbcc"),
           (grammar4, "aabbbcc"),
           (grammar4, "aabbbccc"),
           (grammar4, "aaaaabbbbbccccc"),
           (grammar5, ""),
           (grammar5, "aabaaab"),
           (grammar5, "aabaab"),
           (grammar5, "aaaaabbaaaaabb"),
           (grammar6, ""),
           (grammar6, "a"),
           (grammar6, "ab") ]

runTests :: IO ()
runTests = sequence_ (map (putStrLn . test) suite1)
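For reference: grammar1 generates the odd-length palindromes over {a, b}, grammar2 all palindromes over {a, b}, grammar3 the balanced parenthesis strings, and, as noted above, grammar4, grammar5 and grammar6 generate $a^n b^n c^n$, $ww$, and $a^*$ respectively. So of the suite1 tests we expect the following to fail: (grammar1, "") and (grammar1, "abba"); (grammar3, "(()()") and (grammar3, "((((())))()"); (grammar4, "aabbbcc") and (grammar4, "aabbbccc"); (grammar5, "aabaaab"); and (grammar6, "ab"). All other tests should report success.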

9 Function for Stand-alone Use

Module declaration:

module Main where

import DPS
import System

Definition of the main function:

main :: IO ()
main = do
  args <- getArgs
  prs (args !! 0) (words (args !! 1)) Tree

This allows:

[jve@water sig]$ more grammar6
N "S" --> [N "NP", N "VP"]
N "VP" --> [N "TV", N "NP"]
N "VP" --> [T "talked"]
N "VP" --> [T "smiled"]
N "NP" --> [N "Det", N "CN"]
N "NP" --> [T "John"]
N "NP" --> [T "Mary"]
N "TV" --> [T "loved"]
N "TV" --> [T "hated"]
N "Det" --> [T "the"]
N "Det" --> [T "some"]
N "CN" --> [T "man"]
N "CN" --> [T "woman"]
N "CN" --> [N "CN", T "that", I "S" "NP"]
I "NP" "NP" --> []
[jve@water sig]$ runhugs Main grammar6 "John hated the man that loved Mary"
"S"
     "NP"
          "John"
     "VP"
          "TV"
               "hated"
          "NP"
               "Det"
                    "the"
               "CN"
                    "CN"
                         "man"
                    "that"
                    "S"["NP"]
                         "NP"["NP"]
                         "VP"
                              "TV"
                                   "loved"
                              "NP"
                                   "Mary"
[jve@water sig]$

References

[1] Aho, A. V. Indexed grammars - an extension of context-free grammars. Journal of the ACM 15, 4 (1968), 647-671.

[2] Aho, A. V. Nested stack automata. Journal of the ACM 16, 3 (1969), 383-406.

[3] Earley, J. An efficient context-free parsing algorithm. Communications of the ACM 13, 2 (1970), 94-102.

[4] Eijck, J. van. Sequentially indexed grammars. Manuscript, Centre for Mathematics and Computer Science, Amsterdam.

[5] Gazdar, G. Applicability of indexed grammars to natural languages. In Natural Language Parsing and Linguistic Theories, U. Reyle and C. Rohrer, Eds. Reidel, Dordrecht, 1988, pp. 69-94.

[6] Jones, S. P., Hughes, J., et al. Report on the programming language Haskell 98. Available from the Haskell homepage.

[7] Knuth, D. Literate Programming. CSLI Lecture Notes, no. 27. CSLI, Stanford, 1992.

[8] Shieber, S., Schabes, Y., and Pereira, F. Principles and implementation of deductive parsing. Journal of Logic Programming 24 (1995), 3-36.
