
Chapter 5: Advanced Topics

    If a little knowledge is dangerous, where is the man who has so
    much as to be out of danger?  (Thomas Huxley)

There are a number of interesting topics that cannot all be covered in a term. For this book, so closely tied to a term project, it is also important to get to the operational material early so that the projects can get underway. The writers were faced with a choice: present all of the material in its most natural order and depend on the instructor to pick and skip, or present one complete track through the material and then organize the rest as additional topics, some of which can serve as lecture material toward the end of the term, when it is too late to change the course of a term project but not too late to think about the next project. Thus Chapters 1-4 are designed to mesh with the stages of the project; there are few sections in them that can be skipped. There are a few topics that should not be skipped, but that interact with the main track in such a way as to make it more difficult to present. They include two final features of X, procedures and macros, and the bottom-up alternative for parsing. Static analysis, macros, linkers and loaders, and some other sections are optional.

5.1 Subprograms and Procedures

    subprogram:   "INPUTSET.x"
    procedure:    z := "GCD.x" := 33, 111

The definition of X, and therefore the implementation of X, was left incomplete in Chapters 1, 2, and 3. Two of the missing features are presented here. While the details differ from similar constructs in traditional programming languages, the implementations are conventional. The consequence is that, like

the earlier semantic material, this material too can be applied to a variety of languages.

Definition of Subprogram

A complete X program can be inserted into a statement list in any other program without violating syntactic or scoping rules. The meaning of insertion is as if the combined programs were analyzed after the insertion. If the inserted program is a block, its private variables are, by definition, kept separate from the variables in the surrounding text. The global variables of the inserted block are the only path of information into and out of the block.

A program that, when run by itself, caused input or output may no longer cause input or output after combination with another program. Output is a response to a variable never used on the right; the other program may contain a left use of the variable, so that in the combined program the variable is used on both the left and the right, and is therefore no longer used for output.

Suppose a program has some input variables. If one inserts a list of assignments to those variables into that program, the assignments will take precedence over the implicit input caused by right-only use in the containing program. This is how batch input is realized. One can analogously cause batch output by inserting a list of assignments to new, unique names.

The form of insertion is syntactically a statement:

    "ProgramName"

where the program name inside quotes is known to the system. Typically it will be a file name. (1) It has exactly the same kind of semantics as the typical include statement, such as that of C, although it is somewhat more interesting (surprising?) in X because of the inference mechanisms for variable type and use.

Definition of Procedure

A subprogram with parameters is a procedure. Suppose "GCD.x" is a file containing the program:

    x, y := X, Y;
    it
      if x < y -> y := y - x
      :: x > y -> x := x - y
      :: x = y -> exit
      fi
    ti;
    Z := x

(1) This design has good points and bad points. The string quotes free the name from the syntactic constraints of X; on the other hand, one may find procedure names constrained by file naming conventions.

If run as a program, it would request input for X and Y and report the value of Z. One might invoke GCD.x as a procedure by:

    gcd := "GCD.x" := 17, 51

The effect of the invocation is as if the file GCD.x were surrounded in its implicit be-eb pair, all its outer variables made private, an assignment of the actual parameter values to the input variables placed immediately after the nomenclature, and an assignment of the output variables (in this case, Z) placed immediately before the closing eb. The result, for GCD.x, is as follows:

    be x y X Y Z
      X, Y := 17, 51;
      "GCD.x";
      gcd := Z
    eb

The order of the actual parameters is determined by the order of appearance of the input and output variables in the text of the procedure definition. This information is typically supplied by the procedure author in documentation rather than by the user examining the text of the definition. Running a program standalone in Hyper exhibits the input and output variables, in order, to assist the documentor.

A procedure call is defined as if by substitution: the invocation is replaced by the defining block described above. The number of actual parameters must match the number of input and output variables.

A procedure may be defined in terms of itself. For file GCD.x we might have had instead: (2)

    x, y := X, Y;
    if x = y -> z := y
    :: x < y -> z := "GCD.x" := (y-1)//x+1, x
    :: x > y -> z := "GCD.x" := y, x
    fi;
    Z := z

Because of the potential for name clashes, the concepts of preactive and postactive regions are essential to the understanding of recursive procedures. Even when GCD.x is nested within itself, the inputs and outputs are kept separate, despite the fact that they have the same names in each nesting. One can see this by carrying out the nesting one more level in the GCD.x example above. It is also interesting that a recursive procedure has an infinite definition. The definition needs to be examined in any one execution only as deeply as the recursion actually goes. There is a well understood mechanism to get the effect of recursion without needing to copy the definition.

(2) Reminder: operator // means remainder.
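In a language with conventional recursion the same definition is a few lines. Here is a minimal Python sketch of the recursive GCD.x, assuming only that X's // (remainder, per the footnote) corresponds to Python's %:

    def gcd(x, y):
        # Mirrors the recursive GCD.x above. The expression (y - 1) % x + 1
        # is the remainder nudged into the range 1..x, which keeps both
        # arguments positive as the recursion descends.
        if x == y:
            return y
        elif x < y:
            return gcd((y - 1) % x + 1, x)
        else:
            return gcd(y, x)

    print(gcd(33, 111))   # 3, the result of z := "GCD.x" := 33, 111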

Run Stack

Implementing Recursion

5.2 Static Analysis

Control Paths

The access patterns of a program are represented as a regular expression over a vocabulary of read and write events. The notation:

    {x, y, ...}   a set of variable names
    V_p           the set of names in construct p
    path_x(p)     the read/write sequence for x in p
    B             the set of block designators

Definitions (with ∘ standing for a unary or binary operator):

    B = {0, 1, 2, ...}                                         (5.1)
    V_x = {x}                                                  (5.2)
    V_c = {}                                                   (5.3)
    V_∘p = V_p                                                 (5.4)
    V_p∘q = V_p ∪ V_q                                          (5.5)
    V_(p) = V_p                                                (5.6)

    read_x(S)  = r_x if x ∈ S, λ otherwise                     (5.7)
    write_x(S) = w_x if x ∈ S, λ otherwise                     (5.8)

    path_x(x_1,...,x_k := p_1,...,p_k)
               = read_x(∪_i V_p_i) write_x(∪_i {x_i})          (5.9)
    path_x(abort) = λ                                          (5.10)
    path_x(skip)  = λ                                          (5.11)
    path_x(exit)  = λ                                          (5.12)
    path_x(p q)   = path_x(p) path_x(q)                        (5.13)
    path_x(if p_1 -> q_1 :: ... :: p_n -> q_n fi)
               = read_x(∪_i V_p_i) (∪_i path_x(q_i))           (5.14)
    path_x(it p ti) = (path_x(p))+                             (5.15)

    Table 5.1: Static Analysis
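The equations of Table 5.1 tabulate mechanically. The following is a minimal Python sketch for assignments, sequences, and the trivial statements only; the tuple encoding of statements and all names here are invented for illustration, not taken from the book:

    # Hypothetical encoding: ("assign", targets, expr_vars), ("seq", p, q),
    # ("skip",), ("abort",), ("exit",). expr_vars lists the V_p sets of
    # the right-hand-side expressions.

    def read(x, names):                  # read_x(S): r_x if x in S, else empty
        return ["r_" + x] if x in names else []

    def write(x, names):                 # write_x(S): w_x if x in S, else empty
        return ["w_" + x] if x in names else []

    def path(x, stmt):
        kind = stmt[0]
        if kind == "assign":             # equation (5.9)
            _, targets, expr_vars = stmt
            return read(x, set().union(*expr_vars)) + write(x, set(targets))
        if kind == "seq":                # equation (5.13)
            return path(x, stmt[1]) + path(x, stmt[2])
        return []                        # skip, abort, exit: (5.10)-(5.12)

    # z := x + y; x := z
    prog = ("seq", ("assign", ["z"], [{"x", "y"}]),
                   ("assign", ["x"], [{"z"}]))
    print(" ".join(path("x", prog)))     # r_x w_x : x is read, then written
    print(" ".join(path("z", prog)))     # w_z r_z : z is written, then read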

5.3 Macros

A macro is a text transformation mechanism.

5.4 Linkers and Loaders

In modern languages the process of getting a program into execution involves more than translation to machine-executable form. Typically the unit of compilation, called a module, is less than a whole program. The translation of a single module must leave behind a record that can eventually be combined with other compilation records, so that when all of the modules have been translated, they can be combined into a single runnable program.

One way is just to compile them all together: not a bad idea when the compiler is very fast and the program is not too large. But, in fact, programs are too large. One's own relatively modest program may have to be combined with a much larger program written by someone else. It is often the case that the source files for the other program are not available. (3)

The compilation record is called an object module. The program that combines object modules is called a linker. The program that takes a linked program and places it in execution is called a loader. The linker and loader are often combined in one.

Object Files

5.5 Automatic Parsing

    I never have a computer do something I can do by hand.  (a mathematician)
    I never do anything by hand I can do with a computer.  (a hacker)
    That mathematician had better not construct parsers.  (a compiler writer)

The recursive descent technique for writing parsers requires the programmer to write ten or so lines of recursive code for each nonterminal in the grammar. There is another method, usually called bottom-up, in contrast to the top-down recursive descent, for building parsers for which no parser code need be written and for which it is guaranteed that there are no parser errors. Bottom-up parsers also have a better chance to recover from input errors and proceed with usable analysis beyond the point of the first error. The disadvantages of the bottom-up method are that a table-building tool must be available or be constructed, and a grammar for the source language, obeying the restrictions of the table builder, must be prepared. There is no significant practical difference in the resulting parsers.

(3) Source files can get lost, become out of date with respect to available compilers, be in the wrong language, or even contain trade secrets.

There are many ways of building bottom-up parsers. The dominant technology goes under the initials lr. One develops the lr technology by considering the input text as the catenation of a parse stack ρ and an as yet unprocessed input δ. Then one reduces the process of parsing to a sequence of just two kinds of actions:

    shift    move one symbol from the head of δ to the top of ρ
    reduce   apply a rule to the top of ρ, thereby rewriting its rightmost symbols

Recalling Exercise 1d in Chapter 2, start the process with ρ = λ and δ = f∨t. If some authoritative sergeant bawled out the cadence

    shift, reduce by r8, reduce by r6, reduce by r4, reduce by r2,
    shift, shift, reduce by r7, reduce by r6, reduce by r4,
    reduce by r1, reduce by r0, parser halt!

the parser could respond with the steps

    ρ                            δ
    λ                            f∨t
    f                            ∨t
    Boolean                      ∨t
    Complement                   ∨t
    Conjunction                  ∨t
    Disjunction                  ∨t
    Disjunction ∨                t
    Disjunction ∨ t              λ
    Disjunction ∨ Boolean        λ
    Disjunction ∨ Complement     λ
    Disjunction ∨ Conjunction    λ
    Disjunction                  λ
    Proposition                  λ

What we need is the sergeant and the parser.

It is tempting to think that the sergeant need not name the rule, since the right side of the rule must match the tail of ρ, but in fact more than one rule might match. For example, whenever rule r1 is applied, rule r2 also matches the tail of ρ. How, then, can the sergeant decide which rule to apply? The answer is lookahead. There is nothing fundamental in this choice; it reflects the idea that the input is coming off the input tape, and therefore only the next few symbols are conveniently available for examination. A language for which the reduce choices can be made by looking ahead k symbols is called lr(k). The combinatorics of looking at k symbols keeps k = 1 for all practical purposes. Language designers have learned to live within this constraint.

The lookahead might accidentally look beyond the input (as in the application of rule r1 above). To keep things simple, and within the model of using only

grammars for language description, all lr(k) grammars are given an end-of-file symbol ⊣; exactly k ⊣ are appended to the last applied rule of the grammar. In the case of the lr(1) grammar in Table 2.1, the first rule would become

    r0    Proposition → Disjunction ⊣

The implementation of ⊣ is achieved by having the scanner return ⊣ when the end of file is detected.

The many versions of lr differ principally in how lookahead is computed and used, in the size of the tables needed, in the generality of the resulting parser, and in the speed with which it operates. One particular version, called lalr(1), is presented in this section. Finite state, lr(0), and slr(1) parsers are also presented, because they are intermediate steps in the construction of lalr(1) parsers.

Finite Automata

Finite automata (fa) are abstract algorithms for the recognition of strings. They are closely related to regular expressions. (4) Automata have many applications: they form the basis of pattern matching programs such as Unix egrep, and are sometimes used as the basis for scanners. The reason for presenting them here is that they are the basis for lr parsers.

Automata are also called state-transition machines. The central idea is that an automaton, at any one time, is in a unique state and can transition to some other state by reading input. The transitions are defined by a relation from state-input pairs to states. Transitions on the null string, i.e., non-reading transitions, are allowed. The sequence of input values read is the string that is processed. Processing is started in a unique start state. At each step the automaton examines the text and, based on its state-transition table, goes to another state. Each time an automaton transitions on an input symbol, that symbol is discarded so that the next symbol may be processed. If input appears for which there is no defined transition, the automaton is said to reject the input. Processing continues until the input is rejected or there is no more input. Whenever the automaton is in a designated final state, the input read so far is said to have been accepted. Starting and stopping the automaton is done outside of it; typically an automaton is stopped when an end-of-input is detected or a final state accepting an end-of-input symbol is reached.

Finite automata can be deterministic (dfa) or nondeterministic (nfa). If the state-transition relation is a single-valued function, and there are no null transitions, the automaton is a dfa. A dfa executes in time proportional to the length of the input and is therefore convenient and efficient on conventional computers; a nfa is harder to execute. Unfortunately, nfa often arise naturally in applications. Fortunately there is an algorithm to transform any nfa into an equivalent dfa.

(4) See Section 2.5 and Section 5.6.

One can draw an intuitive diagram representing an automaton. The diagram below recognizes properly rounded values approximating 2/3. The value to be recognized must start with "0", is followed by any number of 6s, and terminates with a 7. Each state is boxed; the initial state is labelled G; the final state is double boxed.

    [Figure 5.1: A dfa for rounded values of 2/3. The transcription
    preserves the transitions G → 0 A, A → 6 B, B → 6 B, B → 7 C;
    G is the start state and C the (double-boxed) final state.]

Exercises

1. [1,1] Draw a diagram for a dfa that recognizes positive integers. (5)

2. [1,1] Draw a diagram for a dfa that recognizes rounded values of 1/7.

3. [1,1] Suppose that you have a dfa that recognizes truncated representations of fraction 1/n. How can you transform it into a dfa that recognizes rounded representations of 1/n? (Hint: does your solution work for 1/101?)

4. [1,1] Draw a diagram for a dfa that recognizes any sequence of nickels and dimes (N and D) that adds up to a quarter. (Hint: let state k represent an accumulation of 5k cents.)

5. [1,1] Draw a diagram for an automaton that recognizes a sequence of zero or more a's, followed by a sequence of zero or more b's, followed by one c. (Hint: there is a 3-state nfa solution.)

6. [1,1] Use a regular expression to describe the strings accepted by each of the automata in this list of exercises.

Definition of Finite Automata

    A → aB        (5.16)
    A → B         (5.17)
    A → λ         (5.18)

    Table 5.2: Schema for Automata

(5) The statement of a recognition condition for an automaton implicitly adds "and rejects anything else."

An automaton can be defined by a cfg. If the rules in Π are restricted to the three forms shown in Table 5.2, then a recognizer similar to the diagram in Figure 5.1 can be built. The nonterminals correspond to states, the rules correspond to transitions, and the terminals are the input. The goal symbol is the start state.

The first kind of fa rule (Equation 5.16) defines the transitions (often called shifts). The shifts are deterministic if

    ∀A ∈ V_N, a ∈ V_T:  size({A → aB | A → aB ∈ Π}) ≤ 1

That is to say, for no nonterminal A is there more than one shift defined for any terminal a. The second kind of rule (Equation 5.17) is called an empty transition. (6) If the shifts are deterministic and there are no empty transitions, then the automaton is deterministic. The third kind of rule (Equation 5.18) is a final transition. If one uses a fa grammar to produce a string, one starts with the goal symbol, and at each stage discards one nonterminal in the text and replaces it with another, until a final transition is applied, leaving no further nonterminals. The set of states with final transitions is the set of final states of the automaton:

    V_F = {A | A → λ ∈ Π}        (5.19)

The requirement that there be final states in a fa is the same as the constraint on cfgs to avoid nonterminating rules. (7)

Exercises

7. [1,1] Write down the grammars defining the automata derived in the previous set of exercises.

8. [1,1] Write a program to execute the dfa in Figure 5.1.

9. [1,1] Write a program to execute any dfa. (Hint: represent Π as a 2-dimensional matrix defining a function mapping each state-symbol pair into a state, and V_N as a vector recording which states are final and which are not.)

10. [1,1] Write a program to execute any nfa. (Hint: provide backtracking.)

11. [1,1] Show how to derive a grammar A° for the sequence of states passed to accept a string from the grammar A defining the fa. (Hint: V°_T = V_N.)

12. [1,1] Show how to derive a grammar A° for the sequence of transitions applied to accept a string from the grammar A defining the fa. (Hint: V°_T = Π.)

(6) An empty transition is often called an ε transition in literature using the letter ε to denote the empty string.

(7) See Section 2.3.
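Exercises 8 and 9 ask for exactly such a program. A minimal Python sketch follows, using the table representation suggested in the hint of Exercise 9; the 6-loop on state B is assumed from the prose description of Figure 5.1:

    # Transition table for the dfa of Figure 5.1, as a map from
    # (state, symbol) pairs to states.
    DELTA = {("G", "0"): "A", ("A", "6"): "B",
             ("B", "6"): "B", ("B", "7"): "C"}
    START, FINALS = "G", {"C"}

    def accepts(text):
        state = START
        for ch in text:
            if (state, ch) not in DELTA:
                return False             # undefined transition: reject
            state = DELTA[(state, ch)]
        return state in FINALS           # accept only in a final state

    for s in ["067", "0667", "06", "77"]:
        print(s, accepts(s))             # True, True, False, False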

Construction of a DFA from a NFA

Given a nfa A, there is a dfa A' such that L(A') = L(A). The construction for V'_T, V'_N, G' and Π' follows. The central idea is that the set of states of the nfa which can be reached by some string α are all mapped into a single state of the dfa. Once the construction is complete, the states of the dfa can be relabelled to simplify the description of the dfa.

The construction of Π' and V'_N is mutually dependent, as represented by the mutually recursive formulas below. The construction is based on a function S which collects the set of states S(A', a) that can be reached from the set of states A' via a transition on terminal symbol a. The value of S is often the empty set, corresponding to an error state (rejection state) in the resulting dfa. The function C closes a set of states under empty transitions:

    C(A')    = {B | A ∈ A' ∧ B ∈ V_N ∧ A →* B}               (5.20)
    S(A', a) = {B | A ∈ C(A') ∧ A → aB ∈ Π}                  (5.21)

Using S and C we can compute A':

    V'_T = V_T
    G'   = C({G})
    V'_N = D(Π'), the state sets appearing in Π'
    V'_F = {A' | A' ∈ V'_N ∧ A' ∩ V_F ≠ {}}
    Π'   = {A' → aB' | A' ∈ V'_N ∧ B' = S(A', a)}
           ∪ {A' → λ | A' ∈ V'_F}
    A'   = ⟨V'_T, V'_N, G', Π'⟩

    Table 5.3: nfa to dfa Transformation

To start the mutual recursion off, the start symbol G' may be put into V'_N. The error state {} is a member of V'_N in any constructed dfa that does not accept all strings (i.e., L(A') ≠ V*_T). This happens because S(A', a) is empty whenever there is no transition on a out of any state A ∈ A' in the nfa. The error state is the unique place to which all rejected strings take the dfa. The error state has many entries and no exits; it is a trap from which no string returns. The existence of the error state ensures that the state-transition function for the dfa is a single-valued function.
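The transformation is short in code. Below is a minimal Python sketch of Table 5.3, assuming the nfa arrives as a shift map, an empty-transition map, and a set of final states; frozensets of nfa states serve as dfa state names, and the empty frozenset is the error state:

    def closure(states, empty):
        """C(A'): everything reachable from the given states by empty moves."""
        result, work = set(states), list(states)
        while work:
            a = work.pop()
            for b in empty.get(a, ()):
                if b not in result:
                    result.add(b)
                    work.append(b)
        return frozenset(result)

    def nfa_to_dfa(start, shifts, empty, finals, alphabet):
        """Subset construction per Table 5.3; the error state and its
        transitions are retained, as the book recommends."""
        g = closure({start}, empty)                  # G' = C({G})
        table, seen, work = {}, {g}, [g]
        while work:
            a = work.pop()
            for sym in alphabet:
                target = set()
                for q in a:                          # S(A', sym)
                    target |= shifts.get((q, sym), set())
                t = closure(target, empty)
                table[(a, sym)] = t
                if t not in seen:
                    seen.add(t)
                    work.append(t)
        dfa_finals = {a for a in seen if a & finals} # contains a nfa final
        return g, table, dfa_finals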

Exercises

13. [1,1] Show A ∈ C({A}).

14. [1,1] Show A' ⊆ C(A').

15. [1,1] Show that the error state is not a final state.

16. [1,1] Show {} ∈ V'_N iff L(A') ≠ V*_T.

17. [1,1] Show that for any constructed dfa A':

    size(Π') = size(V'_N) × size(V_T) + size(V'_F)

18. [1,1] Suppose the nfa→dfa transformation were applied to a dfa. Would the input and output of the transformation necessarily be the same?

19. [1,1] Suppose the nfa→dfa transformation were applied twice, to a nfa and then to the resulting dfa. Would the input and output of the second transformation necessarily be the same?

Example

The following nfa is an answer to Exercise 5, to build an automaton to recognize a*b*c. It is a nfa:

    V_T = {a, b, c}
    V_N = {A, B, C}
    G   = A
    V_F = {C}
    Π   = {A → aA, A → B, B → bB, B → cC, C → λ}
    A   = ⟨V_T, V_N, G, Π⟩

The corresponding dfa for a*b*c follows:

    V'_T = {a, b, c}
    V'_N = {{A, B}, {B}, {C}, {}}
    G'   = {A, B}
    V'_F = {{C}}
    Π'   = {{A, B} → a{A, B},  {B} → a{},   {C} → a{},  {} → a{},
            {A, B} → b{B},     {B} → b{B},  {C} → b{},  {} → b{},
            {A, B} → c{C},     {B} → c{C},  {C} → c{},  {} → c{},
            {C} → λ}
    A'   = ⟨V'_T, V'_N, G', Π'⟩
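Running the subset-construction sketch above on this nfa reproduces the dfa just listed, with frozensets in place of the relabelled names:

    shifts = {("A", "a"): {"A"}, ("B", "b"): {"B"}, ("B", "c"): {"C"}}
    empty = {"A": {"B"}}                 # the rule A -> B
    g, table, finals = nfa_to_dfa("A", shifts, empty, {"C"}, "abc")

    print(sorted(g))                     # ['A', 'B'], the start state {A, B}
    print(sorted(table[(g, "b")]))       # ['B']
    print(sorted(table[(g, "c")]))       # ['C']
    print(len(finals))                   # 1, namely {C}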

Exercises

Using the example above:

20. [1,1] Verify the nfa→dfa transformation.

21. [1,1] Verify the formula in Exercise 17.

22. [1,1] Label the states of the dfa for a*b*c and draw the diagram for it.

23. [1,1] Repeat the previous three exercises for a b c.

LR Parsers

The central fact of lr parsing is that, for any cfg, the set of parse stacks ρ generated during canonical parses is a fa language. The application of a cfg rule A → α leads to a text transformation ρAδ ⇒ ραδ, where δ is the unprocessed input text. The set of all values ρ, over all steps of all parses, is the fa language. Therefore, there is a dfa to recognize it. It is upon this dfa that the lr parser is built.

The construction of an lalr(1) parser proceeds in six steps. The cfg for the language is the starting point.

1. The lr(0) nfa for the parse stack is written down;
2. The lr(0) dfa is constructed from the nfa;
3. The lalr(1) shift function is taken from the dfa;
4. The Lookahead Grammar is constructed from the dfa and cfg;
5. The slr(1) lookahead is constructed for the Lookahead Grammar;
6. The slr(1) lookahead is added to the dfa to build a lalr(1) reduce function for the cfg.

The starting point is a context-free grammar G. The first objective is the lr(0) parser in Figure 5.2. It may be interesting to do Exercise 99 and then return to this point and resume reading.

The LR(0) NFA Construction

One characteristic of the material to follow is the construction of grammars from grammars, where the symbols of the constructed grammar are parts of the original grammar. It is helpful to enhance the notation by introducing meta-brackets to turn strings α into symbols [α]. Suppose the language for which a parser is desired is described by cfg G = ⟨V_T, V_N, G, Π⟩. The lr(0) nfa is written down as follows:

    V'_T = V_T ∪ V_N
    V'_N = {[A → α] | A → αβ ∈ Π}
    G'   = [G → λ]
    Π'   = {[A → α] → B [A → αB] | B ∈ V_T ∪ V_N ∧ A → αBγ ∈ Π}
           ∪ {[A → α] → [B → λ] | B ∈ V_N ∧ A → αBγ ∈ Π}
           ∪ {[A → α] → λ | A → α ∈ Π}
    A'   = ⟨V'_T, V'_N, G', Π'⟩

    Table 5.4: cfg to lr(0) nfa Construction

Since the parse stack can have both the terminal and nonterminal symbols from the cfg, the nfa must have transitions defined for all of them: the terminal vocabulary V'_T is the whole vocabulary of the cfg. The nonterminal vocabulary consists of partially completed rules from the cfg; each [A → α] represents the state of having recognized the part α of some rule A → αβ.

The canonical parse in the cfg requires that the rightmost nonterminal is expanded. Here is the source of the surprising fact that the parse stack is describable by a fa. As the fa walks the stack, it is walking across partially completed right-hand sides of rules. Whenever the next symbol in a grammar rule is a nonterminal in the cfg, there are two possibilities: either the nonterminal is already there in the stack, or part of the right-hand side of one of its rules is already in the stack. The first two schemata for filling Π' in Table 5.4 reflect these two possibilities. The first schema steps to a state including the symbol (terminal or nonterminal) on the stack. The second schema starts expanding the nonterminal by an empty transition to a state where the new right-hand side is empty and therefore ready to start being built from the input. The third schema terminates the fa because a rule has been fully built on the parse stack. There is a terminating schema for each rule in the grammar. The application of the lr(0) machine will repeatedly arrive in these terminating states and then be restarted to do the next step in the canonical parse. The start state of the automaton is a rule for the goal symbol of the cfg with none of the right-hand side built. The lr(0) dfa is then computed from the lr(0) nfa.

Exercises

24. [1,1] Show that the set of parse stacks ρ is a fa language.

25. [1,1] Show that, for an lr(0) automaton, the final states correspond to the rules of the cfg: V'_F ≃ Π.
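The schemata of Table 5.4 also tabulate directly. The sketch below uses the common dotted-rule items (lhs, rhs, k), a slightly finer representation than the book's [A → α] states, which merge items sharing a recognized string; the subset construction yields the same dfa either way, up to state names. The grammar shown is that of the worked example which follows, with "|" standing in for ∨ and "-|" for ⊣:

    def lr0_nfa(rules, nonterminals):
        """Write down the lr(0) nfa of Table 5.4 for a cfg given as a
        list of (lhs, rhs) pairs with rhs a tuple of symbols."""
        shifts, empties, finals = [], [], []
        for lhs, rhs in rules:
            for k, nxt in enumerate(rhs):
                item = (lhs, rhs, k)                 # k symbols already built
                shifts.append((item, nxt, (lhs, rhs, k + 1)))
                if nxt in nonterminals:              # second schema
                    empties += [(item, (l2, r2, 0))
                                for l2, r2 in rules if l2 == nxt]
            finals.append((lhs, rhs, len(rhs)))      # third schema: rule built
        return shifts, empties, finals

    rules = [("P", ("D", "-|")), ("D", ("D", "|", "C")), ("D", ("C",)),
             ("C", ("f",)), ("C", ("(", "D", ")"))]
    shifts, empties, finals = lr0_nfa(rules, {"P", "D", "C"})
    print(len(shifts), len(empties), len(finals))    # 10 10 5

With the book's merged items there are five empty transitions where this sketch produces ten; the ten shift and five final transitions agree with the worked example below.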

A Worked Example: LR(0) Construction

Consider the following grammar G, where

    V_T = {f, ∨, (, ), ⊣}
    V_N = {P, D, C}
    G   = P
    Π   = {r0, r1, r2, r3, r4}

    r0 = P → D ⊣
    r1 = D → D ∨ C
    r2 = D → C
    r3 = C → f
    r4 = C → ( D )

The grammar consists of five rules, including one ending in ⊣, signifying end-of-input. The language is a combination of the boolean value f, logical or operations, and parentheses. It is a subset of the language for which a recursive descent parser was written in Chapter 2. This language is picked for the example because it does not require any lookahead. This is fine for purposes of illustrating the construction and use of the lr(0) machine, but slightly misleading, since every useful programming language does require lookahead. An lr(0) parser for this language follows. The first step is to write down the nfa for the lr(0) machine.

The LR(0) NFA (example)

    V'_T = {P, D, C, ∨, ⊣, f, (, )}
    V'_N = {m0, m1, m2, m3, m4, m5, m6, m7, m8, m9, m10, m11, m12}
    G'   = m0
    V'_F = {m2, m6, m7, m9, m12}
    Π'   = {t0, t1, t2, ..., t19}
    A'   = ⟨V'_T, V'_N, G', Π'⟩

where

    m0 = [P → λ]      m5 = [D → D∨]     m10 = [C → (]
    m1 = [P → D]      m6 = [D → D∨C]    m11 = [C → (D]
    m2 = [P → D⊣]     m7 = [D → C]      m12 = [C → (D)]
    m3 = [D → λ]      m8 = [C → λ]
    m4 = [D → D]      m9 = [C → f]

and

    t0 = m0 → D m1     t7 = m8 → ( m10      t14 = m10 → m3
    t1 = m1 → ⊣ m2     t8 = m10 → D m11     t15 = m2 → λ
    t2 = m3 → D m4     t9 = m11 → ) m12     t16 = m6 → λ
    t3 = m4 → ∨ m5     t10 = m0 → m3        t17 = m7 → λ
    t4 = m5 → C m6     t11 = m3 → m3        t18 = m9 → λ
    t5 = m3 → C m7     t12 = m3 → m8        t19 = m12 → λ
    t6 = m8 → f m9     t13 = m5 → m8

Exercise

26. [1,1] Verify the construction of the example nfa.

The LR(0) DFA (example)

The next step is to compute the dfa A'' from the nfa A'. The majority of transitions involve the generated error state, something that is relatively unsightly to write down and also not very informative. We establish the convention of leaving all transitions to the error state, and the error state itself, out of the displayed computations and also out of the eventual dfa diagram. In an implementation it is, on the other hand, of no cost to retain the error state and its transitions.

    V''_T = {P, D, C, ∨, ⊣, f, (, )}
    V''_N = {n0, n1, n2, n3, n4, n5, n6, n7, n8, n9, n10}
    G''   = n0
    V''_F = {n2, n3, n5, n8, n9}
    Π''   = {v0, v1, v2, ..., v19}
    A''   = ⟨V''_T, V''_N, G'', Π''⟩

where (n10 is the error state, omitted from the displays by the convention above)

    n0 = {m0, m3, m8}     n4 = {m3, m8, m10}    n8 = {m6}
    n1 = {m1, m4}         n5 = {m2}             n9 = {m12}
    n2 = {m7}             n6 = {m5, m8}
    n3 = {m9}             n7 = {m4, m11}

and

    v0 = n0 → D n1     v7 = n4 → C n2      v14 = n7 → ) n9
    v1 = n0 → C n2     v8 = n4 → f n3      v15 = n2 → λ
    v2 = n0 → f n3     v9 = n4 → ( n4      v16 = n3 → λ
    v3 = n0 → ( n4     v10 = n6 → C n8     v17 = n5 → λ
    v4 = n1 → ⊣ n5     v11 = n6 → f n3     v18 = n8 → λ
    v5 = n1 → ∨ n6     v12 = n6 → ( n4     v19 = n9 → λ
    v6 = n4 → D n7     v13 = n7 → ∨ n6

Exercise

27. [1,1] Verify the construction of the example dfa.

Diagram of the LR(0) DFA (example)

To display the lr(0) machine, it is convenient to use the state names from the construction for the non-final states, and the rule names from the original grammar for the final states. This allows the reader to readily apply the diagram to arbitrary input texts. The mapping from states to rules here is n2 = r2, n3 = r3, n5 = r0, n8 = r1, n9 = r4.

    [Figure 5.2: The lr(0) dfa. Transitions: n0 → D n1, n0 → C r2,
    n0 → f r3, n0 → ( n4, n1 → ⊣ r0, n1 → ∨ n6, n4 → D n7,
    n4 → C r2, n4 → f r3, n4 → ( n4, n6 → C r1, n6 → f r3,
    n6 → ( n4, n7 → ∨ n6, n7 → ) r4.]

Exercises

Each of the following grammars poses a problem for lr parsers. Construct each lr(0) nfa and dfa. Your results will be used in the discussion on lookahead.

28. [1,1] lalr(1) shift-reduce conflict for lr(0):

    G → E ⊣
    E → T
    T → T x
    T → x

29. [1,1] lalr(1) reduce-reduce conflict for lr(0):

    G → E ⊣
    E → S x
    E → T z
    S → a
    T → a

30. [1,1] lalr(1) reduce-reduce conflict for slr(1):

    G → E ⊣
    E → a T a
    E → b T b
    E → a x b
    T → x

31. [1,1] lalr(1) erasure in the lookahead:

    G → E ⊣
    E → S x
    E → T U y
    S → a
    T → a
    U → λ

32. [1,1] lalr(1) beats simple lookahead analysis of the lr(0) nfa (an nqlalr example):

    G → E ⊣
    E → b A d
    E → a A c
    E → b g c
    E → a g d
    A → B
    B → g

33. [1,1] not lalr(1):

    G → E ⊣
    E → S x y
    E → T x z
    S → a
    T → a

34. [1,1] Simple ambiguous grammar. Parse xxx two ways:

    G → E ⊣
    E → E E
    E → x

35. [1,1] Classical dangling-else ambiguity. Parse iixtx two ways:

    G → E ⊣
    E → i E
    E → i E t E
    E → x

Applying the LR(0) Machine

The canonical parse is a sequence of grammar rule applications. The rules are applied to the catenation of the parse stack ρ and the remaining input δ. Initially ρ is empty and all of δ is available. At each rule application, the right side of a rule matches a substring of ρδ; the matched substring is removed and is replaced by the left side of the rule.

The lr(0) machine is repetitively applied, yielding one parse step per application. As one might expect, the dfa is started in its initial state. When a transition occurs, the transition symbol is taken from δ and pushed onto ρ. When the lr(0) machine reaches a final state, the grammar rule to be applied is given by the state label, and the right side of that rule is on the top of the parse stack. The matched string is popped off ρ and replaced by the left side of the rule. The process is repeated, starting again in the initial state and at the left of ρ. One of two things finally happens: the goal symbol G of the grammar appears, or the lr(0) machine rejects the input. In the former case the sequence of rule applications is the canonical parse. In the latter case an error diagnostic can be reported.

    parse stack  unread input  commentary
    ρ            δ
    λ            (f∨f)⊣        starting in state n0
    (            f∨f)⊣         read (, goto n4
    (f           ∨f)⊣          read f, goto r3
    λ            (C∨f)⊣        apply r3; starting in state n0
    (            C∨f)⊣         read (, goto n4
    (C           ∨f)⊣          read C, goto r2
    λ            (D∨f)⊣        apply r2; starting in state n0
    (            D∨f)⊣         read (, goto n4
    (D           ∨f)⊣          read D, goto n7
    (D∨          f)⊣           read ∨, goto n6
    (D∨f         )⊣            read f, goto r3
    λ            (D∨C)⊣        apply r3; starting in state n0
    (            D∨C)⊣         read (, goto n4
    (D           ∨C)⊣          read D, goto n7
    (D∨          C)⊣           read ∨, goto n6
    (D∨C         )⊣            read C, goto r1

    λ            (D)⊣          apply r1; starting in state n0
    (            D)⊣           read (, goto n4
    (D           )⊣            read D, goto n7
    (D)          ⊣             read ), goto r4
    λ            C⊣            apply r4; starting in state n0
    C            ⊣             read C, goto r2
    λ            D⊣            apply r2; starting in state n0
    D            ⊣             read D, goto n1
    D⊣           λ             read ⊣, goto r0
    P                          apply r0, quit

The canonical parse is r3, r2, r3, r1, r4, r2, r0. If ρ is the parse stack and δ the unread input, the invariant G ⇒* ρδ holds throughout the parse. Note that the actions after restarting the dfa are repetitious. This is a consequence of the parse stack not changing to the left of the substitution.

Exercises

36. [1,1] Verify the invariant G ⇒* ρδ, where G = P, for the parse of (f∨f)⊣ shown in the previous example.

37. [1,1] Apply the lr(0) machine to strings f⊣, (f)⊣, f∨f∨f⊣, (f∨f)⊣. What is the canonical parse in each case?

38. [1,1] Apply the lr(0) machine to string ff⊣. What kind of diagnostic can be generated in this case? In general?

39. [1,1] Invent a hack to avoid the repetitious transitions across the parse stack ρ after a substitution. (Hint: if p is the length of the canonical parse, and i is the length of the input excluding ⊣, the number of dfa steps, shift or reduce, should be only 2p + i.)

Using the LR(0) DFA more efficiently (example)

The last exercise above hinted at an inefficiency in the applications of the lr(0) machine. Suppose we start this time with the valid text f∨(f)⊣. The canonical parse will be a sequence of rule applications resulting in a sequence of forms eventually converging to the goal symbol P. It is convenient to write the string and the states of the dfa together, with the states between and below the symbols of the string. Whenever the dfa gets to a final state the rewriting is done, removing some symbols from the string and replacing them by the phrase name. When symbols are removed, so are the interpolated states, which become invalid after the substitution. The effect

is that one does not need to start from the left end of the parse stack after each rewrite. To reestablish the state, the new phrase name is tacked onto the front of the input and a nonterminal transition gets things going again.

    stack (with interpolated states)  input    commentary
    n0                                f∨(f)⊣   start
    n0 f r3                           ∨(f)⊣    shift over f
    n0                                C∨(f)⊣   apply rule r3
    n0 C r2                           ∨(f)⊣    shift over C
    n0                                D∨(f)⊣   apply rule r2
    n0 D n1                           ∨(f)⊣    shift over D
    n0 D n1 ∨ n6                      (f)⊣     shift over ∨
    n0 D n1 ∨ n6 ( n4                 f)⊣      shift over (
    n0 D n1 ∨ n6 ( n4 f r3            )⊣       shift over f
    n0 D n1 ∨ n6 ( n4                 C)⊣      apply rule r3
    n0 D n1 ∨ n6 ( n4 C r2            )⊣       shift over C
    n0 D n1 ∨ n6 ( n4                 D)⊣      apply rule r2
    n0 D n1 ∨ n6 ( n4 D n7            )⊣       shift over D
    n0 D n1 ∨ n6 ( n4 D n7 ) r4       ⊣        shift over )
    n0 D n1 ∨ n6                      C⊣       apply rule r4
    n0 D n1 ∨ n6 C r1                 ⊣        shift over C
    n0                                D⊣       apply rule r1
    n0 D n1                           ⊣        shift over D
    n0 D n1 ⊣ r0                      λ        shift over ⊣
    n0                                P        apply rule r0, quit

The canonical parse is r3, r2, r3, r2, r4, r1, r0.

Now one can observe that the symbols in the interpolated stack (as contrasted to the states) are never used. That is, the reduce step must discard one state and symbol from the interpolated stack for every symbol on the right side of the applied rule, but need examine none of them while doing so. The newly exposed top-of-stack is the restarting state. A new stack, consisting of only the states, can be used in place of ρ. This is the form of parse stack used in the rest of this section. We will call it the parse state stack, to distinguish it from the parse stack, and use the symbol σ to represent it.

Exercises

40. [1,1] Once again apply the lr(0) machine to strings f⊣, (f)⊣, f∨f∨f⊣, (f∨f)⊣, but in this case use the parse state stack σ instead of the parse stack ρ.

41. [1,1] Does using σ affect the quality of the diagnostics that can be generated?

42. [1,1] Given the lr(0) dfa and some parse state stack σ, show how to compute the corresponding parse stack ρ.
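The driver just described fits in a dozen lines. Below is a runnable Python sketch of the parse-state-stack machine for Figure 5.2, with "|" standing in for ∨ and "$" for ⊣; GOTO holds the transitions v0 through v14, and FINAL maps each reduce state to its rule name, left side, and right-side length:

    GOTO = {("n0", "D"): "n1", ("n0", "C"): "n2", ("n0", "f"): "n3",
            ("n0", "("): "n4", ("n1", "$"): "n5", ("n1", "|"): "n6",
            ("n4", "D"): "n7", ("n4", "C"): "n2", ("n4", "f"): "n3",
            ("n4", "("): "n4", ("n6", "C"): "n8", ("n6", "f"): "n3",
            ("n6", "("): "n4", ("n7", "|"): "n6", ("n7", ")"): "n9"}
    FINAL = {"n2": ("r2", "D", 1), "n3": ("r3", "C", 1),
             "n5": ("r0", "P", 2), "n8": ("r1", "D", 3),
             "n9": ("r4", "C", 3)}

    def parse(text):
        sigma, delta, out = ["n0"], list(text), []
        while True:
            state = sigma[-1]
            if state in FINAL:
                rule, lhs, n = FINAL[state]
                out.append(rule)
                if rule == "r0":
                    return out                 # goal reached
                del sigma[-n:]                 # pop one state per rhs symbol
                delta.insert(0, lhs)           # push lhs back onto the input
            else:
                sym = delta.pop(0)
                sigma.append(GOTO[(sigma[-1], sym)])  # KeyError means reject

    print(parse("(f|f)$"))  # ['r3', 'r2', 'r3', 'r1', 'r4', 'r2', 'r0']

The printed sequence is the canonical parse of (f∨f)⊣ computed by hand in the previous example.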

The Failures of LR(0)

    [Figure 5.3: An inadequate lr(0) dfa, for the grammar G → E ⊣,
    E → T, T → T x, T → x. Transitions: p0 → E p1, p1 → ⊣ p4,
    p0 → T p2, p2 → x p5, p0 → x p3.]

Taking the grammar from Exercise 28, we get the dfa in Figure 5.3. The problem arises with state p2, which is both a reduce state for rule E → T and also a shift state, carrying on by x to state p5. Having arrived in state p2, the sergeant (8) won't know what command to give.

The answer can be found by examining the lr(0) dfa. The only allowed shift in state p2 is on symbol x. One could take the attitude "shift when you can," and that would work in this case. One can also imagine doing a trial reduction and seeing where that would leave the lr(0) machine. In state p2, E will be pushed onto the head of the input and the top of σ will be p0. Then, shifting the E goes to state p1, in which only ⊣ is valid input. Thus, if reduce by rule E → T (from state p2) was the right answer, the next symbol will surely be ⊣. This resolves the sergeant's dilemma: when in state p2, an x gets shifted, but a ⊣ is left alone and a reduce by rule E → T is done instead. The next task is to generalize and formalize this insight.

Exercises

43. [1,1] Show that merely letting shift take precedence over reduce is the correct solution for the lr(0) machine in Figure 5.3.

44. [1,1] Each of the grammars in the set starting with Exercise 28 fails to be lr(0). Identify the failure(s). See if the lr(0) machine contains the resolution to the problem(s), as in the example worked above.

Lookahead

The lr(0) machine must be augmented with lookahead for practical languages. In fact, the dfa has been used as a stepping procedure, taking the form (in language X):

    σ, δ := "step" := σ, δ

(8) See the cadence example at the beginning of this section.

where at each step either a shift takes a symbol from δ and places the new state on the top of σ, or a reduce pops some states off σ and puts a nonterminal on the front of δ.

The arguments of step range over infinite sets, therefore they cannot be directly tabulated. The top of the parse state stack s and the leading symbol on the input D are the keys. One can implement step with finite tables recording all of the possible decisions. There are three possibilities following the arrival at any particular state s:

    shift   (if there is a transition from s defined on D)
    reduce  (if D is in the lookahead)
    reject  (a syntax error has been discovered)

What is needed are two functions:

    s' := shift(s, D)    which looks up a new state s'
    r := reduce(s, D)    which looks up the rule r to be applied

Information about rules, such as the length of the rule and which nonterminal it defines, must also be available to the algorithm implementing step.

The functions shift and reduce can each be represented by a fixed-size table. For cfg G = ⟨V_T, V_N, G, Π⟩ and derived lr(0) dfa A' = ⟨V'_T, V'_N, G', Π'⟩, each table is a matrix with size(V'_N) × size(V_T ∪ V_N) elements. That is, for every state of the dfa and every symbol of the cfg, there is one entry in each table. The entries for shift can be picked directly off the lr(0) machine:

    A → D C ∈ Π'  iff  shift(A, D) = C        (5.22)

Suppose that A ∈ V'_F. Then there is some rule r ∈ Π from the cfg such that [r] ∈ A. There are several strategies for recording values for reduce. If there is nothing else in A except [r], then the state is lr(0): only a reduction is allowed. In this case it will never cause the parser to fail (although it may delay the detection of a syntax error) to give the value r to reduce for all possible lookaheads:

    A = {[r]} ∧ r ∈ Π ∧ D ∈ V_T ∪ V_N  ⟹  reduce(A, D) = r        (5.23)

It is only when {[r]} ⊊ A for some r ∈ Π that lookahead must be added. There are two algorithms of interest: slr and lalr, the latter being the more powerful of the two. As it happens, the slr algorithm can be used to compute the lalr tables, so it makes sense to present slr first.
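A minimal Python sketch of the resulting step function, assuming SHIFT and REDUCE are maps keyed by (state, symbol) pairs and RULES maps a rule name to its left side and right-side length (all of these names are hypothetical):

    def step(sigma, delta, SHIFT, REDUCE, RULES):
        """One parser step: shift if defined, else reduce if the lookahead
        permits, else reject with a diagnostic."""
        s, D = sigma[-1], delta[0]
        if (s, D) in SHIFT:
            sigma.append(SHIFT[(s, D)])       # shift: push the new state
            del delta[0]
            return "shift"
        if (s, D) in REDUCE:
            r = REDUCE[(s, D)]                # reduce: pop rhs, push back lhs
            lhs, rhs_len = RULES[r]
            del sigma[-rhs_len:]
            delta.insert(0, lhs)
            return r
        raise SyntaxError("unexpected %r in state %r" % (D, s))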

SLR(1) Lookahead

Whenever a reduction is applied, a phrase is reduced to a nonterminal. Whatever comes next in the input must follow that nonterminal in the cfg. In the little cfg below (from Figure 5.3), E occurs once on the right-hand side of a rule. There is a terminal symbol ⊣ to its immediate right. Therefore, whenever a reduction to E is made, the only acceptable following symbol is ⊣.

    G → E ⊣
    E → T
    T → T x
    T → x

There is one other nonterminal in the little cfg. It occurs at the right end of a rule defining E, which says that whatever follows E may also follow T. T is also followed by x in the cfg. We deduce that whenever a T is made, the following symbol may be either x or ⊣.

The slr(1) lookahead for the application of any rule A → β is the set of symbols that can follow A. In the little cfg, the problem arises in state p2, where shift(p2, x) = p5 and reduce(p2, ⊣) = E → T. Without the lookahead it would not be clear what to do in state p2.

The relation FB, meaning followed-by, is the information needed to compute the slr(1) lookahead. Both are defined below:

    A FB D  ⟺  ∃α, δ: G ⇒* αADδ                                    (5.24)
    r = (A → β) ∈ Π ∧ [r] ∈ s ∧ A FB D  ⟹  reduce(s, D) = r        (5.25)

The computation of the FB relation is complicated by erasure. (9) If some symbol in a rule might disappear, then things to the right of what was erased must also be recorded as following, and so on. All followed-by symbols ultimately derive from symbols next to each other in some grammar rule. Supposing that the cfg is not pathological, (10) then

    A FB D  iff  ∃γ, X, µ, B, C, ν:
        γ ⇒* λ  ∧  X → µBγCν ∈ Π  ∧  B ⇒* ηA  ∧  C ⇒* Dζ

Symbols B and C are next to each other. Any nonterminal ending B must be created when any symbol that can be a head of C appears. All of the possibilities

(9) See the discussion of erasure.

(10) See the discussion of pathological grammars.

can be collected into three situations: erasure occurs on the left of a rule, in the middle of a rule, or on the right of a rule. Suppose that string γ ⇒* λ (i.e., γ can be erased). Then we have three relations that can be read directly out of the grammar:

    C <· D  ⟺  C → γDδ ∈ Π        (5.26)
    B =· C  ⟺  X → αBγCδ ∈ Π      (5.27)
    A ·> B  ⟺  B → αAγ ∈ Π        (5.28)

They may be described by the phrases:

    C <· D    D starts some rule defining C
    B =· C    B precedes C in some rule
    A ·> B    A ends some rule defining B

The pointy end of the relation is toward the visible symbol within which the other symbols in the relation hide. The reflexive transitive closure of the relations (i.e., <·* and ·>*) exposes the hidden components. One can show

    FB = ·>* ∘ =· ∘ <·*        (5.29)

One can understand the compound relation above by inserting the symbols B and C: A ·>* =· <·* D means there are symbols B and C, where A is a tail of B, D is a head of C, and B is next to C in some rule. That is:

    ∃B, C:  A ·>* B  ∧  B =· C  ∧  C <·* D

The computation of the followed-by relation is thereby reduced to picking information out of the cfg and doing some standard computations on relations.

If reduce is not single-valued, the language is not slr(1) (a reduce-reduce conflict). If both shift and reduce are defined for some parameters s and D, the language is not slr(1) (a shift-reduce conflict). If neither shift nor reduce is defined for some parameters s and D, there is no defined action; if such a situation arises during translation, a syntax error has been discovered and a diagnostic may be issued. (11) Otherwise the function that is defined is obeyed. Since in practice shift and reduce are never both defined at any one fixed location [s, D], an implementation may use a single matrix to store them both.

(11) It is my personal opinion that the translator should gracefully quit at this point and wait for the error to be fixed. Conventional wisdom includes attempting to repair the damage well enough so that translation can continue. The difference of opinion probably is related to the speed of the compiler for which the issue is raised. Error recovery is an interesting topic in itself.
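The standard computations are a relational composition and a reflexive transitive closure. The following is a minimal Python sketch of Equation 5.29 that ignores erasure (none of the example grammars erase anything), with relations represented as sets of pairs and "-|" again standing in for ⊣:

    def compose(r, s):
        """Relational composition: (a, d) when a r b and b s d."""
        return {(a, d) for (a, b) in r for (c, d) in s if b == c}

    def rt_closure(rel, universe):
        """Reflexive transitive closure of rel over the given symbols."""
        result = {(x, x) for x in universe} | set(rel)
        while True:
            bigger = result | compose(result, result)
            if bigger == result:
                return result
            result = bigger

    def followed_by(rules, universe):
        """FB = (.>)* composed with (=.) composed with (<.)*,
        per equation (5.29), without erasure."""
        lt = {(lhs, rhs[0]) for lhs, rhs in rules if rhs}           # C <. D
        eq = {(rhs[i], rhs[i + 1]) for _, rhs in rules
              for i in range(len(rhs) - 1)}                         # B =. C
        gt = {(rhs[-1], lhs) for lhs, rhs in rules if rhs}          # A .> B
        return compose(compose(rt_closure(gt, universe), eq),
                       rt_closure(lt, universe))

    # The little grammar of Figure 5.3:
    rules = [("G", ("E", "-|")), ("E", ("T",)),
             ("T", ("T", "x")), ("T", ("x",))]
    fb = followed_by(rules, {"G", "E", "T", "x", "-|"})
    print(sorted(p for p in fb if p[0] in ("E", "T")))
    # [('E', '-|'), ('T', '-|'), ('T', 'x')], as deduced above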

Exercises

45. [1,1] Compute the relations <·, =·, ·>, <·*, ·>*, FB for each of the cfgs starting with Exercise 28.

46. [1,1] Compute the functions shift for each of the cfgs starting with Exercise 28.

47. [1,1] Compute the slr(1) functions reduce for each of the cfgs starting with Exercise 28.

48. [1,1] Considering the results of the previous two exercises, which of the cfgs are not slr(1), and why not?

49. [1,1] What difficulties might arise from the storage-efficiency hack of combining the implementing arrays for shift and reduce?

The Lookahead Grammar

The difference between slr and lalr is that slr keys off the nonterminal of the reduction, where lalr keys off the state in which the reduction is made. The result is to give lalr more information, and therefore a finer separation of lookahead sets. The difference is enough in practice to justify the additional effort of extending slr(1) to lalr(1). The cost is entirely in building the tables; it does not affect the size of the tables or the efficiency of the parser.

Consider the sequence of lr(0) transitions through which the lr(0) dfa goes during the parse of some text. When a rule is applied, the rhs of the rule is popped off the parse state stack, the nonterminal lhs of the rule is prefixed to the unprocessed input, and the next transition is over that nonterminal. Eventually the goal symbol is all that is left on the stack. The set of all transition sequences over all valid input texts is also a language (recall Exercise 12). Suppose G is the original cfg and A' describes the lr(0) dfa. Then there is a cfg G'', called the Lookahead Grammar, for the transition language. G'' will be derived below; it is useful because the slr(1) lookahead for G'' gives the lalr(1) lookahead for the original cfg G.

Select some particular transition in Π' over a nonterminal B:

    A → B C ∈ Π',  B ∈ V_N

By the construction of the lr(0) dfa, one can deduce, for each rule B → β ∈ Π, the existence of a path in A' from state A across the symbols of β. For example, there are transitions from state p0 over E and T in Figure 5.3. There are four paths p0 ⟶ β:

    p0 → E p1 → ⊣ p4
    p0 → T p2
    p0 → T p2 → x p5
    p0 → x p3

The vocabulary of the Lookahead Grammar is the set of transitions from the lr(0) dfa. The brackets will be used, as before, to distinguish the symbol [A → B C] ∈ V'' from the transition A → B C ∈ Π'. The notation will be extended so that the symbol [A ⟶ β] signifies the (perhaps empty) sequence of transitions from state A over the sequence of symbols β. The lr(0) terminating rules A → λ are not included on the end of the sequences [A ⟶ β]. All of this is summarized by the definitions in Table 5.5:

    V''_T = {[A → b C] | A → b C ∈ Π' ∪ {G''} ∧ b ∈ V_T}
    V''_N = {[A → B C] | A → B C ∈ Π' ∪ {G''} ∧ B ∈ V_N}
    G''   = [G' → G]
    Π''   = {[A → B C] → [A ⟶ β] | A → B C ∈ Π' ∪ {G''} ∧ B → β ∈ Π}
    G''   = ⟨V''_T, V''_N, G'', Π''⟩

    Table 5.5: Construction of the Lookahead Grammar

The rules in the Lookahead Grammar have the same form as those in the original grammar G. The difference is that there may be more than one rule in Π'' corresponding to a single rule in Π. The application of a rule in G'' replaces the transitions that crossed the right-hand side of the rule with a transition that crosses the left-hand side of the rule, which is precisely the action associated with the use of the lr(0) dfa. Thus the Lookahead Grammar follows the entire parse, rather than just a single parse step.

A Failure of SLR(1)

The cfg in Figure 5.4 is neither lr(0) nor slr(1). The Lookahead Grammar will be needed to construct the lalr(1) lookahead. To see the necessity, the slr(1) lookahead will first be constructed and shown to fail. The relation FB for G must be constructed to get the lookahead, and therefore the values of function reduce.

    [Figure 5.4: A lr(0) dfa showing slr(1) failure, for the grammar
    G → E ⊣, E → a T a, E → a x b, E → b T b, T → x. Transitions:
    q0 → E q1, q1 → ⊣ q4, q0 → a q2, q2 → T q5, q5 → a q9,
    q2 → x q6, q6 → b q10, q0 → b q3, q3 → T q7, q7 → b q11,
    q3 → x q8.]

The relation FB for slr(1) lookahead is constructed directly from the cfg. The components of FB are given below:

    <·:  G <· E,  E <· a,  E <· b,  T <· x
    =·:  E =· ⊣,  a =· T,  T =· a,  a =· x,  x =· b,  b =· T,  T =· b
    ·>:  ⊣ ·> G,  a ·> E,  b ·> E,  x ·> T

Combining the three relations, through the closures ·>* and <·*, we get the slr(1) lookahead for G:

    T FB a    a FB T    E FB ⊣
    T FB b    a FB x    a FB ⊣
    x FB a    b FB T    b FB ⊣
    x FB b    b FB x

The corresponding values of shift and reduce, defined in Equations 5.22 and 5.25, are given below:

    shift(q0, E) = q1     shift(q3, T) = q7
    shift(q0, a) = q2     shift(q3, x) = q8
    shift(q0, b) = q3     shift(q5, a) = q9
    shift(q1, ⊣) = q4     shift(q6, b) = q10
    shift(q2, T) = q5     shift(q7, b) = q11
    shift(q2, x) = q6

    reduce(q6, a) = T → x      reduce(q9, ⊣)  = E → a T a
    reduce(q6, b) = T → x      reduce(q10, ⊣) = E → a x b
    reduce(q8, a) = T → x      reduce(q11, ⊣) = E → b T b
    reduce(q8, b) = T → x

The failure to be slr(1) shows up in the definitions of both reduce and shift for state q6 and input symbol b. The slr(1) lookahead is inadequate. The cfg is not slr(1), and it is now time to try constructing the Lookahead Grammar and the lalr(1) tables.

The Lookahead Grammar (example)

There are 11 transitions in the lr(0) dfa A'. That is, size(Π') = 11, where

    V'_T = {E, T, ⊣, a, b, x}
    V'_N = {q0, q1, q2, ..., q11}
    G'   = q0
    Π'   = {t0, t1, t2, ..., t10}
    A'   = ⟨V'_T, V'_N, G', Π'⟩

    t0 = q0 → E q1    t4 = q2 → T q5    t8 = q5 → a q9
    t1 = q0 → a q2    t5 = q2 → x q6    t9 = q6 → b q10
    t2 = q0 → b q3    t6 = q3 → T q7    t10 = q7 → b q11
    t3 = q1 → ⊣ q4    t7 = q3 → x q8

The Lookahead Grammar G'' is defined in terms of the transitions of the lr(0) dfa. The vocabulary V'' is taken from Π', augmented by one extra symbol G'':

    V''_T = {t1, t2, t3, t5, t7, t8, t9, t10}
    V''_N = {G'', t0, t4, t6}
    G''   = [q0 → G]

    Π''  =  G''  →  t0 t3
            t0   →  t1 t4 t8
            t0   →  t1 t5 t9
            t0   →  t2 t6 t10
            t4   →  t5
            t6   →  t7
    G''  =  ⟨V''_T, V''_N, G'', Π''⟩

The Lookahead Grammar cfg rules are easier to read if the transitions tn are spelled out:

    [q0 → G]    →  [q0 → E q1] [q1 → ⊣ q4]
    [q0 → E q1] →  [q0 → a q2] [q2 → T q5] [q5 → a q9]
    [q0 → E q1] →  [q0 → a q2] [q2 → x q6] [q6 → b q10]
    [q0 → E q1] →  [q0 → b q3] [q3 → T q7] [q7 → b q11]
    [q2 → T q5] →  [q2 → x q6]
    [q3 → T q7] →  [q3 → x q8]

The effective difference between G and G'' is two rules for T → x. This causes different lookahead to be computed for states q6 and q8 in Figure 5.4. Since it was q6 that caused the slr(1) inadequacy, the lalr(1) construction looks promising. The computation of relation FB is repeated for G''. The base relations read off G'' are:

    <·:  G'' <· t0,  t0 <· t1,  t0 <· t2,  t4 <· t5,  t6 <· t7
    =·:  t0 =· t3,  t1 =· t4,  t4 =· t8,  t1 =· t5,  t5 =· t9,
         t2 =· t6,  t6 =· t10
    ·>:  t3 ·> G'',  t8 ·> t0,  t9 ·> t0,  t10 ·> t0,  t5 ·> t4,
         t7 ·> t6

Combining the three relations we get the slr(1) lookahead for G'':

    t0 FB t3     t1 FB t4     t2 FB t6     t4 FB t8     t6 FB t10
    t8 FB t3     t1 FB t5     t2 FB t7     t5 FB t8     t7 FB t10
    t9 FB t3                               t5 FB t9
    t10 FB t3
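As a cross-check, the followed_by sketch from the slr(1) discussion reproduces this table when applied to G'' (the goal symbol is spelled G2 below, since the tn and the goal are just symbols to the code):

    rules2 = [("G2", ("t0", "t3")),
              ("t0", ("t1", "t4", "t8")),
              ("t0", ("t1", "t5", "t9")),
              ("t0", ("t2", "t6", "t10")),
              ("t4", ("t5",)),
              ("t6", ("t7",))]
    universe = {"G2"} | {"t%d" % i for i in range(11)}
    fb = followed_by(rules2, universe)
    print(len(fb))                                   # 13 pairs, as listed
    print(("t4", "t8") in fb, ("t5", "t8") in fb)    # True True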

LALR(1) Lookahead

Now comes the application of the Lookahead Grammar slr(1) information to the construction of the lalr(1) lookahead for the original cfg. The shift function values are unchanged. There are fewer reduce values, because of the finer separation of lookahead in lalr(1).

For every rule [A → B A'] → [A ⟶ β] ∈ Π'' there is a rule B → β ∈ Π and a destination state qn, the target of the final transition in the sequence [A ⟶ β]. Suppose [A → B A'] FB [C → D C']. Then the lalr(1) lookahead for state qn in the lr(0) dfa for G is given by:

    reduce(qn, D) = B → β        (5.30)

Applying this formula to the running example, only symbols tn ∈ V''_N have slr(1) lookahead and can give rise to lalr(1) lookahead for G:

    t0 FB t3   gives  reduce(q9, ⊣)  = E → a T a
                      reduce(q10, ⊣) = E → a x b
                      reduce(q11, ⊣) = E → b T b
    t4 FB t8   gives  reduce(q6, a)  = T → x
    t6 FB t10  gives  reduce(q8, b)  = T → x

The resulting values of reduce no longer conflict with shift. (12)

Exercises

50. [1,1] Verify the computation of the lalr(1) lookahead for the example.

51. [1,1] Compute the relations <·, =·, ·>, <·*, ·>*, FB for the lookahead grammars associated with each of the cfgs starting with Exercise 28.

52. [1,1] Compute the lalr(1) functions reduce for each of the cfgs starting with Exercise 28.

53. [1,1] Considering the results of the previous two exercises, which of the cfgs are not lalr(1), and why not?

When LALR(1) Fails

It will often be the case that lalr(1) is not enough to build tables for your favorite grammar. While it is possible to increase the lookahead, at the expense of complicating the tables and the algorithms that have to deal with them, it is almost always better to change the cfg. There is an art to writing grammars

(12) Compare the shift values computed for Figure 5.4 above.

that are both pleasing and also lalr(1). The lalr(1) tables for any practical cfg are far beyond reasonable hand computation, so the failures are reported through a table-building program. Therefore, in addition to knowing what to do to change the cfg, one also needs to know how to interpret diagnostic messages from a relatively opaque algorithm. (13)

Once the simple syntactic constraints of the input cfg are met, troubles are always deep: they come from within relatively massive computations on intermediate grammars the user never wants to see. The symptom, a shift-reduce or reduce-reduce conflict, often has no obvious cause. What can be reported (directly from the symptom) is the symbol that shows up in the conflict; it is the second argument to the shift and reduce functions. The original cfg rule for which the reduce decision cannot be tabulated can also be reported. This leads to diagnostics of the form:

    lalr(1) shift-reduce conflict for symbol ) on rule
        Term = Term * Factor

or

    lalr(1) reduce-reduce conflict for symbol ) on rules
        Term = Term * Factor
        SimpleDecl = * SimpleDecl

There may be hundreds of such messages and only one error in the grammar. The challenge of good diagnostics is to suppress messages that do not contribute to locating the error, and to give traceability information. For instance, the set of symbols for which the conflict symbol is FB eases the problem of finding how the conflict arose:

    The following symbol(s) are followed by the conflict symbol ):
        Term  Factor  Declaration  SimpleDecl

The problem of giving reasonable diagnostics is also interesting when a correct lalr(1) parser detects an input error. There is a particular bit of compiler-writer arrogance to avoid: a syntax error may reflect simple carelessness on the part of the programmer; it may also reflect a reasonable generalization of the language, for which the language designer should be criticized. In any case the diagnostic is issued by a mere computer; any accusatory tone in the diagnostic is out of place. A cloying preamble ("O master, I do not understand your divine intent") is overshoot, but in the right direction. Once the compiler-writer's heart is in the right place, the compiler can get down to issuing just the facts: where the syntax error was detected, what was found, and what would have been acceptable. That is usually enough for a syntax error, which is all a parser can detect.

(13) The situation is even worse for recursive descent techniques: there one may end up with a parser that apparently works but in fact contains bugs.


Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6 Compiler Design 1 Bottom-UP Parsing Compiler Design 2 The Process The parse tree is built starting from the leaf nodes labeled by the terminals (tokens). The parser tries to discover appropriate reductions,

More information

SLR parsers. LR(0) items

SLR parsers. LR(0) items SLR parsers LR(0) items As we have seen, in order to make shift-reduce parsing practical, we need a reasonable way to identify viable prefixes (and so, possible handles). Up to now, it has not been clear

More information

Context-free grammars

Context-free grammars Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,

More information

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1 CSE P 501 Compilers LR Parsing Hal Perkins Spring 2018 UW CSE P 501 Spring 2018 D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts UW CSE P 501 Spring 2018

More information

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

More information

Chapter 3: Lexing and Parsing

Chapter 3: Lexing and Parsing Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding

More information

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology MIT 6.035 Parse Table Construction Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Parse Tables (Review) ACTION Goto State ( ) $ X s0 shift to s2 error error goto s1

More information

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form Bottom-up parsing Bottom-up parsing Recall Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form If α V t,thenα is called a sentence in L(G) Otherwise it is just

More information

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution Exam Facts Format Wednesday, July 20 th from 11:00 a.m. 1:00 p.m. in Gates B01 The exam is designed to take roughly 90

More information

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F). CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

More information

Compiler Construction: Parsing

Compiler Construction: Parsing Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax

More information

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis. Topics Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis Syntax Analysis Recursive -Descent Parsing Bottom-Up parsing 2 Language Implementation Compilation There are three possible approaches

More information

shift-reduce parsing

shift-reduce parsing Parsing #2 Bottom-up Parsing Rightmost derivations; use of rules from right to left Uses a stack to push symbols the concatenation of the stack symbols with the rest of the input forms a valid bottom-up

More information

LR Parsing LALR Parser Generators

LR Parsing LALR Parser Generators LR Parsing LALR Parser Generators Outline Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the

More information

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence. Bottom-up parsing Recall For a grammar G, with start symbol S, any string α such that S α is a sentential form If α V t, then α is a sentence in L(G) A left-sentential form is a sentential form that occurs

More information

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table. MODULE 18 LALR parsing After understanding the most powerful CALR parser, in this module we will learn to construct the LALR parser. The CALR parser has a large set of items and hence the LALR parser is

More information

3. Parsing. Oscar Nierstrasz

3. Parsing. Oscar Nierstrasz 3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

More information

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information

Lecture Bottom-Up Parsing

Lecture Bottom-Up Parsing Lecture 14+15 Bottom-Up Parsing CS 241: Foundations of Sequential Programs Winter 2018 Troy Vasiga et al University of Waterloo 1 Example CFG 1. S S 2. S AyB 3. A ab 4. A cd 5. B z 6. B wz 2 Stacks in

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:

More information

LR Parsing LALR Parser Generators

LR Parsing LALR Parser Generators Outline LR Parsing LALR Parser Generators Review of bottom-up parsing Computing the parsing DFA Using parser generators 2 Bottom-up Parsing (Review) A bottom-up parser rewrites the input string to the

More information

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7 Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

More information

CS 164 Programming Languages and Compilers Handout 9. Midterm I Solution

CS 164 Programming Languages and Compilers Handout 9. Midterm I Solution Midterm I Solution Please read all instructions (including these) carefully. There are 5 questions on the exam, some with multiple parts. You have 1 hour and 20 minutes to work on the exam. The exam is

More information

UNIT III & IV. Bottom up parsing

UNIT III & IV. Bottom up parsing UNIT III & IV Bottom up parsing 5.0 Introduction Given a grammar and a sentence belonging to that grammar, if we have to show that the given sentence belongs to the given grammar, there are two methods.

More information

Semantics via Syntax. f (4) = if define f (x) =2 x + 55.

Semantics via Syntax. f (4) = if define f (x) =2 x + 55. 1 Semantics via Syntax The specification of a programming language starts with its syntax. As every programmer knows, the syntax of a language comes in the shape of a variant of a BNF (Backus-Naur Form)

More information

LR Parsing Techniques

LR Parsing Techniques LR Parsing Techniques Introduction Bottom-Up Parsing LR Parsing as Handle Pruning Shift-Reduce Parser LR(k) Parsing Model Parsing Table Construction: SLR, LR, LALR 1 Bottom-UP Parsing A bottom-up parser

More information

Lecture Notes on Bottom-Up LR Parsing

Lecture Notes on Bottom-Up LR Parsing Lecture Notes on Bottom-Up LR Parsing 15-411: Compiler Design Frank Pfenning Lecture 9 1 Introduction In this lecture we discuss a second parsing algorithm that traverses the input string from left to

More information

14.1 Encoding for different models of computation

14.1 Encoding for different models of computation Lecture 14 Decidable languages In the previous lecture we discussed some examples of encoding schemes, through which various objects can be represented by strings over a given alphabet. We will begin this

More information

Syntactic Analysis. Top-Down Parsing

Syntactic Analysis. Top-Down Parsing Syntactic Analysis Top-Down Parsing Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

Chapter 4. Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.

More information

Downloaded from Page 1. LR Parsing

Downloaded from  Page 1. LR Parsing Downloaded from http://himadri.cmsdu.org Page 1 LR Parsing We first understand Context Free Grammars. Consider the input string: x+2*y When scanned by a scanner, it produces the following stream of tokens:

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

In One Slide. Outline. LR Parsing. Table Construction

In One Slide. Outline. LR Parsing. Table Construction LR Parsing Table Construction #1 In One Slide An LR(1) parsing table can be constructed automatically from a CFG. An LR(1) item is a pair made up of a production and a lookahead token; it represents a

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

CS606- compiler instruction Solved MCQS From Midterm Papers

CS606- compiler instruction Solved MCQS From Midterm Papers CS606- compiler instruction Solved MCQS From Midterm Papers March 06,2014 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 Final Term MCQ s and Quizzes CS606- compiler instruction If X is a

More information

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) TD parsing - LL(1) Parsing First and Follow sets Parse table construction BU Parsing Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1) Problems with SLR Aho, Sethi, Ullman, Compilers

More information

Assignment 4 CSE 517: Natural Language Processing

Assignment 4 CSE 517: Natural Language Processing Assignment 4 CSE 517: Natural Language Processing University of Washington Winter 2016 Due: March 2, 2016, 1:30 pm 1 HMMs and PCFGs Here s the definition of a PCFG given in class on 2/17: A finite set

More information

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

More information

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1 CS453 : JavaCUP and error recovery CS453 Shift-reduce Parsing 1 Shift-reduce parsing in an LR parser LR(k) parser Left-to-right parse Right-most derivation K-token look ahead LR parsing algorithm using

More information

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou Administrative! [ALSU03] Chapter 3 - Lexical Analysis Sections 3.1-3.4, 3.6-3.7! Reading for next time [ALSU03] Chapter 3 Copyright (c) 2010 Ioanna

More information

Compiler Construction 2016/2017 Syntax Analysis

Compiler Construction 2016/2017 Syntax Analysis Compiler Construction 2016/2017 Syntax Analysis Peter Thiemann November 2, 2016 Outline 1 Syntax Analysis Recursive top-down parsing Nonrecursive top-down parsing Bottom-up parsing Syntax Analysis tokens

More information

Question Bank. 10CS63:Compiler Design

Question Bank. 10CS63:Compiler Design Question Bank 10CS63:Compiler Design 1.Determine whether the following regular expressions define the same language? (ab)* and a*b* 2.List the properties of an operator grammar 3. Is macro processing a

More information

Optimizing Finite Automata

Optimizing Finite Automata Optimizing Finite Automata We can improve the DFA created by MakeDeterministic. Sometimes a DFA will have more states than necessary. For every DFA there is a unique smallest equivalent DFA (fewest states

More information

Parsing II Top-down parsing. Comp 412

Parsing II Top-down parsing. Comp 412 COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled

More information

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and Computer Language Theory Chapter 4: Decidability 1 Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

More information

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward Lexical Analysis COMP 524, Spring 2014 Bryan Ward Based in part on slides and notes by J. Erickson, S. Krishnan, B. Brandenburg, S. Olivier, A. Block and others The Big Picture Character Stream Scanner

More information

Zhizheng Zhang. Southeast University

Zhizheng Zhang. Southeast University Zhizheng Zhang Southeast University 2016/10/5 Lexical Analysis 1 1. The Role of Lexical Analyzer 2016/10/5 Lexical Analysis 2 2016/10/5 Lexical Analysis 3 Example. position = initial + rate * 60 2016/10/5

More information

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Lexical Analysis Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata Phase Ordering of Front-Ends Lexical analysis (lexer) Break input string

More information

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing Also known as Shift-Reduce parsing More powerful than top down Don t need left factored grammars Can handle left recursion Attempt to construct parse tree from an input string eginning at leaves and working

More information

Introduction to Lexical Analysis

Introduction to Lexical Analysis Introduction to Lexical Analysis Outline Informal sketch of lexical analysis Identifies tokens in input string Issues in lexical analysis Lookahead Ambiguities Specifying lexical analyzers (lexers) Regular

More information

CS 406/534 Compiler Construction Parsing Part I

CS 406/534 Compiler Construction Parsing Part I CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.

More information

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1 CSE 401 Compilers LR Parsing Hal Perkins Autumn 2011 10/10/2011 2002-11 Hal Perkins & UW CSE D-1 Agenda LR Parsing Table-driven Parsers Parser States Shift-Reduce and Reduce-Reduce conflicts 10/10/2011

More information

Syntax Analysis Part I

Syntax Analysis Part I Syntax Analysis Part I Chapter 4: Context-Free Grammars Slides adapted from : Robert van Engelen, Florida State University Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token,

More information

3.5 Practical Issues PRACTICAL ISSUES Error Recovery

3.5 Practical Issues PRACTICAL ISSUES Error Recovery 3.5 Practical Issues 141 3.5 PRACTICAL ISSUES Even with automatic parser generators, the compiler writer must manage several issues to produce a robust, efficient parser for a real programming language.

More information

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on

More information

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,

More information

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform

More information

UNIT-III BOTTOM-UP PARSING

UNIT-III BOTTOM-UP PARSING UNIT-III BOTTOM-UP PARSING Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce

More information

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant Syntax Analysis: Context-free Grammars, Pushdown Automata and Part - 4 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler

More information

CSCI312 Principles of Programming Languages

CSCI312 Principles of Programming Languages Copyright 2006 The McGraw-Hill Companies, Inc. CSCI312 Principles of Programming Languages! LL Parsing!! Xu Liu Derived from Keith Cooper s COMP 412 at Rice University Recap Copyright 2006 The McGraw-Hill

More information

CS 4120 Introduction to Compilers

CS 4120 Introduction to Compilers CS 4120 Introduction to Compilers Andrew Myers Cornell University Lecture 6: Bottom-Up Parsing 9/9/09 Bottom-up parsing A more powerful parsing technology LR grammars -- more expressive than LL can handle

More information

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5 Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

More information

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever

Automata & languages. A primer on the Theory of Computation. The imitation game (2014) Benedict Cumberbatch Alan Turing ( ) Laurent Vanbever Automata & languages A primer on the Theory of Computation The imitation game (24) Benedict Cumberbatch Alan Turing (92-954) Laurent Vanbever www.vanbever.eu ETH Zürich (D-ITET) September, 2 27 Brief CV

More information

Lecture Notes on Bottom-Up LR Parsing

Lecture Notes on Bottom-Up LR Parsing Lecture Notes on Bottom-Up LR Parsing 15-411: Compiler Design Frank Pfenning Lecture 9 September 23, 2009 1 Introduction In this lecture we discuss a second parsing algorithm that traverses the input string

More information

LECTURE NOTES ON COMPILER DESIGN P a g e 2

LECTURE NOTES ON COMPILER DESIGN P a g e 2 LECTURE NOTES ON COMPILER DESIGN P a g e 1 (PCCS4305) COMPILER DESIGN KISHORE KUMAR SAHU SR. LECTURER, DEPARTMENT OF INFORMATION TECHNOLOGY ROLAND INSTITUTE OF TECHNOLOGY, BERHAMPUR LECTURE NOTES ON COMPILER

More information

General Overview of Compiler

General Overview of Compiler General Overview of Compiler Compiler: - It is a complex program by which we convert any high level programming language (source code) into machine readable code. Interpreter: - It performs the same task

More information

1 Parsing (25 pts, 5 each)

1 Parsing (25 pts, 5 each) CSC173 FLAT 2014 ANSWERS AND FFQ 30 September 2014 Please write your name on the bluebook. You may use two sides of handwritten notes. Perfect score is 75 points out of 85 possible. Stay cool and please

More information

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Syntax Analysis, VI Examples from LR Parsing. Comp 412 Midterm Exam: Thursday October 18, 7PM Herzstein Amphitheater Syntax Analysis, VI Examples from LR Parsing Comp 412 COMP 412 FALL 2018 source code IR IR target Front End Optimizer Back End code Copyright

More information

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Parsing III. (Top-down parsing: recursive descent & LL(1) ) Parsing III (Top-down parsing: recursive descent & LL(1) ) Roadmap (Where are we?) Previously We set out to study parsing Specifying syntax Context-free grammars Ambiguity Top-down parsers Algorithm &

More information

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Parsing Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students

More information

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root Parser Generation Main Problem: given a grammar G, how to build a top-down parser or a bottom-up parser for it? parser : a program that, given a sentence, reconstructs a derivation for that sentence ----

More information

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019 Comp 411 Principles of Programming Languages Lecture 3 Parsing Corky Cartwright January 11, 2019 Top Down Parsing What is a context-free grammar (CFG)? A recursive definition of a set of strings; it is

More information

LR Parsing E T + E T 1 T

LR Parsing E T + E T 1 T LR Parsing 1 Introduction Before reading this quick JFLAP tutorial on parsing please make sure to look at a reference on LL parsing to get an understanding of how the First and Follow sets are defined.

More information

Programming Languages Third Edition

Programming Languages Third Edition Programming Languages Third Edition Chapter 12 Formal Semantics Objectives Become familiar with a sample small language for the purpose of semantic specification Understand operational semantics Understand

More information