Compiler Construction Class Notes


1 Compiler Construction Class Notes Reg Dodds, Department of Computer Science, University of the Western Cape © 2006, 2017 Reg Dodds March 22, 2017

2 Introduction What is a Compiler? What is an Interpreter? Why Compiler Construction? What languages? An example of a very simple compilation. Why write a compiler? Layout of a compiler. 1

3 What is interpretation? Let L ∈ 𝕃 be a programming language, with 𝕃 = {Fortran, Lisp, Algol, COBOL, PL/1, BASIC, APL, SNOBOL, Pascal, C, C++, Ada, SQL, Java, ML, Haskell, ...}. I_L is an interpreter for a program p_L ∈ L, and input ∈ A* is data, where A is usually called a character set and A* is its Kleene closure, from which I_L computes output data output ∈ A*. The execution of the interpreter may abort and lead to an error condition: I_L : L × A* → A* ∪ {error}, which we may also write as: I_L(p_L, input) = output ∈ A* ∪ {error}. A single process takes place: the source program is directly interpreted.

4 Making interpreters efficient In a production quality interpreter it is advantageous to produce some sort of compact interpretable code by a process that is similar to compilation, once, and then subsequently reinterpret this compact code repeatedly. This process is used by Java and many interpreters for BASIC such as GWBasic. Typically a command-line interface interprets the command directly. An even better idea is to compile blocks of code incrementally, directly to executable machine code. When a block is altered its corresponding code is replaced with new code. Interpreters often have direct access to the original source code, which is very useful for finding errors in the source program. Stepping mechanisms that move line-by-line through the source are easily implemented with interpreters.

5 One view of a compiler When compiling is involved, two processes are applied to execute a source program. A compiler C_L for a language L translates a syntactically correct source program p_L ∈ L into equivalent machine code: source program → compiler → machine language. Examples: A source program in C++ is translated into MIPS machine code. Visual Basic source code is compiled into Intel x86 machine code. A Java source program is translated into JVM byte code.

6 Execution of machine language The machine code produced by the compiler is somehow executed by hardware. Hardware may be emulated by microcode, or it may be hardwired. Some instructions may be entirely executable by hardware. Certain instructions may be emulated by microcode. The user is usually not aware that some of the machine code instructions, or even all of them, are being emulated. On some machines the machine instruction set may change dynamically, depending on the application. It is likely that compiled machine code, on any particular machine, runs faster than code running on an interpreter on the same machine. 5

7 What is compilation? The source program p_L ∈ L is first translated by a compiler C_L into an equivalent machine-executable program p_M. Next p_M is interpreted, or executed, by a machine plus its input to create output and/or an error. To run a program: (1) it is compiled and (2) then it is executed. C_L(p_L) = p_M ∪ {error}; if there are no compilation errors then the second step may be invoked: I_M(p_M, input) = output ∪ {error}. Notice that the interpreter I_L has now become I_M, which is perhaps hardware. Interpreters and computers are different realizations of computing machines. Sun's picoJava chip or the Java Virtual Machine on your computer can be used interchangeably to run the same byte code program p_M.

8 Java source program

public class simple {
    public static void main (String argsv[]) {
        int a;
        a = 41;
        a = a + 19;
    }
}

9 Java byte code

Compiled from simple.java
public class simple extends java.lang.Object {
    public static void main(java.lang.String[]);
    public simple();
}

Method void main(java.lang.String[])
   0 bipush 41
   2 istore_1
   3 iload_1
   4 bipush 19
   6 iadd
   7 istore_1
   8 return

Method simple()
   0 aload_0
   1 invokespecial #12 <Method java.lang.Object.<init>()>
   4 return

Note there is a main method and a constructor method.

10 Overview of course Programs related to compilers. The compilation process: phases, intermediate code, structures. Bootstrapping and transfer, T-diagrams, Louden s TINY and TM. SEPL: interpreter, emulator, compiler. 9

11 Programs related to compilers (Louden p 4-6) interpreters assemblers linkers loaders preprocessors editors debuggers profilers project managers SCCS and RCS 10

12 The compilation process (Louden p 7, 8-14) Phases and intermediate representations:

source code → scanner (lexical analyser) → tokens → syntax analyser → abstract syntax tree → semantic analyser → annotated syntax tree → intermediate code optimizer → intermediate code → code generator → target code → target code optimizer → optimized code → linker-loader → executable code

Structures used throughout: literals, symbol table, error handler, temporary files.

13 Bootstrapping and transfer of programming languages (Louden 1.6, p 18-21) T-diagrams next slide. Pascal in 1970 on the CDC. A P-code compiler for Pascal, with the P-code emulator written in Algol 60, and in Fortran, led to widespread usage of Pascal. (Why?)

14 T-diagrams A T-diagram represents a translator from a Source language to a Target language, written in (running on) Host code:

    Source → Target
        Host

Let two compilers run on the same host machine. One compiler translates from language Start into an intermediate language IL and the other compiler translates from IL into language Final:

    Start → IL      IL → Final      gives      Start → Final
      Host             Host                        Host

We have produced a system that can compile from Start into Final.

15 T-diagrams One compiler for Pascal creates P-code, but runs on machine M. Another processor running on M can generate code for machine N:

    Pascal → P-code      M → N      gives      Pascal → P-code
        M                  M                        N

We have produced a system that can compile from Pascal into P-code on a new machine.

16 T-diagrams: compiler-old to compiler-new

    Pascal → P-code      M → N      gives      Pascal → P-code
        M                  M                        N

17 T-diagrams Define the SEPL language. Write an interpreter for it. Develop a machine emulator or use an available one. Develop a compiler that compiles to our machine's machine code. Add an optimizing phase to the compiler. Alter the compiler to produce code for another machine.

18 Students Educational Programming Language (SEPL) Various projects lie ahead. Define the SEPL language (Louden calls his TINY). Develop its syntax and informal semantics. Write an interpreter for it using flex/lex and bison/yacc. Decide on a target machine. Develop a machine emulator for the target or use a real machine. Develop a compiler that produces executable code. Introduce an optimization phase (not really enough time). How much time is required to produce the compiler?

19 Scanning Lexical analysis (Louden Chapter 2) Producing tokens from lexemes is done quite well by flex. Regular expressions (Louden p 38). Extension of the notation for regular expressions does not give the notation any more power, but simplifies its practical use. Regular expressions are widely used: flex, vim, sed, emacs, python, bash, tcl/tk, grep, awk, perl, etc. Regular expressions and FSAs (Louden p 47). DFSA-FSA relationship (Louden p 46-72). Minimization of the number of states. Louden's TINY-scanner: gives insight into the direct connection between an FSA and a scanner. (Louden 2.5) Application of flex for scanning lexical analysis.

20 Context-free languages (CFLs) and syntax analysis (Louden Chapter 3) Syntax analysers are based on CFLs. A list of tokens goes into the analyser, which produces an abstract syntax tree: syntaxtree = analyse();

21 Parse trees have dynamic structure. recursive structure. The tree keeps track of attributes such as: types, scope, liveness, nesting and values. (Figure: an annotated tree for the assignment a[i] = 6; the subscript expression has children id a of type integer[] and id i of type integer, and is assigned the number 6 of type integer.)

22 Context-free grammars (CFGs) (Louden 3.2) Formally a CFG is a four-tuple G = (N, T, P, S) where N and T are alphabets, N is the set of non-terminals or variables and T is the set of terminals, P ⊆ N × (N ∪ T)* is the set of production rules and S ∈ N is the start symbol. Example: N = {exp, op}, T = {number, +, -, *}, P = {exp → exp op exp | (exp) | number, op → + | - | *} and S = exp. Note that number is treated as a token. The source string (117 - 17) * 5 is first tokenized to (number - number) * number before it is analysed. P_1 = {E → E O E | (E) | n, O → + | - | *} is a set of productions not different from P.

23 Derivations sentential form: any string in (N ∪ T)* derived from S, the start symbol. direct derivation: one production is applied to a part of a sentential form, matching a non-terminal in that part with the left-hand side of the production and replacing it with the production's right-hand side. Example: The production exp → (exp) can be applied to bring about the direct derivation exp * number ⇒ (exp) * number. derivation: a chain of direct derivations applied one after the other transforms the sentential form s_0 into another sentential form s_n. It is written as s_0 ⇒* s_n. language: all strings s ∈ T* that can be derived from the start symbol S, symbolically: L(G) = {s ∈ T* | S ⇒* s}.

24 Derivation: exp ⇒* (number - number) * number

[exp → exp op exp]: exp ⇒ exp op exp,
[exp → number]:     ⇒ exp op number,
[op → *]:           ⇒ exp * number,
[exp → (exp)]:      ⇒ (exp) * number,
[exp → exp op exp]: ⇒ (exp op exp) * number,
[exp → number]:     ⇒ (exp op number) * number,
[op → -]:           ⇒ (exp - number) * number,
[exp → number]:     ⇒ (number - number) * number.

25 language, sentence, examples language: all strings s ∈ T* that can be derived from the start symbol S, symbolically: L(G) = {s ∈ T* | S ⇒* s}. sentence: the elements of the language L(G), s ∈ L(G), are known as sentences. Example: G = ({E}, {a, (, )}, {E → (E) | a}, E). E → a, i.e. E ⇒ a, so a ∈ L(G). Similarly E → (E) ⇒ (a), i.e. E ⇒* (a), and E ⇒ (E) ⇒ ((E)) ⇒ ((a)), i.e. E ⇒* ((a)). Theorem: E ⇒* (^n a )^n, ∀n ∈ N_0. Proof: using induction. P_0: E ⇒* (^0 a )^0 = a, since E → a. P_1: E ⇒* (a), because E ⇒ (E) ⇒ (a). P_k → P_{k+1}: assume that P_k holds, i.e. E ⇒* (^k a )^k. Now E ⇒ (E), in other words E ⇒ (E) ⇒* ((^k a )^k) = (^{k+1} a )^{k+1}, so E ⇒* (^n a )^n, ∀n ∈ N_0, i.e. L(G) = {(^n a )^n | n ∈ N_0}.

26 Examples Problem with an empty base If P = {E → (E)} then L(G) = {} = ∅. This is empty because it is impossible to form the bases P_0 or P_1. Since the base does not exist an infinite regress ensues. However, we can prove that E ⇒* (^n E )^n, but this is of little value, since E cannot be reduced to a terminal. CFL using regular expressions If P = {E → E + a | a}, then L(G) = a(+a)*, where a(+a)* = {a, a+a, a+a+a, ...}.

27 An if-statement G = ({statement, if-statement, expression}, {0, 1, if, else, other}, {statement → if-statement | other, if-statement → if (expression) statement | if (expression) statement else statement, expression → 0 | 1}, statement) and L(G) = {other, if (0) other, if (1) other, if (0) other else other, if (1) other else other, if (0) if (0) other, if (1) if (0) other, if (0) if (1) other, if (1) if (1) other, if (0) if (0) other else other, if (1) if (0) other else other, if (0) if (1) other else other, if (1) if (1) other else other, ...}

28 The use of ε Consider the grammar (we only show the productions P): {statement → if-statement | other, if-statement → if (expression) statement | if (expression) statement else statement, expression → 0 | 1} It may be written using an ε-grammar as follows: {statement → if-statement | other, if-statement → if (expression) statement else-part, else-part → else statement | ε, expression → 0 | 1} ε is also useful for lists: list → statement ; list | statement, statement → s. This generates the language L(G) = {s, s;s, s;s;s, ...}. It is rewritten using ε as follows: list → non-ε-list | ε, non-ε-list → statement ; non-ε-list | statement, statement → s.

29 Left- and right recursion The regular language a+ is represented as follows with left recursive productions: A → Aa | a. a ∈ L(G) since A → a, thus A ⇒ a; but A → Aa, so A ⇒ Aa ⇒ aa, and A may again be replaced using A → Aa, so that A ⇒* aaa. It is simple to prove with mathematical induction that L(G) = a+. Our notation is rather informal: the set represented by a+ was formerly represented more exactly by L(a+), which represents the set {a, aa, aaa, ...}. Similarly we can prove that a grammar using the right recursive productions A → aA | a generates the same language. How is a* represented? A → Aa | ε, or using A → aA | ε. What is L(G) for the grammar with the productions A → (A)A | ε?

30 Parse trees and abstract syntax trees (ASTs) It is convenient to distinguish between a parse tree and an abstract syntax tree. An abstract syntax tree is often called a syntax tree. A parse tree contains all the information concerning the syntactical structure of the derivation. Consider the parse tree and its corresponding stripped-down (abstract) syntax tree generated by the derivation on the next slide. Syntax trees usually show the actual values at the terminals and not merely the tokens.

31 Right derivation for exp ⇒* (number - number) * number The derivation below is executed in a determinate order. The rightmost non-terminal is replaced in each step until no more non-terminals remain.

(1) [exp → exp op exp]: exp ⇒ exp op exp,
(2) [exp → number]:     ⇒ exp op number,
(3) [op → *]:           ⇒ exp * number,
(4) [exp → (exp)]:      ⇒ (exp) * number,
(5) [exp → exp op exp]: ⇒ (exp op exp) * number,
(6) [exp → number]:     ⇒ (exp op number) * number,
(7) [op → -]:           ⇒ (exp - number) * number,
(8) [exp → number]:     ⇒ (number - number) * number.

32 Parse tree and syntax tree for the derivation exp ⇒* (29-11) * 47 (Figure: the parse tree for (29-11) * 47, with the nodes numbered in the order the derivation created them, and the corresponding syntax tree: a * node over a - node, with leaves 29, 11 and 47.)

33 Right derivation for exp ⇒* (number - number) * number The derivation below is executed in a determinate order. The rightmost non-terminal is replaced in each step until no more non-terminals remain.

(1) [exp → exp op exp]: exp ⇒ exp op exp,
(2) [exp → number]:     ⇒ exp op number,
(3) [op → *]:           ⇒ exp * number,
(4) [exp → (exp)]:      ⇒ (exp) * number,
(5) [exp → exp op exp]: ⇒ (exp op exp) * number,
(6) [exp → number]:     ⇒ (exp op number) * number,
(7) [op → -]:           ⇒ (exp - number) * number,
(8) [exp → number]:     ⇒ (number - number) * number.

34 Parse tree for right derivation of exp ⇒* (number - number) * number (Figure: the parse tree, with each node numbered by the step of the rightmost derivation that created it: the root exp (1) has children exp (4), op (3) and exp (2); node (4) expands to ( exp ) with exp (5) inside, which in turn expands to exp (8), op (7) and exp (6) over number - number.)

35 Leftmost derivation for exp ⇒* (number - number) * number The derivation below is executed in a determinate order. The leftmost non-terminal of the sentential form is replaced each time until there are no more non-terminals.

(1) [exp → exp op exp]: exp ⇒ exp op exp,
(2) [exp → (exp)]:      ⇒ (exp) op exp,
(3) [exp → exp op exp]: ⇒ (exp op exp) op exp,
(4) [exp → number]:     ⇒ (number op exp) op exp,
(5) [op → -]:           ⇒ (number - exp) op exp,
(6) [exp → number]:     ⇒ (number - number) op exp,
(7) [op → *]:           ⇒ (number - number) * exp,
(8) [exp → number]:     ⇒ (number - number) * number.

36 A parse tree for the leftmost derivation of exp ⇒* (number - number) * number (Figure: the parse tree; it is the same tree as for the rightmost derivation, only the order in which its nodes are created differs.)

37 Right and left derivations for number + number A left derivation: (1) exp ⇒ exp op exp, (2) ⇒ number op exp, (3) ⇒ number + exp, (4) ⇒ number + number. (Figure: the parse tree, root exp with children exp, op, exp over number, +, number.)

38 Rightmost derivation A rightmost derivation for number + number: (1) exp ⇒ exp op exp, (2) ⇒ exp op number, (3) ⇒ exp + number, (4) ⇒ number + number. (Figure: the same parse tree, with the nodes numbered in the order the rightmost derivation created them.)

39 Ambiguous grammars The grammar with P = {exp → exp op exp | (exp) | number, op → + | - | *} is ambiguous because a string such as number - number * number has two different parse trees. It will also therefore have two different leftmost and two different rightmost derivations, because each parse tree has a unique leftmost derivation. (Figure: the first parse tree, whose root operator is *, grouping the string as (number - number) * number; and now the other tree.)

40 Ambiguous grammars A different parse tree for number - number * number. (Figure: the root operator is -, with left child number and right child exp op exp over number * number, i.e. the grouping number - (number * number).) Ambiguous: if two different parse trees can be derived from a given grammar then it is ambiguous. It is preferable to use an unambiguous grammar for defining a computing language. Ambiguity can be eliminated in two ways: the grammar can be altered so that it becomes unambiguous, or, the way bison/yacc does it, precedence rules or association rules can be applied where there are ambiguities.

41 The dangling else problem (Louden p ) The string if (0) if (1) other else other has two parse trees. This is the dangling else problem. (Figure: the two parse trees; in one the else part is attached to the inner if, in the other to the outer if.)

42 The dangling else problem The C code

if (x != 0)
    if (y == 1/x) OK = TRUE;
    else z = 1/x;

could have had two interpretations:

if (x != 0) {                      if (x != 0) {
    if (y == 1/x) OK = TRUE;           if (y == 1/x) OK = TRUE;
    else z = 1/x;                  }
}                                  else z = 1/x;

C disambiguates with the most closely nested rule: an else part is attached to the nearest preceding unmatched if, which resolves the ambiguity in favour of the left-hand interpretation. The grammar rules may be adapted as follows:

if-statement → matched | unmatched
matched → if (exp) matched else matched | other
unmatched → if (exp) if-statement | if (exp) matched else unmatched
exp → 0 | 1

The next slide shows the unambiguous parse tree.

43 An unambiguous grammar for C's if-statement

if-statement → matched | unmatched
matched → if (exp) matched else matched | other
unmatched → if (exp) if-statement | if (exp) matched else unmatched
exp → 0 | 1

(Figure: the unique parse tree for if (0) if (1) other else other; the else is attached to the inner if (1).)

44 Representations of syntax: BNF BNF: Backus-Naur form. The metasymbol ::= is used like → in production rules; | separates alternatives. Angle brackets < and > delimit non-terminals. Terminals are written in plain text, or in bold face. The code below defines a <program>:

<program> ::= program <declaration-list> begin <statement-list> end.

A program starts with program, and is followed by a list of declarations, then a begin, and a list of statements terminated with end and a full stop. EBNF: Extended BNF. BNF was made more convenient to use by extending it slightly.

45 Representations of syntax: EBNF EBNF: Extended BNF. Optional items are put inside brackets [ and ]:

<if-statement> ::= if <boolean> then <statement-list> [else <statement-list>] end if ;

Repetition is done using braces { and }:

<identifier> ::= <letter> { <letter> | <digit> }

An <identifier> is a word that starts with a letter and is followed by any number of letters or digits.

<statement-list> ::= <statement> { ; <statement> }

A <statement-list> is a <statement> or a list of <statement>s separated by semicolons.

46 Representations of syntax: EBNF tramline diagrams used by Wirth for Pascal, and for ANS Fortran. two-level grammar Algol 68. etc. 77

47 Formal properties of CFLs (Louden p ) Vide Louden. 78

48 The Chomsky hierarchy (Louden p. 131) Chomsky-type: Description 3: Regular languages. Let A ∈ N and α ∈ T*; then productions in the grammar have the form A → α or A → Aα, or alternatively the recursion may be right: A → αA. Only one kind of recursion may be present, i.e. left or right, otherwise G is a CFL. 2: Context-free languages. Let A ∈ N and γ ∈ (N ∪ T)* and A → γ. In a context-free language A can always be replaced in any context by γ. 1: Context-sensitive languages. If the production A → γ is in a context-sensitive language, then it may be applied only in a predetermined context, i.e. A may produce γ only if A lies in a given context, e.g. αAβ → αγβ, where α ≠ ε. Such a rule is context sensitive. An example of context sensitivity is the restriction that variables must be declared before they may be used. 0: Phrase structure grammars are the most powerful.

49 Top-down parsing (Louden Chapter 4) Recursive descent. LL(1) parsing. first and follow sets. Error recovery in top-down parsers.

50 Top-down parsing A top-down parser executes a leftmost derivation. It starts from the start symbol and works its way down to the terminals in the form of tokens. Predictive parser: attempts to forecast the next construction by using lookahead tokens. Backtracking parser: attempts different possibilities for parsing the known input, and backs up when it hits dead ends. Slower than predictive parsers; may use exponential time; more powerful. Recursive-descent parsing is usually applied in hand-written compilers. Wirth's compilers often use RD parsers. Your 1st-year compiler was RD. LL(1) parsing: the first L means the input is followed from left to right; the second L means a leftmost derivation; the 1 means that only one token is used to predict the progress of the parser.

51 LL(1) parsing LL(1) parsers work from left to right through the input and follow a leftmost derivation that uses one lookahead token. Viable-prefix property: in such languages it is easy to see very quickly that there is an error, namely when the lookahead token does not correspond with what we expect. The viable prefix corresponds to first. LL(k) parsers are also possible, where k > 1; it is more difficult to see errors. first and follow sets derived from the grammar are used to construct the tables that will be used for LL(1) parsing.

52 first and follow sets The set first(X), where X is a terminal or ε, is simply {X}. Suppose X is a nonterminal; then first(X) is the set of all terminals x such that X ⇒* xβ, where β may be ε. In other words first(X) is the set of leading terminals of the sentential forms derivable from X. The definition may be altered to accommodate LL(k) parsers by replacing x with strings of k terminals, or with |x| < k when β is ε. (See also Louden p. 168)

53 first sets In the grammar for arithmetic expressions:

exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number

first(addop) = { +, - }
first(mulop) = { * }
first(exp) = { (, number }
first(term) = { (, number }
first(factor) = { (, number }

54 first in the grammar for an if-statement G = ({statement, if-statement, else-part, expression}, {0, 1, if, else, rest}, {statement → if-statement | rest, if-statement → if (expression) statement else-part, else-part → else statement | ε, expression → 0 | 1}, statement) first(statement) = {if, rest} first(expression) = {0, 1} first(if-statement) = {if} first(else-part) = {else, ε}

55 Basic LL(1) parsing (Louden p. 152) LL(1) parsers use a push-down stack rather than backtracking from recursive procedure calls. Consider S → ( S ) S | ε. Initialize the stack to $S.

    Parsing stack   Input   Action
  1 $S              ()$     S → (S)S
  2 $S)S(           ()$     match
  3 $S)S            )$      S → ε
  4 $S)             )$      match
  5 $S              $       S → ε
  6 $               $       accept

Two actions: 1. Replace A ∈ N at the top of the stack by α, where A → α and α ∈ (N ∪ T)*. 2. Match the token on top of the stack with the next input token.

56 LL(1) parsing

    Parsing stack   Input   Action
  1 $S              ()$     S → (S)S
  2 $S)S(           ()$     match
  3 $S)S            )$      S → ε
  4 $S)             )$      match
  5 $S              $       S → ε
  6 $               $       accept

At step 1 the stack contains S and the input is ()$. Apply rule S → (S)S: the right-hand side is placed item-by-item onto the stack so that it appears reversed. In step 2 the ( on top of the stack is removed because it matches the token at the start of the input.

57 LL(1) recursion-free productions for arithmetic (Louden p. 160)

exp → term exp′
exp′ → addop term exp′ | ε
addop → + | -
term → factor term′
term′ → mulop factor term′ | ε
mulop → *
factor → ( exp ) | number


59 Parse tree and syntax tree for 3-4-5 (Louden p. 161) The parse tree for the expression does not represent the left associativity of subtraction. The parser should still construct the left-associative syntax tree. 1. The value 3 must be passed up to the root exp. 2. The root exp hands 3 down to exp′, which subtracts 4 from it. 3. The resulting -1 is passed down to the next exp′, 4. which subtracts 5 yielding -6, 5. which is passed to the next exp′. 6. The rightmost exp′ has an ε child and finally passes the -6 back to the root exp.

60 Building the syntax tree with an LL(1) grammar Implement exp → term exp′ as follows:

exp()
{ term();
  exp′();
}

To compute the expression it is rewritten as:

int exp()
{ int temp;
  temp = term();
  return exp′(temp);
}

61 Code for arithmetic The code for exp′ → addop term exp′ | ε is

exp′() {
  switch (token) {
  case '+': match('+'); term(); exp′(); break;
  case '-': match('-'); term(); exp′(); break;
  }
}

To compute the expression it could be rewritten as:

int exp′(int val) {
  switch (token) {
  case '+': match('+'); val += term(); return exp′(val);
  case '-': match('-'); val -= term(); return exp′(val);
  default: return val;
  }
}

Note that exp′ requires a parameter passed from exp.

62 Left factoring Left factoring is needed when right-hand sides of productions share a common prefix, e.g. A → αβ | αγ. Typical practical examples are: stmt-sequence → stmt ; stmt-sequence | stmt, stmt → s and if-stmt → if ( exp ) statement | if ( exp ) statement else statement. An LL(1) parser cannot distinguish between such productions. The solution is to factor out the common prefix as follows: A → αA′, A′ → β | γ. For factoring to work properly α should be the longest left prefix. Louden gives a left-factoring algorithm and many examples on pp

63 follow sets In this discussion we regard $ as a terminal. Recall that first(A) is the set of leading terminals of the sentential forms derivable from A. Informally, follow(A) is the set of terminals that may be derived from the symbols appearing after A on the right-hand side of productions, or, put differently, the set of those terminals that can follow A in sentential forms. Since $ is regarded as a terminal, if A is the start symbol then $ is in follow(A). Formally: follow(A) is the set of terminals such that if there is a production B → αAγ, 1. then first(γ) \ {ε} is in follow(A), and 2. if ε is in first(γ), then follow(A) contains follow(B). follow sets are only defined for nonterminals.

64 An algorithm for follow(A) Algol style

for all nonterminals A do follow(A) := { };
follow(start-symbol) := {$};
while there are changes to any follow sets do
    for each production A → X_1 X_2 ... X_n do
        for each X_i that is a nonterminal do
            add first(X_{i+1} X_{i+2} ... X_n) \ {ε} to follow(X_i)
            /* Note: if i = n then X_{i+1} X_{i+2} ... X_n = ε */
            if ε ∈ first(X_{i+1} X_{i+2} ... X_n) then
                add follow(A) to follow(X_i)

65 An algorithm for follow(A) C-style

for (all nonterminals A) follow(A) = { };
follow(start-symbol) = {$};
while (there are changes to any follow sets)
    for (each production A → X_1 X_2 ... X_n)
        for (each X_i that is a nonterminal) {
            add first(X_{i+1} X_{i+2} ... X_n) \ {ε} to follow(X_i);
            /* Note: if i == n then X_{i+1} X_{i+2} ... X_n = ε */
            if (ε ∈ first(X_{i+1} X_{i+2} ... X_n))
                add follow(A) to follow(X_i);
        }

66 Construct follow from the first set In the grammar for arithmetic expressions:

(1) exp → exp addop term
(2) exp → term
(3) addop → +
(4) addop → -
(5) term → term mulop factor
(6) term → factor
(7) mulop → *
(8) factor → ( exp )
(9) factor → number

first(addop) = { +, - }
first(mulop) = { * }
first(factor) = { (, number }
first(term) = { (, number }
first(exp) = { (, number }

67 Constructing follow from first In the grammar for arithmetic expressions:

(1) exp → exp addop term
(2) exp → term
(3) addop → +
(4) addop → -
(5) term → term mulop factor
(6) term → factor
(7) mulop → *
(8) factor → ( exp )
(9) factor → number

Ignore (3), (4), (7) and (9): no right-hand-side nonterminals. Set all follow(A) = { }; follow(exp) = {$}. (1) affects the follow sets of exp, addop and term: first(addop) is added to follow(exp), so follow(exp) = { $, -, + }; first(term) is added to follow(addop), so follow(addop) = { (, number }; and follow(exp) is added to follow(term), so follow(term) = { $, +, - }.

68 Constructing follow from first In the grammar for arithmetic expressions:

(1) exp → exp addop term
(2) exp → term
(3) addop → +
(4) addop → -
(5) term → term mulop factor
(6) term → factor
(7) mulop → *
(8) factor → ( exp )
(9) factor → number

(2) causes follow(exp) to be added to follow(term), which does not add anything new. (5) is similar to (1): first(mulop) is added to follow(term), so follow(term) = { $, +, -, * }; first(factor) is added to follow(mulop), so follow(mulop) = { (, number }; and follow(term) is added to follow(factor), so follow(factor) = { $, +, -, * }.

69 Constructing follow from first In the grammar for arithmetic expressions:

(1) exp → exp addop term
(2) exp → term
(3) addop → +
(4) addop → -
(5) term → term mulop factor
(6) term → factor
(7) mulop → *
(8) factor → ( exp )
(9) factor → number

(6) adds follow(term) to follow(factor): no effect. (8) adds first( ) ) to follow(exp), so that follow(exp) = { $, +, -, ) }. During the second pass ) propagates via (1) into follow(term) and from there into follow(factor), so that follow(term) = { $, +, -, *, ) } and follow(factor) = { $, +, -, *, ) }.

70 Constructing LL(1) parse tables The parse table M[A, a] contains productions added according to the rules: 1. If A → α is a production rule such that there is a derivation α ⇒* aβ, where a is a token, then the rule A → α is added to M[A, a]. 2. If A → α is a production with a derivation α ⇒* ε, and there is a derivation S$ ⇒* βAaγ, where S is the start symbol and a is a token, or $, then the production A → α is added to M[A, a]. The token a in rule 1 is in first(α) and the token in rule 2 is in follow(A). This is repeatedly applied for each nonterminal A and each production A → α: 1. For each token a in first(α), add A → α to the entry M[A, a]. 2. If ε ∈ first(α), for each element a ∈ follow(A), add A → α to M[A, a].

71 Characterizing an LL(1) grammar A grammar in BNF is LL(1) if the following conditions are satisfied: 1. For every production A → α_1 | α_2 | ... | α_n, first(α_i) ∩ first(α_j) is empty for all i, j ∈ [1..n], i ≠ j. 2. For every nonterminal A such that ε ∈ first(A), first(A) ∩ follow(A) is empty.

72 Examples See Louden's examples on p

73 Bottom-up parsing Overview. Finite automata of LR(0) items and LR(0) parsing. SLR(1) parsing. General LR(1) and LALR(1) parsing. bison an LALR(1) parser generator. Generation of a parser using bison. Error recovery in bottom-up parsers. 101

74 Bottom-up parsing: an overview The most general bottom-up parser is the LR(1) parser: the L indicates that the input is processed from the left to the right, the R indicates that a rightmost derivation is applied, and the 1 indicates that a single token is used for lookahead. LR(0) parsers are also possible, where there is no lookahead, i.e. the lookahead token can only be examined after it appears on the parse stack. SLR(1) parsers improve on LR(0) parsing. An even more powerful method, but still not as general as LR(1) parsing, is the LALR(1) parser. Bottom-up parsers are generally more powerful than their top-down counterparts; for example, left recursion can be handled. Bottom-up parsers are unsuitable for hand coding, so parser generators like bison are used.

75 Bottom-up parsing overview The parse stack contains tokens and nonterminals PLUS state information. The parse stack starts empty and ends with the start symbol alone on the stack and an empty input string. Actions: shift, reduce and accept. A shift merely moves a token from the input to the top of the stack. A reduce replaces the string α on top of the stack with a nonterminal A, given A → α. Top-down parsers are generate-match parsers and bottom-up parsers are shift-reduce parsers. If the grammar does not possess a unique start symbol that only appears once in the grammar, then bottom-up parsers always augment the grammar with such a start symbol.

76 Bottom-up parse of () Consider the grammar with P = {S → (S)S | ε}. Augment it by adding: S′ → S. A bottom-up parse for the parenthesis grammar of () follows:

    Parsing stack   Input   Action
  1 $               ()$     shift
  2 $(              )$      reduce S → ε
  3 $(S             )$      shift
  4 $(S)            $       reduce S → ε
  5 $(S)S           $       reduce S → (S)S
  6 $S              $       reduce S′ → S
  7 $S′             $       accept

The bottom-up parser looks deeper into its parse stack and thus requires arbitrary stack lookahead. The derivation is: S′ ⇒ S ⇒ (S)S ⇒ (S) ⇒ (). Clearly the rightmost nonterminal is reduced at each derivation step.

77 A bottom-up parse of the + grammar Consider the grammar with P = {E → E + n | n}. Augment it by adding: E′ → E. A bottom-up parse for the + grammar of n+n:

    Parsing stack   Input    Action
  1 $               n + n$   shift
  2 $n              + n$     reduce E → n
  3 $E              + n$     shift
  4 $E +            n$       shift
  5 $E + n          $        reduce E → E + n
  6 $E              $        reduce E′ → E
  7 $E′             $        accept

The derivation is: E′ ⇒ E ⇒ E + n ⇒ n + n. We see that the rightmost nonterminal is reduced at each derivation step.

78 Bottom-up parse overview

    Parsing stack   Input    Action
  1 $               n + n$   shift
  2 $n              + n$     reduce E → n
  3 $E              + n$     shift
  4 $E +            n$       shift
  5 $E + n          $        reduce E → E + n
  6 $E              $        reduce E′ → E
  7 $E′             $        accept

In the derivation E′ ⇒ E ⇒ E + n ⇒ n + n, each of the intermediate strings is called a right sentential form, and it is split between the parse stack and the input. E + n occurs in step 3 of the parse as E on the stack with + n in the input, as E + with n in step 4, and finally as E + n entirely on the stack in step 5. The string of symbols on top of the stack is called a viable prefix of the right sentential form. E, E + and E + n are all viable prefixes of E + n. The viable prefixes of n + n are ε and n, but n + and n + n are not.

79 Bottom-up parse overview

A shift-reduce parser shifts terminals onto the stack until it can perform a reduction to obtain the next right sentential form. This occurs when the top of the stack matches the right-hand side of a production. This string, together with the position in the right sentential form where it occurs and the production used to reduce it, is known as the handle. Handles are unique in unambiguous grammars. The handle of n + n is thus its leading n with the production E → n, and the handle of E + n, to which the previous form is reduced, is E + n itself with the production E → E + n. The main task of a shift-reduce parser is finding the next handle.

80 Bottom-up parse overview

     Parsing stack   Input   Action
   1 $               ()$     shift
   2 $ (             )$      reduce S → ε
   3 $ ( S           )$      shift
   4 $ ( S )         $       reduce S → ε
   5 $ ( S ) S       $       reduce S → (S)S
   6 $ S             $       reduce S′ → S
   7 $ S′            $       accept

The main task of a shift-reduce parser is finding the next handle. Reductions may only occur when the reduced string yields a right sentential form. In step 3 above the reduction S → ε cannot be performed, because the resulting string after then shifting ) onto the stack would be (SS), which is not a right sentential form. Thus S → ε is not a handle at this position in the sentential form (S. To reduce by S → (S)S, the parser must know that (S)S, the right-hand side of this production, is complete on top of the stack; it tracks this by using a DFA of items.

81 LR(0) items

The grammar with P = {S′ → S, S → (S)S | ε} has three productions and eight LR(0) items:

   S′ → .S       S′ → S.
   S → .(S)S     S → (.S)S
   S → (S.)S     S → (S).S
   S → (S)S.     S → .

When P = {E′ → E, E → E + n | n} there are three productions and eight LR(0) items:

   E′ → .E       E′ → E.
   E → .E + n    E → E. + n
   E → E + .n    E → E + n.
   E → .n        E → n.

82 LR(0) parsing: LR(0) items

An LR(0) item of a CFG is a production with a distinguished position in its right-hand side. The distinguished position is usually denoted by the metasymbol '.', i.e. a period. E.g. if A → α is a production and β and γ are any two strings of symbols, including ε, such that α = βγ, then A → .βγ, A → β.γ and A → βγ. are all LR(0) items. They are called LR(0) items because they contain no explicit reference to lookahead. An item records progress in recognizing the right-hand side of a particular production. Specifically, A → β.γ, constructed from A → βγ, denotes that the β part has already been seen and that it may be possible to derive the next input tokens from γ.
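The definition above can be made concrete: every production with a right-hand side of length m contributes m + 1 items, one per dot position. A hedged sketch (the grammar encoding and names are my own) that enumerates the LR(0) items of a grammar, representing each item as (lhs, symbols-before-dot, symbols-after-dot):

```python
# Enumerating LR(0) items: one item per dot position in each production.
# Representation (assumed, not from the notes): (lhs, before, after),
# with the epsilon right-hand side encoded as the empty tuple.

def lr0_items(productions):
    items = []
    for lhs, rhs in productions:
        for dot in range(len(rhs) + 1):
            items.append((lhs, rhs[:dot], rhs[dot:]))
    return items

S_GRAMMAR = [("S'", ("S",)),
             ("S", ("(", "S", ")", "S")),
             ("S", ())]                 # S -> epsilon

items = lr0_items(S_GRAMMAR)
print(len(items))  # 2 + 5 + 1 = 8, matching the count on the slide
```

The ε-production contributes exactly one item, S → ., which is why the S grammar has eight items rather than nine.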

83 LR(0) parsing: LR(0) items

The item A → .α indicates that A could eventually be reduced from α; it is called an initial item. The item A → α. indicates that α is on the top of the stack and may be the handle, if A → α is used to reduce α to A; it is called a complete item. The LR(0) items are used as the states of a finite automaton that maintains information about the parse stack and the progress of a shift-reduce parse.

84 LR(0) parsing: finite automata of items

LR(0) items denote the states of a finite automaton that maintains the progress of a shift-reduce parse. One approach is first to construct a nondeterministic finite automaton of LR(0) items and then derive a DFA from it. Another approach is to construct the DFA of sets of LR(0) items directly. What transitions are represented in the NFA of LR(0) items? Suppose that the symbol X ∈ (N ∪ T). Let A → α.Xη be an LR(0) item, representing a state reached after α has been recognized, with the focal point '.' directly before X. If X is a token, then there is a transition on the token X to the next LR(0) state A → αX.η:

   A → α.Xη  --X-->  A → αX.η

85 LR(0) parsing: finite automata of items

We are considering A → α.Xη, where the focal point '.' is directly before X. Suppose that X is a nonterminal; then it cannot be matched directly against a token on the input stream. The transition

   A → α.Xη  --X-->  A → αX.η

corresponds to pushing X onto the stack as the result of a reduction of some β to X by applying a rule X → β. Such a reduction must be preceded by the recognition of β, and the state denoted by X → .β represents the start of the process of recognizing β. So when X is a nonterminal, ε-transitions must also be provided, leaving A → α.Xη and going to the LR(0) state X → .β, for every production X → β with X on the left:

   A → α.Xη  --ε-->  X → .β

86 LR(0) parsing: finite automata of items

The two transitions

   A → α.Xη  --X-->  A → αX.η    and    A → α.Xη  --ε-->  X → .β

are the only ones in the NFA of LR(0) items. The start state of the NFA must correspond to the initial conditions of the parser: the parse stack is empty, and S, the start symbol, is about to be parsed, i.e. any initial item S → .α could be used. Since we want the start state to be unique, the simple device of augmenting the grammar with a new, unique start symbol S′, for which S′ → S, suffices. The start state is then S′ → .S.

87 LR(0) parsing: finite automata of items

What are the accepting states of the NFA? The NFA does not need accepting states: it is not being used to recognize the language, merely to keep track of the state of the parse. The parser itself determines when it accepts an input stream, namely when the input stream is empty and the start symbol is on the top of the parse stack.

88 LR(0) parsing: finite automata of items

The grammar with P = {S′ → S, S → (S)S | ε} has three productions and eight LR(0) items:

   S′ → .S       S′ → S.
   S → .(S)S     S → (.S)S
   S → (S.)S     S → (S).S
   S → (S)S.     S → .

The NFA of LR(0) items for the S grammar has the transitions:

   S′ → .S     --S-->  S′ → S.
   S′ → .S     --ε-->  S → .(S)S,  S → .
   S → .(S)S   --(-->  S → (.S)S
   S → (.S)S   --S-->  S → (S.)S
   S → (.S)S   --ε-->  S → .(S)S,  S → .
   S → (S.)S   --)-->  S → (S).S
   S → (S).S   --S-->  S → (S)S.
   S → (S).S   --ε-->  S → .(S)S,  S → .

The next step is to produce the DFA that corresponds to the NFA.

89 LR(0) parsing: converting the NFA into a DFA

To convert the NFA of LR(0) items into a DFA, form the ε-closure of each set of LR(0) items: the closure always contains the set itself; add each item reachable by an ε-transition from the original set; then recursively add all items that are ε-reachable from the items already aggregated. Starting from the closure of the start item, do this for every set of items reached, and add the transitions on grammar symbols that leave each aggregate.
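The ε-closure step described above can be sketched in a few lines. This is an illustrative implementation, not from the notes, using the E grammar and the item representation (lhs, before-dot, after-dot); the names are my own.

```python
# A sketch of the epsilon-closure of a set of LR(0) items, on the
# grammar E' -> E, E -> E + n | n from these notes.

E_PRODS = [("E'", ("E",)), ("E", ("E", "+", "n")), ("E", ("n",))]
NONTERMS = {"E'", "E"}

def closure(items):
    """Add B -> .beta for every item with the dot before nonterminal B,
    repeating until no new item is epsilon-reachable."""
    result = set(items)
    work = list(items)
    while work:
        _lhs, _before, after = work.pop()
        if after and after[0] in NONTERMS:
            for lhs, rhs in E_PRODS:
                if lhs == after[0]:
                    item = (lhs, (), rhs)   # initial item B -> .beta
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

state0 = closure({("E'", (), ("E",))})
print(len(state0))  # 3: E' -> .E, E -> .E + n, E -> .n
```

Starting from the single start item E′ → .E, the closure pulls in the two initial E items, yielding exactly the start state of the DFA on a later slide.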

90 LR(0) parsing: an NFA and its corresponding DFA

From the NFA for the S grammar on the previous slide, the subset construction yields the DFA with states:

   State 0: S′ → .S, S → .(S)S, S → .
   State 1: S′ → S.
   State 2: S → (.S)S, S → .(S)S, S → .
   State 3: S → (S.)S
   State 4: S → (S).S, S → .(S)S, S → .
   State 5: S → (S)S.

and transitions 0 --S--> 1, 0 --(--> 2, 2 --(--> 2, 2 --S--> 3, 3 --)--> 4, 4 --(--> 2, 4 --S--> 5.
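The whole construction can be sketched end to end. The following illustrative code (not from the notes; the encoding and names are my own) runs the subset construction for the S grammar, with states as frozensets of items (lhs, before-dot, after-dot), and recovers the six DFA states:

```python
# Sketch of the subset construction for S' -> S, S -> (S)S | eps.

PRODS = [("S'", ("S",)), ("S", ("(", "S", ")", "S")), ("S", ())]
NONTERMS = {"S'", "S"}

def closure(items):
    """epsilon-closure: add B -> .beta whenever a dot precedes B."""
    result = set(items)
    work = list(items)
    while work:
        _lhs, _before, after = work.pop()
        if after and after[0] in NONTERMS:
            for lhs, rhs in PRODS:
                if lhs == after[0]:
                    item = (lhs, (), rhs)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(state, X):
    """Move the dot over X in every item of `state` that allows it."""
    moved = {(l, b + (X,), a[1:]) for (l, b, a) in state if a and a[0] == X}
    return closure(moved)

start = closure({("S'", (), ("S",))})
states, work = {start}, [start]
while work:
    st = work.pop()
    for X in {a[0] for (_l, _b, a) in st if a}:
        nxt = goto(st, X)
        if nxt not in states:
            states.add(nxt)
            work.append(nxt)
print(len(states))  # 6 states, as in the DFA on this slide
```

Each reachable goto target is closed and added once, so the fixed point is exactly the set of DFA states.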

91 LR(0) parsing: finite automata of items

When P = {E′ → E, E → E + n | n} there are three productions and eight LR(0) items:

   E′ → .E       E′ → E.
   E → .E + n    E → E. + n
   E → E + .n    E → E + n.
   E → .n        E → n.

The NFA of LR(0) items for the E grammar has the transitions:

   E′ → .E      --E-->  E′ → E.
   E′ → .E      --ε-->  E → .E + n,  E → .n
   E → .E + n   --E-->  E → E. + n
   E → .E + n   --ε-->  E → .E + n,  E → .n
   E → E. + n   --+-->  E → E + .n
   E → E + .n   --n-->  E → E + n.
   E → .n       --n-->  E → n.

The next step is to produce the DFA that corresponds to the NFA.

92 LR(0) parsing: NFA and equivalent DFA

From the NFA for the E grammar, the subset construction yields the DFA with states:

   State 0: E′ → .E, E → .E + n, E → .n
   State 1: E′ → E., E → E. + n
   State 2: E → n.
   State 3: E → E + .n
   State 4: E → E + n.

and transitions 0 --E--> 1, 0 --n--> 2, 1 --+--> 3, 3 --n--> 4. The items that are added by the ε-closure are known as closure items, and the items that originate a state are called its kernel items.

93 LR(0) parsing

The LR(0) algorithm keeps track of the current state in the DFA of LR(0) items. The parse stack need hold only state numbers, since they represent all the necessary information. For the sake of simplifying the description of the algorithm, the grammar symbol will also be pushed onto the parse stack before the state number. The parse starts with:

     Parsing stack   Input
   1 $ 0             input string$

Suppose the token n is shifted onto the stack and the next state is 2:

     Parsing stack   Input
   2 $ 0 n 2         rest of input string$

The LR(0) parsing algorithm chooses its next action depending on the state on the top of the stack and the current input token.

94 The LR(0) parsing algorithm

Let s be the current state.

1. If state s contains an item A → α.Xβ, where X is a terminal, then the action is a shift. If the current token is X, then the next state is the one containing A → αX.β. If the current token cannot be shifted, then there is an error.

2. If s contains a complete item such as A → γ., then the action is to reduce by the rule A → γ. When the start symbol S is reduced by the rule S′ → S and the input is empty, then accept; if the input is not empty, then announce an error. In every other case the next state is computed as follows: (a) pop γ, with its state numbers, off the stack; (b) let s be the state now on top, which contains an item B → α.Aβ; (c) push A and push the state containing B → αA.β.

95 LR(0) parsing: shift-reduce and reduce-reduce conflicts

A grammar is said to be an LR(0) grammar if the parsing rules apply without ambiguity. If a state contains the complete item A → α., then it may contain no other items. If such a state were also to contain a shift item A → α.Xβ, where X is a terminal, then an ambiguity arises as to whether action (1) or (2) must be executed; this is called a shift-reduce conflict. If such a state were also to contain another complete item B → β., then an ambiguity arises as to which production to apply, A → α or B → β; this is known as a reduce-reduce conflict. A grammar is therefore LR(0) if and only if each state is either a shift state or a reduce state containing a single complete item.

96 SLR(1) parsing

The SLR(1) parsing algorithm. Disambiguating rules for parsing conflicts. Limits of SLR(1) parsing power. SLR(k) grammars.

97 The SLR(1) parsing algorithm

Simple LR(1), i.e. SLR(1), parsing uses the DFA of sets of LR(0) items. The power of LR(0) is significantly increased by using the next token in the input stream to direct the parser's actions in two ways: 1. the input token is consulted before a shift is made, to ensure that an appropriate DFA transition exists; and 2. the follow set of a nonterminal is used to decide whether a reduction should be performed. This is powerful enough to parse almost all common language constructs.

98 The SLR(1) parsing algorithm

Let s be the current state, i.e. the state on top of the stack.

1. If s contains any item of the form A → α.Xβ, where X is the next token in the input stream, then shift X onto the stack and push the state containing the item A → αX.β.

2. If s contains the complete item A → γ. and the next token in the input stream is in follow(A), then reduce by the rule A → γ (more details follow on the next slide).

3. If the next input token is not accommodated by (1) or (2), then an error is declared.

99 The SLR(1) parsing algorithm

If s contains the complete item A → γ. and the next token in the input stream is in follow(A), then reduce by the rule A → γ. Reduction by S′ → S, where S is the start symbol and the next token is $, implies acceptance; otherwise the new state is computed as follows: (a) remove the string γ and all its corresponding states from the parse stack; (b) this backs the DFA up to the state in which the construction of γ started; (c) by construction, this state contains an item of the form B → α.Aβ; push A onto the stack and push the state containing B → αA.β.

100 SLR(1) grammars

A grammar is an SLR(1) grammar if the application of the SLR(1) parsing rules results in no ambiguity. That is, a grammar is SLR(1) if and only if, for every state s:

1. for any item A → α.Xβ in s, where X is a token, there is no complete item B → γ. in s with X ∈ follow(B); a violation of this condition is a shift-reduce conflict;

2. for any two complete items A → α. ∈ s and B → β. ∈ s, follow(A) ∩ follow(B) = ∅; a violation of this condition is a reduce-reduce conflict.

101 Table-driven SLR(1) parsing

The grammar with P = {E′ → E, E → E + n | n} is not LR(0) but is SLR(1). (It is not LR(0) because state 1 below contains both the complete item E′ → E. and the shift item E → E. + n.) Its DFA of sets of items is:

   State 0: E′ → .E, E → .E + n, E → .n
   State 1: E′ → E., E → E. + n
   State 2: E → n.
   State 3: E → E + .n
   State 4: E → E + n.

with transitions 0 --E--> 1, 0 --n--> 2, 1 --+--> 3, 3 --n--> 4. Since follow(E′) = {$} and follow(E) = {$, +}, the entry for state 1 on $ is accept instead of r(E′ → E):

   State |  n    +              $              | Goto E
     0   |  s2                                 |   1
     1   |       s3             accept         |
     2   |       r(E → n)       r(E → n)       |
     3   |  s4                                 |
     4   |       r(E → E + n)   r(E → E + n)   |

102 SLR(1) parse of n+n+n

   State |  n    +              $              | Goto E
     0   |  s2                                 |   1
     1   |       s3             accept         |
     2   |       r(E → n)       r(E → n)       |
     3   |  s4                                 |
     4   |       r(E → E + n)   r(E → E + n)   |

     Parsing stack      Input        Action
   1 $ 0                n + n + n$   shift 2
   2 $ 0 n 2            + n + n$     reduce E → n
   3 $ 0 E 1            + n + n$     shift 3
   4 $ 0 E 1 + 3        n + n$       shift 4
   5 $ 0 E 1 + 3 n 4    + n$         reduce E → E + n
   6 $ 0 E 1            + n$         shift 3
   7 $ 0 E 1 + 3        n$           shift 4
   8 $ 0 E 1 + 3 n 4    $            reduce E → E + n
   9 $ 0 E 1            $            accept
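The table above can be executed directly by a small driver. This is a hedged sketch (the table encoding is my own): ("s", state) shifts, ("r", lhs, n) reduces by popping n states and following the goto entry, and ("acc",) accepts; for brevity the stack holds only state numbers.

```python
# Executing the SLR(1) table for E' -> E, E -> E + n | n.

ACTION = {
    (0, "n"): ("s", 2),
    (1, "+"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "+"): ("r", "E", 1), (2, "$"): ("r", "E", 1),  # E -> n
    (3, "n"): ("s", 4),
    (4, "+"): ("r", "E", 3), (4, "$"): ("r", "E", 3),  # E -> E + n
}
GOTO = {(0, "E"): 1}

def parse(tokens):
    stack = [0]                      # stack of DFA state numbers only
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False             # no table entry: syntax error
        if act[0] == "acc":
            return True
        if act[0] == "s":            # shift: consume token, push state
            stack.append(act[1])
            i += 1
        else:                        # reduce: pop |rhs| states, goto
            _, lhs, n = act
            del stack[len(stack) - n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(parse(["n", "+", "n", "+", "n"]))  # True
```

The driver is grammar-independent: only the ACTION and GOTO tables encode the language, which is the point of table-driven parsing.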

103 SLR(1) parse of ()()

   State |  (    )             $             | Goto S
     0   |  s2   r(S → ε)      r(S → ε)      |   1
     1   |                     accept        |
     2   |  s2   r(S → ε)      r(S → ε)      |   3
     3   |       s4                          |
     4   |  s2   r(S → ε)      r(S → ε)      |   5
     5   |       r(S → (S)S)   r(S → (S)S)   |

     Parsing stack                       Input   Action
   1 $ 0                                 ()()$   shift 2
   2 $ 0 ( 2                             )()$    reduce S → ε
   3 $ 0 ( 2 S 3                         )()$    shift 4
   4 $ 0 ( 2 S 3 ) 4                     ()$     shift 2
   5 $ 0 ( 2 S 3 ) 4 ( 2                 )$      reduce S → ε
   6 $ 0 ( 2 S 3 ) 4 ( 2 S 3             )$      shift 4
   7 $ 0 ( 2 S 3 ) 4 ( 2 S 3 ) 4         $       reduce S → ε
   8 $ 0 ( 2 S 3 ) 4 ( 2 S 3 ) 4 S 5     $       reduce S → (S)S
   9 $ 0 ( 2 S 3 ) 4 S 5                 $       reduce S → (S)S
  10 $ 0 S 1                             $       accept

104 Disambiguating rules for parsing conflicts

Shift-reduce conflicts have a natural disambiguating rule: prefer the shift over the reduce. Reduce-reduce conflicts are more complex to resolve; they usually require the grammar to be altered. Preferring the shift over the reduce in the dangling-else ambiguity amounts to incorporating the most-closely-nested-if rule. The grammar with the following productions is ambiguous:

   statement → if-statement | other
   if-statement → if (exp) statement | if (exp) statement else statement
   exp → 0 | 1

We will consider the even simpler grammar:

   S → I | other
   I → if S | if S else S

105 Disambiguating a shift-reduce conflict

Consider the grammar:

   S → I | other
   I → if S | if S else S

Since follow(I) = follow(S) = {$, else}, there is a parsing conflict in state 5: the complete item I → if S. indicates a reduction on input else or $, but the item I → if S. else S indicates a shift when else is read. The DFA of sets of items is:

   State 0: S′ → .S, S → .I, S → .other, I → .if S, I → .if S else S
   State 1: S′ → S.
   State 2: S → I.
   State 3: S → other.
   State 4: I → if.S, I → if.S else S, S → .I, S → .other, I → .if S, I → .if S else S
   State 5: I → if S., I → if S. else S
   State 6: I → if S else.S, S → .I, S → .other, I → .if S, I → .if S else S
   State 7: I → if S else S.

States 0, 4 and 6 each have transitions on if to state 4, on other to state 3 and on I to state 2, and on S to states 1, 5 and 7 respectively; state 5 has a transition on else to state 6.

106 An SLR(1) table without conflicts

The rules are numbered:

   (1) S → I
   (2) S → other
   (3) I → if S
   (4) I → if S else S

The SLR(1) parse table, with the shift preferred in state 5:

   State |  if   else   other   $        | Goto S  I
     0   |  s4          s3               |      1  2
     1   |                      accept   |
     2   |       r1             r1       |
     3   |       r2             r2       |
     4   |  s4          s3               |      5  2
     5   |       s6             r3       |
     6   |  s4          s3               |      7  2
     7   |       r4             r4       |

107 Limits of SLR(1) parsing power

Consider the grammar, which describes parameterless procedure calls and assignment statements:

   stmt → call-stmt | assign-stmt
   call-stmt → identifier
   assign-stmt → var := exp
   var → var [ exp ] | identifier
   exp → var | number

Assignments and procedure calls both start with an identifier. The parser can only decide whether a call or an assignment is being processed at the end of the statement, or when the token := appears.

108 Limits of SLR(1) parsing power

Consider the simplified grammar:

   S → id | V := E
   V → id
   E → V | n

The start state of the DFA of sets of items contains:

   S′ → .S
   S → .id
   S → .V := E
   V → .id

This state has a shift transition on id to the state:

   S → id.
   V → id.

follow(S) = {$} and follow(V) = {:=, $}. On getting the input token $ the SLR(1) parser will try to reduce by both the rules S → id and V → id; this is a reduce-reduce conflict. This simple problem can be solved by using an SLR(k) parser.
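The conflict above is exactly a violation of SLR(1) condition 2 from the earlier slide: the two complete items in the state have overlapping follow sets. A small illustrative check (follow sets hardcoded from this slide; the helper name is my own):

```python
# Checking SLR(1) condition 2 on the conflicting state {S -> id., V -> id.}
# with the follow sets given on this slide.

FOLLOW = {"S": {"$"}, "V": {":=", "$"}}
state = [("S", ("id",)), ("V", ("id",))]   # two complete items

def reduce_reduce_conflict(complete_items, follow):
    """True if two complete items have overlapping follow sets."""
    for i, (a, _rhs) in enumerate(complete_items):
        for b, _rhs2 in complete_items[i + 1:]:
            if follow[a] & follow[b]:
                return True
    return False

print(reduce_reduce_conflict(state, FOLLOW))  # True: both contain $
```

Since follow(S) ∩ follow(V) = {$} is nonempty, the grammar fails the SLR(1) test even though it is unambiguous.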

109 SLR(k) grammars

The SLR(1) algorithm can be extended to SLR(k) parsing, with k ≥ 1 lookahead symbols, using the first_k and follow_k sets and the two rules:

1. If s contains A → α.Xβ, where X is a token and Xw ∈ first_k(Xβ) gives the next k tokens in the input stream, then the action is to shift the current input token onto the stack and to push the state containing the item A → αX.β.

2. If s contains the complete item A → α. and w ∈ follow_k(A) gives the next k tokens in the input string, then the action is to reduce by the rule A → α.

SLR(k) parsing is more powerful than SLR(1) parsing when k > 1, but it is substantially more expensive, since the size of the parsing tables grows exponentially in k. Typical non-SLR(1) constructs are instead handled using an LALR(1) parser, by using standard disambiguating rules, or by rewriting the grammar.

110 General LR(1) and LALR(1) parsing

LR(1) parsing, also called canonical LR(1) parsing, overcomes the problems of SLR(1) parsing, but at the cost of a substantially larger DFA. Lookahead LR(1), or LALR(1), parsing preserves the efficiency of SLR(1) parsing while retaining most of the benefits of general LR(1) parsing. We will discuss: finite automata of LR(1) items; the LR(1) parsing algorithm; LALR(1) parsing.

111 Finite automata of LR(1) items (Louden)

SLR(1) applies lookahead after constructing the DFA of LR(0) items; the construction itself ignores the advantages that may ensue from considering lookaheads. General LR(1) uses a new DFA that has the lookaheads built in from the start. This DFA uses items that are an extension of LR(0) items; they are called LR(1) items because they include a single lookahead token in each item. LR(1) items are written

   [A → α.β, a]

where A → α.β is an LR(0) item and a is the lookahead token. Next the transitions between LR(1) items will be defined.

112 Transitions between LR(1) items

There are several similarities with DFAs of LR(0) items: they include ε-transitions, and the DFA states are again built from ε-closures. However, transitions between LR(1) items must also keep track of the lookahead token. Normal, i.e. non-ε, transitions are quite similar to those in DFAs of LR(0) items; the major difference lies in the definition of the ε-transitions.

113 Definition of LR(1) transitions

Given an LR(1) item [A → α.Xγ, a], where X ∈ N ∪ T, there is a transition on X to the item [A → αX.γ, a]. Given an LR(1) item [A → α.Bγ, a], where B ∈ N, there are ε-transitions to the items [B → .β, b] for every production B → β and for every token b ∈ first(γa). Only ε-transitions create new lookaheads.

114 DFA of sets of LR(0) items for A → (A) | a (Louden p. 208)

The augmented grammar with P = {A′ → A, A → (A) | a} has the DFA of sets of LR(0) items:

   State 0: A′ → .A, A → .(A), A → .a
   State 1: A′ → A.
   State 2: A → a.
   State 3: A → (.A), A → .(A), A → .a
   State 4: A → (A.)
   State 5: A → (A).

with transitions 0 --A--> 1, 0 --a--> 2, 0 --(--> 3, 3 --(--> 3, 3 --a--> 2, 3 --A--> 4, 4 --)--> 5. The parsing actions for the input ((a)) follow:

     Parsing stack        Input    Action
   1 $ 0                  ((a))$   shift
   2 $ 0 ( 3              (a))$    shift
   3 $ 0 ( 3 ( 3          a))$     shift
   4 $ 0 ( 3 ( 3 a 2      ))$      reduce A → a
   5 $ 0 ( 3 ( 3 A 4      ))$      shift
   6 $ 0 ( 3 ( 3 A 4 ) 5  )$       reduce A → (A)
   7 $ 0 ( 3 A 4          )$       shift
   8 $ 0 ( 3 A 4 ) 5      $        reduce A → (A)
   9 $ 0 A 1              $        accept

115 DFA of sets of LR(1) items for A → (A) | a (Louden p. 218)

Augment the grammar by adding A′ → A.

State 0: first put [A′ → .A, $] into State 0. To complete the closure, add, via ε-transitions, the items with A on the left of the production and $ as the lookahead: [A → .(A), $] and [A → .a, $].

   State 0: [A′ → .A, $], [A → .(A), $], [A → .a, $]

State 1: there is a transition from State 0 on A to the closure of the set that includes [A′ → A., $]. The action for this state will be to accept.

   State 1: [A′ → A., $]

116 DFA of sets of LR(1) items for A → (A) | a

   State 0: [A′ → .A, $], [A → .(A), $], [A → .a, $]

State 2: there is a transition on ( leaving State 0 to the closure of the LR(1) item [A → (.A), $], which forms the basis of State 2. There are ε-transitions from this item to [A → .(A), )] and to [A → .a, )], because the lookahead for the A inside the parentheses is first( )$ ) = { ) }. Note that a new lookahead appears here. The complete State 2 is:

   State 2: [A → (.A), $], [A → .(A), )], [A → .a, )]
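The LR(1) closure rule used to build State 2 can be sketched directly from the definition on slide 113. This is illustrative code, not from the notes: items are (lhs, before-dot, after-dot, lookahead), the FIRST sets are hardcoded for this grammar (which has no nullable symbols), and the names are my own.

```python
# Sketch of LR(1) closure for the grammar A' -> A, A -> (A) | a.

PRODS = [("A'", ("A",)), ("A", ("(", "A", ")")), ("A", ("a",))]
NONTERMS = {"A'", "A"}
FIRST = {"A": {"(", "a"}}

def first_of(seq):
    """FIRST of a nonempty sequence; a terminal is its own FIRST set."""
    return FIRST[seq[0]] if seq[0] in NONTERMS else {seq[0]}

def closure1(items):
    result = set(items)
    work = list(items)
    while work:
        _lhs, _before, after, la = work.pop()
        if after and after[0] in NONTERMS:
            # [A -> alpha . B gamma, a]: add [B -> .beta, b]
            # for every b in first(gamma a)
            for b in first_of(after[1:] + (la,)):
                for lhs, rhs in PRODS:
                    if lhs == after[0]:
                        item = (lhs, (), rhs, b)
                        if item not in result:
                            result.add(item)
                            work.append(item)
    return result

state2 = closure1({("A", ("(",), ("A", ")"), "$")})
print(len(state2))  # 3 items, matching State 2 on the slide
```

From the kernel item [A → (.A), $], first(γa) = first( )$ ) = { ) } generates the lookahead ) on the two added initial items, reproducing State 2 exactly.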


More information

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;} Compiler Construction Grammars Parsing source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2012 01 23 2012 parse tree AST builder (implicit)

More information

S Y N T A X A N A L Y S I S LR

S Y N T A X A N A L Y S I S LR LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number

More information

Question Bank. 10CS63:Compiler Design

Question Bank. 10CS63:Compiler Design Question Bank 10CS63:Compiler Design 1.Determine whether the following regular expressions define the same language? (ab)* and a*b* 2.List the properties of an operator grammar 3. Is macro processing a

More information

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017 Compilerconstructie najaar 2017 http://www.liacs.leidenuniv.nl/~vlietrvan1/coco/ Rudy van Vliet kamer 140 Snellius, tel. 071-527 2876 rvvliet(at)liacs(dot)nl college 3, vrijdag 22 september 2017 + werkcollege

More information

Compiler Design Concepts. Syntax Analysis

Compiler Design Concepts. Syntax Analysis Compiler Design Concepts Syntax Analysis Introduction First task is to break up the text into meaningful words called tokens. newval=oldval+12 id = id + num Token Stream Lexical Analysis Source Code (High

More information

Top down vs. bottom up parsing

Top down vs. bottom up parsing Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs

More information

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. COMPILER DESIGN 1. What is a compiler? A compiler is a program that reads a program written in one language the source language and translates it into an equivalent program in another language-the target

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

MIT Specifying Languages with Regular Expressions and Context-Free Grammars MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem How to precisely

More information

Chapter 2 :: Programming Language Syntax

Chapter 2 :: Programming Language Syntax Chapter 2 :: Programming Language Syntax Michael L. Scott kkman@sangji.ac.kr, 2015 1 Regular Expressions A regular expression is one of the following: A character The empty string, denoted by Two regular

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

More information

UNIT-III BOTTOM-UP PARSING

UNIT-III BOTTOM-UP PARSING UNIT-III BOTTOM-UP PARSING Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce

More information

Introduction to Syntax Analysis. The Second Phase of Front-End

Introduction to Syntax Analysis. The Second Phase of Front-End Compiler Design IIIT Kalyani, WB 1 Introduction to Syntax Analysis The Second Phase of Front-End Compiler Design IIIT Kalyani, WB 2 Syntax Analysis The syntactic or the structural correctness of a program

More information

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States TDDD16 Compilers and Interpreters TDDB44 Compiler Construction LR Parsing, Part 2 Constructing Parse Tables Parse table construction Grammar conflict handling Categories of LR Grammars and Parsers An NFA

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

More Bottom-Up Parsing

More Bottom-Up Parsing More Bottom-Up Parsing Lecture 7 Dr. Sean Peisert ECS 142 Spring 2009 1 Status Project 1 Back By Wednesday (ish) savior lexer in ~cs142/s09/bin Project 2 Due Friday, Apr. 24, 11:55pm My office hours 3pm

More information

Chapter 4. Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis Chapter 4 Lexical and Syntax Analysis Chapter 4 Topics Introduction Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing Copyright 2012 Addison-Wesley. All rights reserved.

More information

Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3. Describing Syntax and Semantics ISBN Chapter 3 Describing Syntax and Semantics ISBN 0-321-49362-1 Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Copyright 2009 Addison-Wesley. All

More information

Unit 13. Compiler Design

Unit 13. Compiler Design Unit 13. Compiler Design Computers are a balanced mix of software and hardware. Hardware is just a piece of mechanical device and its functions are being controlled by a compatible software. Hardware understands

More information

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions

More information

Bottom-Up Parsing. Lecture 11-12

Bottom-Up Parsing. Lecture 11-12 Bottom-Up Parsing Lecture 11-12 (From slides by G. Necula & R. Bodik) 9/22/06 Prof. Hilfinger CS164 Lecture 11 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

Concepts Introduced in Chapter 4

Concepts Introduced in Chapter 4 Concepts Introduced in Chapter 4 Grammars Context-Free Grammars Derivations and Parse Trees Ambiguity, Precedence, and Associativity Top Down Parsing Recursive Descent, LL Bottom Up Parsing SLR, LR, LALR

More information

CS 406/534 Compiler Construction Parsing Part I

CS 406/534 Compiler Construction Parsing Part I CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.

More information

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI UNIT I - LEXICAL ANALYSIS 1. What is the role of Lexical Analyzer? [NOV 2014] 2. Write

More information

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree

More information

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology MIT 6.035 Specifying Languages with Regular essions and Context-Free Grammars Martin Rinard Massachusetts Institute of Technology Language Definition Problem How to precisely define language Layered structure

More information

It parses an input string of tokens by tracing out the steps in a leftmost derivation.

It parses an input string of tokens by tracing out the steps in a leftmost derivation. It parses an input string of tokens by tracing out CS 4203 Compiler Theory the steps in a leftmost derivation. CHAPTER 4: TOP-DOWN PARSING Part1 And the implied traversal of the parse tree is a preorder

More information

Introduction to Syntax Analysis

Introduction to Syntax Analysis Compiler Design 1 Introduction to Syntax Analysis Compiler Design 2 Syntax Analysis The syntactic or the structural correctness of a program is checked during the syntax analysis phase of compilation.

More information

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

More information

Languages and Compilers

Languages and Compilers Principles of Software Engineering and Operational Systems Languages and Compilers SDAGE: Level I 2012-13 3. Formal Languages, Grammars and Automata Dr Valery Adzhiev vadzhiev@bournemouth.ac.uk Office:

More information

CS 403: Scanning and Parsing

CS 403: Scanning and Parsing CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

More information

Part 5 Program Analysis Principles and Techniques

Part 5 Program Analysis Principles and Techniques 1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

More information

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S. Bottom up parsing Construct a parse tree for an input string beginning at leaves and going towards root OR Reduce a string w of input to start symbol of grammar Consider a grammar S aabe A Abc b B d And

More information

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis. Topics Chapter 4 Lexical and Syntax Analysis Introduction Lexical Analysis Syntax Analysis Recursive -Descent Parsing Bottom-Up parsing 2 Language Implementation Compilation There are three possible approaches

More information

Bottom-Up Parsing II. Lecture 8

Bottom-Up Parsing II. Lecture 8 Bottom-Up Parsing II Lecture 8 1 Review: Shift-Reduce Parsing Bottom-up parsing uses two actions: Shift ABC xyz ABCx yz Reduce Cbxy ijk CbA ijk 2 Recall: he Stack Left string can be implemented by a stack

More information

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing Abstract Syntax Trees & Top-Down Parsing Review of Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree

More information

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing Review of Parsing Abstract Syntax Trees & Top-Down Parsing Given a language L(G), a parser consumes a sequence of tokens s and produces a parse tree Issues: How do we recognize that s L(G)? A parse tree

More information

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1 CIT3136 - Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1 Definition of a Context-free Grammar: An alphabet or set of basic symbols (like regular expressions, only now the symbols are whole tokens,

More information

Building Compilers with Phoenix

Building Compilers with Phoenix Building Compilers with Phoenix Syntax-Directed Translation Structure of a Compiler Character Stream Intermediate Representation Lexical Analyzer Machine-Independent Optimizer token stream Intermediate

More information

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax Susan Eggers 1 CSE 401 Syntax Analysis/Parsing Context-free grammars (CFG s) Purpose: determine if tokens have the right form for the language (right syntactic structure) stream of tokens abstract syntax

More information

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh. Bottom-Up Parsing II Different types of Shift-Reduce Conflicts) Lecture 10 Ganesh. Lecture 10) 1 Review: Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient Doesn

More information

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing 8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

More information

COMPILER DESIGN - QUICK GUIDE COMPILER DESIGN - OVERVIEW

COMPILER DESIGN - QUICK GUIDE COMPILER DESIGN - OVERVIEW COMPILER DESIGN - QUICK GUIDE http://www.tutorialspoint.com/compiler_design/compiler_design_quick_guide.htm COMPILER DESIGN - OVERVIEW Copyright tutorialspoint.com Computers are a balanced mix of software

More information

Lexical and Syntax Analysis

Lexical and Syntax Analysis Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Easy for humans to write and understand String of characters

More information

Syntactic Analysis. Top-Down Parsing

Syntactic Analysis. Top-Down Parsing Syntactic Analysis Top-Down Parsing Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Context Free Grammars and Parsing 1 Recall: Architecture of Compilers, Interpreters Source Parser Static Analyzer Intermediate Representation Front End Back

More information

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3 Programming Language Specification and Translation ICOM 4036 Fall 2009 Lecture 3 Some parts are Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics

More information

CS 321 Programming Languages and Compilers. VI. Parsing

CS 321 Programming Languages and Compilers. VI. Parsing CS 321 Programming Languages and Compilers VI. Parsing Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = words Programs = sentences For further information,

More information

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2 Formal Languages and Grammars Chapter 2: Sections 2.1 and 2.2 Formal Languages Basis for the design and implementation of programming languages Alphabet: finite set Σ of symbols String: finite sequence

More information

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,

More information

Theory and Compiling COMP360

Theory and Compiling COMP360 Theory and Compiling COMP360 It has been said that man is a rational animal. All my life I have been searching for evidence which could support this. Bertrand Russell Reading Read sections 2.1 3.2 in the

More information

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form Bottom-up parsing Bottom-up parsing Recall Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form If α V t,thenα is called a sentence in L(G) Otherwise it is just

More information

ICOM 4036 Spring 2004

ICOM 4036 Spring 2004 Language Specification and Translation ICOM 4036 Spring 2004 Lecture 3 Copyright 2004 Pearson Addison-Wesley. All rights reserved. 3-1 Language Specification and Translation Topics Structure of a Compiler

More information

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant Syntax Analysis: Context-free Grammars, Pushdown Automata and Part - 4 Department of Computer Science and Automation Indian Institute of Science Bangalore 560 012 NPTEL Course on Principles of Compiler

More information

Part 3. Syntax analysis. Syntax analysis 96

Part 3. Syntax analysis. Syntax analysis 96 Part 3 Syntax analysis Syntax analysis 96 Outline 1. Introduction 2. Context-free grammar 3. Top-down parsing 4. Bottom-up parsing 5. Conclusion and some practical considerations Syntax analysis 97 Structure

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 14/ Prof. Andrea Corradini Department of Computer Science, Pisa Lesson 8! Bo;om- Up Parsing Shi?- Reduce LR(0) automata and

More information