General Overview of Compiler

Lisa Martin
6 years ago

Compiler: A complex program that converts a program written in a high-level programming language (source code) into machine-readable code.

Interpreter: It performs the same task as a compiler, but in a line-by-line fashion.

Assembler: It converts assembly-level instructions into machine-level instructions in binary code.

Translator: A program that converts a program written in one language into another language. A compiler is also a translator.

Compiler and its Stages, or Phases of a Compiler, or the Structure of a Compiler

Up to this point we have treated a compiler as a single box that maps a source program into a semantically equivalent target program. If we open up this box a little, we see that there are two parts to this mapping: analysis and synthesis.

The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It also collects information about the source program and stores it in a data structure called the symbol table.

The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.

The analysis part is often called the front end of the compiler and the synthesis part is the back end of the compiler.

Front end of compiler:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation

Back end of compiler:
5. Code optimization
6. Code generation

Figure: Phases of a compiler

    Character stream (input file)
        -> Lexical Analysis            -> token stream
        -> Syntax Analysis             -> syntax tree
        -> Semantic Analysis           -> syntax tree
        -> Intermediate Code Generator -> intermediate representation
        -> Code Optimizer              -> intermediate representation
        -> Code Generator              -> target-machine code
        -> Code Optimizer              -> target-machine code (output file)

    Every phase is connected to the SYMBOL TABLE and to the ERROR handler.

The diagram above shows the phases of a compiler. A general description of each phase follows.

1. Lexical Analysis: The scanner reads the input characters one by one, from left to right, and produces a description of each scanned value.
E.g. Position := Initial + Rate * 60
In this example:
    Id1 = Position
    :=  = Assignment operator
    Id2 = Initial
    +   = Addition operator
    Id3 = Rate
    *   = Multiplication operator
    60  = A number

2. Syntax Analysis (parser): This phase validates the syntax of the expression. For this purpose, we construct a syntax tree or parse tree.
E.g. Make a tree for the given equation: c := a + b
Note: Priority of symbols: ( ) > (*, /) > (+, -) > (= or :=)

Tree for c := a + b:

        :=
       /  \
      c    +
          / \
         a   b

Tree for Position := Initial + Rate * 60:

            :=
           /  \
    Position   +
              / \
       Initial   *
                / \
            Rate   60
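The two phases described so far can be sketched in a few lines of Python. This is only an illustrative sketch, not how Lex or Yacc work internally: the names tokenize and parse_assignment are hypothetical, and the parser handles just the operators +, -, *, / and parentheses, using the priority order from the note above.

```python
import re

# One alternative per token class: identifier, number, operator.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<num>\d+)|(?P<op>:=|[+\-*/()]))"
)

def tokenize(text):
    """Scan left to right and return a list of (kind, lexeme) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens

def parse_assignment(tokens):
    """Build a tree of nested tuples (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos][1] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def factor():
        lexeme = advance()
        if lexeme == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        return lexeme

    def term():  # * and / bind tighter than + and -
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if advance() != ":=":
        raise SyntaxError("expected ':='")
    return (":=", target, expr())

tree = parse_assignment(tokenize("Position := Initial + Rate * 60"))
print(tree)
```

The printed tuple mirrors the second tree above: the multiplication sits below the addition because term() is called from inside expr().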

3. Semantic Analysis: This phase checks the data types and the context of the program. Where the operand types do not match, it inserts the required conversion, here from integer to real.

            :=
           /  \
        Id1    +
              / \
           Id2   *
                / \
             Id3   IntToReal
                       |
                      60

Output of the above tree after semantic analysis:
    id1 = id2 + id3 * 60.0

4. Intermediate Code Generation: In this phase, we construct TAC (Three Address Code) by using temporary names. A TAC instruction has at most three addresses (two operands and one result) and at least two, and it has at most two operators, counting the assignment operator.

1) t1 = a + b (the TAC condition applies)
2) t2 = a (temporary register)

Example: id1 = id2 + id3 * 60

    t1 = 60.0                  t1 = id3
    t2 = id3 * t1      OR      t2 = t1 * 60.0
    t3 = id2 + t2              t3 = id2 + t2
    id1 = t3                   id1 = t3

5. Code Optimization: In this phase we modify, rearrange or shorten the intermediate code sequence to use memory better and to increase the speed of execution, without changing the meaning of the original code.
Example:
    t1 = id3 * 60.0
    id1 = id2 + t1

6. Code Generation: The code generator takes the intermediate representation of the source program as input and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
For example, using registers R0 and R1, the intermediate code given below might be translated into the following machine code.
    t1 = id3 * 60.0
    id1 = id2 + t1

    Operation   From    To     Comment
    MOV         id3     R0     id3 is copied to R0
    MUL         60.0    R0     t1 is in R0
    MOV         id2     R1     id2 is in R1
    ADD         R0      R1     id1 is in R1, R0 is free
    MOV         R1      id1    R1 is free because the result is stored in id1, the left-hand side

***********************************************************
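As a rough illustration of intermediate code generation, the sketch below walks an expression tree and emits one three-address instruction per operator. The function name three_address_code and the tuple representation of the tree are assumptions made for this example only; the output is the unoptimized form, so it ends with a copy that the optimization phase would remove.

```python
from itertools import count

def three_address_code(tree):
    """Turn (":=", target, expression) into a list of TAC strings.

    An expression is either a name/constant (str) or a tuple
    (operator, left, right).
    """
    code = []
    temps = count(1)  # supplies t1, t2, t3, ...

    def emit(node):
        if not isinstance(node, tuple):
            return node  # identifiers and constants need no instruction
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expression = tree
    code.append(f"{target} = {emit(expression)}")
    return code

tac = three_address_code((":=", "id1", ("+", "id2", ("*", "id3", "60.0"))))
for line in tac:
    print(line)
```

Every emitted instruction has one operator on the right-hand side plus the assignment, which is the TAC condition stated earlier.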

Basic Concepts

1) Scanning is also known as lexical analysis.
2) Mike Lesk and Eric Schmidt wrote Lex, the classic generator of lexical analyzers.
3) The output of lexical analysis is a stream of tokens; the character sequences that are matched are called lexemes. If c = a + b, then c, a and b are lexemes.
4) The eye is a good analogy for lexical analysis, because it first scans a thing and then identifies it.
5) A program that is used for lexical analysis is called a Lex program.
6) A token is a class of lexemes. In c = a + b, the lexemes a, b and c are all identifiers.
7) Lexemes are of three types:
    1. Static
    2. Dynamic
    3. Variable
8) Examples of white space: endl, \n, extra spaces, etc.
9) Regular expressions are a predefined notation for writing patterns.
10) Some basic formulas:
    1. a* = Kleene closure = {ε, a, aa, aaa, aaaa, ...}
    2. a+ = Positive closure = {a, aa, aaa, aaaa, ...}
    3. (a + b)* = {ε, a, b, aa, ab, ba, bb, aba, ...}
       Note: This is called the regular set of (a + b).
    4. id = letter(letter/digit)*
       Note: It implies that the first character of any id is always a letter.
    5. <> is the not-equals sign in some programming languages (for example Pascal and SQL).

***********************************************************
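The closures in point 10 are infinite sets, but their first members can be listed mechanically. The helper below is a small sketch for checking such listings; strings_up_to is a hypothetical name, and the length bound is only there to keep the output finite.

```python
from itertools import product

def strings_up_to(alphabet, max_len, include_empty=True):
    """List all strings over `alphabet` with length <= max_len, shortest first."""
    start = 0 if include_empty else 1
    return [
        "".join(letters)
        for length in range(start, max_len + 1)
        for letters in product(alphabet, repeat=length)
    ]

kleene_a = strings_up_to("a", 3)           # first members of a*
positive_a = strings_up_to("a", 3, False)  # first members of a+
universal = strings_up_to("ab", 2)         # first members of (a + b)*
print(kleene_a, positive_a, universal)
```

The only difference between a* and a+ is whether the empty string ε (shown here as '') is included.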

Some More Basic Concepts

1) Yacc (Yet Another Compiler Compiler) is used in syntax analysis to build the tree after lexical analysis.
2) a*, a+, ab, a+b, abb, (a+b)*, a/b, (a,b) and a∪b are all regular expressions.
    ab, a∩b                  AND operation
    a/b, (a,b), a∪b, a+b     OR operation
    a* = Kleene closure
    (a+b)* = Universal closure expression or universal regular expression
           = {ε, a, b, aa, aba, bab, abb, ...}
3) id = letter(letter/digit)* means that we can write
       c12 = a + b   or   c12a = a + b
   but we cannot write
       21 = a + b
4) Lex is the standard scanner generator used for lexical analysis.
5) In the following tree, a, b and c are known as lexeme values.

        :=
       /  \
      c    +
          / \
         a   b

6) yylval stands for "yy lexical value". Through it, the lexeme value is passed directly to yacc.
   install_id(): installs the id (identifier) and forwards it.
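The identifier rule in point 3 can be tried out with Python's re module. This is a minimal sketch that assumes letters are A-Z and a-z and digits are 0-9; is_identifier is a hypothetical helper name.

```python
import re

# letter(letter/digit)*  written in Python regular-expression syntax
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def is_identifier(lexeme):
    """True if the whole lexeme matches letter(letter/digit)*."""
    return IDENTIFIER.fullmatch(lexeme) is not None

for lexeme in ("c12", "c12a", "21"):
    print(lexeme, is_identifier(lexeme))
```

fullmatch is used rather than match so that a lexeme such as c12! is rejected instead of being accepted on its valid prefix.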

Overview of Finite State Automata

1. It is used in the lexical analysis (scanning) phase of a compiler.
2. It is used to implement the statements or regular expressions of a programming language. Finite automata and regular expressions are the foundation of lexical analysis.
3. Automata: An automaton is an automatic machine developed or designed by a developer, programmer or manufacturer to complete a desired task.
   E.g. Automobiles, calculators, computers, home appliances, supercomputers, microwave technologies, generators, etc.
4. Finite Automata or Finite State Machine: A machine that has a finite number of states is called a finite automaton or finite state machine.

4.2. Formal Definition: It is a five-tuple
    M = (Q, Σ, δ, q0, F)
where
    M  = Machine
    Q  = Non-empty set of all states = {q0, q1, q2, q3, ..., qn}
    Σ  = Non-empty set of input values = {a, b, c, 1, 2, 3, *, /, (, ), ...}
    δ  = Transition function, represented by a transition table or a transition diagram
    q0 = Default initial state
    F  = Set of final states

E.g. An example of a transition table (rotation of a fan):

    State      Next state after the switch is turned
    OFF        ON (100rpm)
    100rpm     200rpm
    200rpm     300rpm
    300rpm     OFF
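A transition table like the fan example maps directly onto a dictionary. The sketch below is illustrative only: run_dfa and the single input symbol "switch" are assumptions for this example, and the fan has no final states, so the function simply reports the state it ends in.

```python
def run_dfa(delta, start, inputs):
    """Follow the transition function delta from `start` over `inputs`."""
    state = start
    for symbol in inputs:
        state = delta[(state, symbol)]  # exactly one next state: deterministic
    return state

# delta for the fan: (current state, input) -> next state
FAN = {
    ("OFF", "switch"): "100rpm",
    ("100rpm", "switch"): "200rpm",
    ("200rpm", "switch"): "300rpm",
    ("300rpm", "switch"): "OFF",
}

print(run_dfa(FAN, "OFF", ["switch"] * 3))
print(run_dfa(FAN, "OFF", ["switch"] * 4))
```

Because each (state, input) pair has exactly one entry, this machine is deterministic.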

Note: Here rpm is rotations per minute.

Types of Finite Automata: There are two types of finite automata:
1. NFA or NDFA (Nondeterministic Finite Automata)
2. DFA (Deterministic Finite Automata)

4.4. Technical definition:
1. DFA: δ: Q × Σ → Q
2. NFA: δ: Q × Σ → 2^Q
where 2^Q is the power set of the set of states (multiple next states are possible).
E.g. If A = {a, b, c}, then 2^A = {ɸ, {a}, {b}, {c}, {a, b}, {b, c}, {c, a}, {a, b, c}}

Implementation of a Lex Program with a DFA

Rules:
1. For the given lex program, check the regular definitions or regular expressions associated with it.
2. Construct an NFA for each individual regular expression and define its initial state and final state.
3. Give a unique name to each state.
4. Assume a common initial state and connect it to all the NFAs by ε transitions.
5. Draw the empty DFA transition table and initialize it with the combined initial state.
6. Construct new output states by checking the input values, applying the DFA construction rules accordingly.

7. Check the input patterns associated with each state and enter the matched pattern in the "Pattern announced" column of the DFA table.

In this way, the lex program is implemented by a DFA with its associated patterns.

Question:

    %{ C declarations (empty) %}
    %{ regular definitions %}
        a
        abb
        a*b+
    %{ translation rules (empty) %}

Solution:

1) NFA for a

    start -> (1) --a--> ((2))

2) NFA for abb

    start -> (3) --a--> (4) --b--> (5) --b--> ((6))

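3) NFA for a*b+

    start -> (7) --b--> ((8))
    State 7 has a loop on a; state 8 has a loop on b.

4) Now, according to rule number 4, combine all the NFAs with the help of ε, using a new start state 0.

             ε           a
    (0) --------> (1) -------> ((2))
     |       ε           a           b           b
     +----------> (3) -------> (4) -------> (5) -------> ((6))
     |       ε           b
     +----------> (7) -------> ((8))
                 loop on a     loop on b

Note: In the following table, the "Pattern announced" column shows the pattern associated with reaching each state.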

5) Transition table:

    State     a        b        Pattern announced
    [0137]    [247]    [8]      No pattern
    [247]     [7]      [58]     a
    [8]       -        [8]      a*b+
    [7]       [7]      [8]      No pattern
    [58]      -        [68]     a*b+
    [68]      -        [8]      abb

A DFA state announces a pattern only if it contains an accepting NFA state (2, 6 or 8). When it contains more than one, the pattern listed first in the lex program wins, so [68] announces abb.

***********************************************************

Grammar & Language

1. Language: A language is an alphabetic, numeric and symbolic system of representation in which we form words and, by arranging words in a meaningful sequence, form sentences. Sentences establish communication between two machines, between two humans, or between a human and a machine.
E.g. Programming languages (C, C++ and Java), frameworks (.NET), natural languages (English, Hindi, Urdu, Marathi, Telugu, etc.), interfaces and drivers.

2. Grammar: A set of rules that defines a language so that the communication is meaningful.
E.g. #include<iostream.h> can't be written as include#<iostream>.h.
Every programming language has to follow its language and grammar rules.

Formal definition of Grammar

It is a four-tuple
    G = (V, T, P, S)
where
    V = Non-empty set of variables or non-terminals = {A, B, C, D, E, ..., Z}
    T = Non-empty set of terminals = {a, b, c, d, e, ..., z}
    P = Non-empty set of production rules
    S = Default starting production variable (start symbol)

We can understand this with the following example.
Example:
    S → AB/BA
    A → d
    B → g
Note: S → AB/BA can also be written as S → AB and S → BA separately.

Derivation: Generating a string by applying the grammar productions step by step is known as derivation.

Acceptability: If a string can be generated from the starting production variable, then the string is accepted by the grammar.

There are two types of derivation:

1. LMD (Left Most Derivation): It is used in the top-down parsing approach. To generate a string, if we expand the leftmost non-terminal before any other non-terminal, then it is an LMD. This technique is based on backtracking.
E.g.: S → ABC
where
    A → a
    B → b
    C → c

Then, according to the rule of LMD, we can solve the above expression as follows:
    S → ABC
    S → aBC
    S → abC
    S → abc
Note: In the production S → ABC, S is called α or the derivative part (left-hand side) and ABC is called β or the derived part (right-hand side).

2. RMD (Right Most Derivation): It is used in the bottom-up parsing approach. Whenever we expand the rightmost non-terminal before the others to generate a string, it is an RMD. This technique is also based on backtracking.
E.g.: S → ABC
where
    A → a
    B → b
    C → c
Then, according to the rule of RMD, we can solve the above expression as follows:
    S → ABC
    S → ABc
    S → Abc
    S → abc

Derivation Tree: The step-by-step expansion of a string can be expressed by a tree, known as a derivation tree or parse tree.
E.g.: S → ABC
where
    A → a
    B → b
    C → c
The derivation tree of the above expression can be made as follows.

          S
        / | \
       A  B  C
       |  |  |
       a  b  c

Question: Generate the following strings:
    (1) id + id * id
    (2) (id + id) * id
by the grammar
    E → E + T/T
    T → T * F/F
    F → (E)/id
Solve by LMD and RMD.

Solution:

(1) id + id * id

Solve by LMD: First we take the production E → E + T and solve it by LMD.
    E → E + T
    E → T + T
    E → F + T
    E → id + T
    E → id + T * F
    E → id + F * F
    E → id + id * F
    E → id + id * id

Now try E → T:
    E → T
    E → F
    E → (E)
This case is not possible, because there are no brackets ( ) in the string id + id * id. Now consider the following case.

    E → T
    E → T * F
    E → F * F
    E → id * F
This case is also not possible, because here the * sign comes directly after the first id, whereas the string has + there.

Now solve by RMD:
    E → E + T
    E → E + T * F
    E → E + T * id
    E → E + F * id
    E → E + id * id
    E → T + id * id
    E → F + id * id
    E → id + id * id
E → T is also not possible here in RMD.

(2) (id + id) * id

Solve by LMD: First we take the production E → E + T. It cannot be used in this case, because the + of the string is inside the brackets. So we go straight to the next case.
    E → T
    E → T * F
    E → F * F
    E → (E) * F
    E → (E + T) * F
    E → (T + T) * F
    E → (F + T) * F
    E → (id + T) * F
    E → (id + F) * F
    E → (id + id) * F

    E → (id + id) * id

Solved by RMD: E → E + T is not possible here.
    E → T
    E → T * F
    E → T * id
    E → F * id
    E → (E) * id
    E → (E + T) * id
    E → (E + F) * id
    E → (E + id) * id
    E → (T + id) * id
    E → (F + id) * id
    E → (id + id) * id

As an example of a derivation tree for the above derivations, here is the tree for the LMD that starts with E → T:

                E
                |
                T
              / | \
             T  *  F
             |     |
             F     id
           / | \
          (  E  )
           / | \
          E  +  T
          |     |
          T     F
          |     |
          F     id
          |
          id

Left recursion: Whenever a non-terminal produces itself at the leftmost position of a grammar production, it is a left recursion.
Example: S → Sab/b
Note: S produces itself at the leftmost position of Sab.

Drawbacks of left recursion (also known as repetition):
1. A top-down parser can keep expanding the same non-terminal without consuming any input.
2. Repetition.
Example:

            S
          / | \
         S  a  b
       / | \
      S  a  b
      ...

Format of Left Recursion: If
    A → Aα/β
then it is a single left recursion, where
    A ∈ V
    α ∈ (V ∪ T)* (any value)
    β ∈ (V ∪ T)* (any value)
Note: Consider the following case:
    S → S + T/T
    A → Aα/β
where
    S represents A
    + T represents α
    T represents β

Multiple Left Recursion: Whenever a non-terminal produces itself at the leftmost position of a grammar production in more than one alternative, it is a multiple left recursion. Consider the following example.
Example: If
    A → Aα1/Aα2/Aα3/.../Aαn
and
    A → β1/β2/β3/.../βm
For instance
    E → E + T/EF/E * T/a/b
has the form
    A → Aα1/Aα2/Aα3/β1/β2
where
    E = A
    + T = α1
    F = α2
    * T = α3
    a = β1
    b = β2

Removal of left recursion:

1. Single
    A → Aα/β
The removal formula for the above expression is:
    A → βB
    B → αB/ε

2. Multiple
    A → Aα1/Aα2/.../Aαn/β1/β2/.../βm
The removal formula for the above expression is:
    A → β1B/β2B/β3B/.../βmB
    B → α1B/α2B/α3B/.../αnB/ε

Question: Remove the left recursion from the following grammar.
    E → E + T/T

Solution:
    E → TB
    B → +TB/ε

Note: Both grammars generate the same strings. With the original production E → E + T/T:
    E → E + T        (replace E by E + T)
    E → E + T + T    (replace E by T)
    E → T + T + T
With the new productions:
    E → TB           (replace B by +TB)
    E → T + TB       (replace B by +TB)
    E → T + T + TB   (replace B by ε)
    E → T + T + T

Question: Remove the left recursion.
    E → E + T/E * F/a/b/d
Solution:
    E → aB/bB/dB
    B → +TB/*FB/ε

Question: Remove the left recursion.
    1. E → E + T/T
    2. T → T * F/F
    3. F → (E)/id
Solution:
    1. E → TB
       B → +TB/ε
    2. T → FC
       C → *FC/ε

    3. No left recursion is there.

Indirect Left Recursion: Whenever a non-terminal produces itself at the leftmost position only through other non-terminals (indirectly), it is an indirect left recursion.
Example:
    S → AA/0    (1)
    A → SS/1    (2)
Now put equation (2) in (1); then we get
    S → SSA/1A/0

We take the following steps to remove this type of recursion:
1. Reduce it to a (direct) left recursion.
2. Remove it.
Applying these steps to the example:
1) S → SSA/1A/0
   A → SS/1
2) S → 1AB/0B
   B → SAB/ε
   A → SS/1

The problem can also be solved the other way round, by putting equation (1) in (2):
1) S → AA/0
   A → AAS/0S/1
2) S → AA/0
   A → 0SB/1B
   B → ASB/ε

Left Factor: Whenever the same value repeats at the leftmost position of more than one alternative of a grammar production, it is a left factor.
Example:

    S → ab/ac/ad/a/b/g     (one common left factor, i.e. a)
    S → abc/abd/ab/g       (combined common left factor, i.e. ab)
Note: There can be one or more left factors.

Format for Left Factor:
    A → αβ1/αβ2/αβ3/.../αβn/γ1/γ2/γ3/.../γm

Example:
1. S → ab/ac/d
Then the left-factored form is:
    S → aA/d
    A → b/c

2. S → abd/abg/ab/a/d
Take ab as the left factor:
    S → abA/a/d
    A → d/g/ε
Then take a as the left factor:
    S → aC/d
    C → bA/ε
    A → d/g/ε
Alternatively, take a as the left factor first:
    S → aA/d
    A → bd/bg/b/ε
and then b:
    S → aA/d
    A → bC/ε
    C → d/g/ε

Parsing: It is the technique in which we construct a fixed parser record (parsing table) for a grammar. By using the parser, we can check whether a string is accepted or rejected by the grammar for which the parser was constructed. It is a predictive technique.
Note: LMD and RMD are non-predictive techniques, i.e. techniques with backtracking.
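The removal formula for left recursion given earlier (A → Aα/β becomes A → βB, B → αB/ε) is mechanical enough to sketch in code. The version below handles direct left recursion only; remove_left_recursion is a hypothetical helper, each alternative is a list of symbols, and the empty list stands for ε.

```python
def remove_left_recursion(head, alternatives, new_head):
    """Apply A -> A a1/.../A an/b1/.../bm  =>  A -> b1 B/.../bm B,
    B -> a1 B/.../an B/epsilon, where B is `new_head`."""
    # alpha parts: what follows the leading A in left-recursive alternatives
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # beta parts: alternatives that do not start with A
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}  # nothing to remove
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }

# E -> E + T / T
single = remove_left_recursion("E", [["E", "+", "T"], ["T"]], "B")
# E -> E + T / E * F / a / b / d
multiple = remove_left_recursion(
    "E", [["E", "+", "T"], ["E", "*", "F"], ["a"], ["b"], ["d"]], "B"
)
print(single)
print(multiple)
```

Indirect left recursion would first have to be reduced to direct left recursion by substitution, which this sketch does not attempt.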

Classification of Parsing

The abbreviations used in the classification are defined first, followed by the classification diagram.

    SLR       (Simple) (Left to right scan) (RMD)
    LR(0)     (Left to right scan) (RMD) (No look-ahead values)
    SLR(1)    (Simple) (Left to right scan) (RMD) (One entry is permitted in each cell of the parsing table)
    LR        (Left to right scan)
    CLR       (Canonical) (Left to right scan) (RMD)
    LR(1)     (Left to right scan) (RMD) (One set of look-ahead values)
    LALR      (Look Ahead) (Left to right scan) (RMD)
    LALR(1)   (Look Ahead) (Left to right scan) (RMD) (Single entry in the table)

Note: Canonical = One shape with different names.
LALR = One shape with different names, merged together to form a single entity.

The classification diagram is as follows:

    Parsing
        Top-Down Parser (LMD)
            With backtracking
            Without backtracking
                Recursive descent parser
                Non-recursive or table-driven parser [LL(1)]
        Bottom-Up Parser (RMD)
            Shift-reduce parsing
                Operator precedence parser
                LR parser
                    SLR or LR(0) or SLR(1)
                    LR or CLR or LR(1)
                    LALR or LALR(1) or merged LR

FIRST & FOLLOW

FIRST: It is the first terminal value produced by a non-terminal on the derived side, in all possible ways.
    If S → ab, then FIRST(S) = a

FOLLOW: It is the terminal value that appears immediately after a non-terminal on the derived side of a grammar production.
    If S → aAd, then FOLLOW(A) = d

Algorithms for FIRST and FOLLOW

Algorithm for FIRST, rules:

1. If A → ε is a production, then FIRST(A) contains ε.

2. FIRST(A) is the first terminal value produced by the non-terminal in all possible ways, as detailed in the rules below.

3. If A → αβ is a production where
       A ∈ V
       α ∈ T
       β ∈ (V ∪ T)*
   or, in other words, α is a single terminal and β can contain any value, then
       FIRST(A) = α
   Note: In S → bD, b is α and D is β.
   Example: If S → abcdefgh, then FIRST(S) = a

4. If A → αβ is a production where α is a single non-terminal and α never derives ε anywhere in the grammar, then
       FIRST(A) = FIRST(α)

   Example: If
       S → AB
       A → ab
       B → d
   then FIRST(S) = FIRST(A) = a

5. If A → αβ is a production where α is a single non-terminal and α can derive ε somewhere in the grammar, then
       FIRST(A) = FIRST(α) without ε
   and, for the possibility α → ε, we look at the symbol after α and apply rules 1, 3, 4 and 5 to it.
   Example:
       S → AB
       A → ab/ε
       B → d

       Non-terminal    FIRST
       S               a, d
       A               a, ε
       B               d

Algorithm for FOLLOW:

1. The non-terminal whose FOLLOW is calculated always appears on the derived side of a production. The terminal that comes immediately after the non-terminal is a FOLLOW value of that non-terminal.

2. Add $ to FOLLOW of the starting production variable directly.

3. If A → αBβ is a production where FOLLOW(B) is to be calculated, with
       A ∈ V
       α ∈ (V ∪ T)*
       β ∈ T
   then
       FOLLOW(B) = β
   Example:
       S → aBd
       A → aBg
       B → bBe
   Then FOLLOW(B) = {d, g, e}

4. If A → αBβ is a production where β is a single non-terminal and β never derives ε, then

       FOLLOW(B) = FIRST(β)
   Example:
       S → BA
       A → ab/ba
   Then FOLLOW(B) = {a, b} = FIRST(A)

5. If A → αBβ is a production where β is a single non-terminal and β can derive ε, then
       FOLLOW(B) = FIRST(β) without ε
   and, for β → ε or for A → αB,
       FOLLOW(B) also includes FOLLOW(A)
   Example:
       S → BA
       A → ab/ba/ε

       Non-terminal    FOLLOW
       S               $
       A               $
       B               a, b, $

6. If A → αBβ is a production where β contains any value, then FOLLOW of B depends entirely on FIRST of β. Apply rules 3, 4, 5 and 6 accordingly after checking FIRST of β, and also check what comes after β where β can derive ε.
   Example:
       If S → aBdefgh, then FOLLOW(B) = d
       If S → BAefgh, then FOLLOW(B) = FIRST(Aefgh), i.e. FIRST(A) without ε, plus e when A can derive ε

Question: Find FIRST and FOLLOW for the following.
    E  → TE'
    E' → +TE'/ε
    T  → FT'
    T' → *FT'/ε
    F  → (E)/id

Solution:

    Non-terminal    FIRST       FOLLOW
    E               {(, id}     {$, )}

    E'              {+, ε}      {$, )}
    T               {(, id}     {+, $, )}
    T'              {*, ε}      {+, $, )}
    F               {(, id}     {*, +, $, )}

1. FIRST(F) = FIRST(T) = FIRST(E) = {(, id}. To see why, note that the two productions for F have bodies that start with these two terminal symbols, id and the left parenthesis. T has only one production, and its body starts with F. Since F does not derive ε, FIRST(T) must be the same as FIRST(F). The same argument covers FIRST(E).

2. FIRST(E') = {+, ε}. The reason is that one of the two productions for E' has a body that begins with the terminal +, and the other's body is ε. Whenever a non-terminal derives ε, we place ε in FIRST for that non-terminal.

3. FIRST(T') = {*, ε}. The reasoning is analogous to that for FIRST(E').

4. FOLLOW(E) = FOLLOW(E') = {), $}. Since E is the start symbol, FOLLOW(E) must contain $. The production body (E) explains why the right parenthesis is in FOLLOW(E). For E', note that this non-terminal appears only at the ends of bodies of E-productions. Thus, FOLLOW(E') must be the same as FOLLOW(E).

5. FOLLOW(T) = FOLLOW(T') = {+, ), $}. Notice that T appears in bodies only followed by E'. Thus, everything except ε that is in FIRST(E') must be in FOLLOW(T); that explains the symbol +. However, since FIRST(E') contains ε, and E' is the entire string following T in the bodies of the E-productions, everything in FOLLOW(E) must also be in FOLLOW(T). That explains the symbols $ and the right parenthesis. As for T', since it appears only at the ends of the T-productions, it must be that FOLLOW(T') = FOLLOW(T).

6. FOLLOW(F) = {+, *, ), $}. The reasoning is analogous to that for T in point (5).

***********************************************************
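The FIRST and FOLLOW rules can also be run as a fixed-point computation: keep applying them until no set grows any more. The sketch below does this for the expression grammar of the question above. The representation (a dictionary of lists of symbols, the empty list standing for ε) and the function names are assumptions chosen for this example.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of symbols; contains EPS if every symbol can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # $ goes into FOLLOW of the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue  # FOLLOW is only defined for non-terminals
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # nothing, or only nullable symbols, follow
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

The sets it prints agree with the solution table above.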

LL(1) Parser

Rules:
1. Remove left recursion and left factors from the given grammar, if present.
2. Calculate FIRST and FOLLOW.
3. Construct the LL(1) parsing table according to the table construction rules.
4. Check the LL(1) parsing table for multiple entries. If any are found, declare that the parser is NOT an LL(1) parser.
5. Check the acceptability of the string with the LL(1) parsing table.

Practice questions for FIRST and FOLLOW:

Question: Calculate FIRST and FOLLOW for the following:
    S → CC
    C → cC/d
Solution:

    Non-terminal    FIRST    FOLLOW
    S               c, d     $
    C               c, d     c, d, $

Question: Calculate FIRST and FOLLOW for the following:
    S → aAB
    A → aBd/B
    B → bA/ε
Solution:

    Non-terminal    FIRST       FOLLOW      Depends on
    S               a           $
    A               a, b, ε     b, $        B
    B               b, ε        $, d, b

Question: Calculate FIRST and FOLLOW for the following:
    S → aSd/ABC
    A → BC/bAc
    B → cB/CD/eCf
    C → gBA/hDi/D

    D → jD/Dk/ε
Solution:

    Non-terminal    FIRST                         FOLLOW                               Depends on
    S               a, b, c, e, g, h, j, k, ε     $, d
    A               b, c, e, g, h, j, k, ε        g, h, j, k, c, e, $, d, f, b         C
    B               c, e, g, h, j, k, ε           g, h, j, k, $, d, b, c, e, f         A, C
    C               g, h, j, k, ε                 f, j, k, $, d, g, h, c, e, b         B, A
    D               j, k, ε                       i, k, f, j, $, d, g, h, c, e, b      B, C

Note: Take D as ε in D → Dk to get k in FIRST(D).

Questions for LL(1):

Question: Make an LL(1) parser for the following grammar:
    E → E + T/T
    T → T * F/F
    F → (E)/id
and check the strings:
    (1) id + id * id
    (2) (id + id) * id

Solution:

1. Removal of left recursion.
    E  → TE'
    E' → +TE'/ε
    T  → FT'
    T' → *FT'/ε
    F  → (E)/id

2. Calculate FIRST and FOLLOW.

    Non-terminal    FIRST       FOLLOW
    E               {(, id}     {$, )}
    E'              {+, ε}      {$, )}
    T               {(, id}     {+, $, )}
    T'              {*, ε}      {+, $, )}
    F               {(, id}     {*, +, $, )}
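A table-driven LL(1) parser for this grammar can be sketched as follows. The table is hard-coded here from the FIRST and FOLLOW sets above (a production goes under each terminal in its FIRST set, and an ε production goes under each terminal in FOLLOW of its non-terminal); TABLE, NONTERMINALS and ll1_accepts are names assumed for this example.

```python
# (non-terminal, look-ahead terminal) -> body of the production to apply
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_accepts(tokens, start="E"):
    """Return True if the token list is accepted by the LL(1) table."""
    stack = ["$", start]          # top of the stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:      # empty table cell: syntax error
                return False
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == look:         # terminal (or $) on top matches the input
            pos += 1
        else:
            return False
    return pos == len(tokens)

print(ll1_accepts("id + id * id".split()))
print(ll1_accepts("( id + id ) * id".split()))
print(ll1_accepts("id + * id".split()))
```

Pushing the body in reverse keeps the leftmost symbol on top of the stack, which is what makes the parser follow a leftmost derivation.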

3. Arrange the non-terminals row-wise and all the terminals column-wise, including $ and excluding ε.

                                       Input symbols
    Non-terminal    +             *             (            )          id           $
    E                                           E → TE'                 E → TE'
    E'              E' → +TE'                                E' → ε                  E' → ε
    T                                           T → FT'                 T → FT'
    T'              T' → ε        T' → *FT'                  T' → ε                  T' → ε
    F                                           F → (E)                 F → id

Table entry rules (to fill out the above table):
1) Enter each production in the row of its non-terminal, under the column of every terminal in the FIRST of that production.
2) Enter each ε production in the row of its non-terminal, under the column of every terminal in the FOLLOW of that non-terminal.
   Example: If E' → ε, then

       Non-terminal    $          )
       E'              E' → ε     E' → ε

Answer: No cell of the table has more than one entry, so the grammar is LL(1).

4. Parse the string id + id * id with the LL(1) parsing table.

Now what do we do with this table? The table forms one part of a three-part data structure. The other two parts are a stack of grammar symbols (E, E', T, T', F, +, *, (, ), id and $) and an input stream (the expression we want to parse, already tokenized into lexemes by the scanner). We start the stack with the starting non-terminal, here E.

    Stack           Input               Action
    $E              id + id * id $      E → TE'
    $E'T            id + id * id $      T → FT'
    $E'T'F          id + id * id $      F → id

    $E'T'id         id + id * id $      POP id
    $E'T'           + id * id $         T' → ε
    $E'             + id * id $         E' → +TE'
    $E'T+           + id * id $         POP +
    $E'T            id * id $           T → FT'
    $E'T'F          id * id $           F → id
    $E'T'id         id * id $           POP id
    $E'T'           * id $              T' → *FT'
    $E'T'F*         * id $              POP *
    $E'T'F          id $                F → id
    $E'T'id         id $                POP id
    $E'T'           $                   T' → ε
    $E'             $                   E' → ε
    $               $                   ACCEPTED

5. Now parse (id + id) * id with the LL(1) table.

    Stack           Input               Action
    $E              (id + id) * id $    E → TE'
    $E'T            (id + id) * id $    T → FT'
    $E'T'F          (id + id) * id $    F → (E)
    $E'T')E(        (id + id) * id $    POP (
    $E'T')E         id + id) * id $     E → TE'
    $E'T')E'T       id + id) * id $     T → FT'
    $E'T')E'T'F     id + id) * id $     F → id
    $E'T')E'T'id    id + id) * id $     POP id
    $E'T')E'T'      + id) * id $        T' → ε
    $E'T')E'        + id) * id $        E' → +TE'
    $E'T')E'T+      + id) * id $        POP +
    $E'T')E'T       id) * id $          T → FT'
    $E'T')E'T'F     id) * id $          F → id
    $E'T')E'T'id    id) * id $          POP id
    $E'T')E'T'      ) * id $            T' → ε
    $E'T')E'        ) * id $            E' → ε
    $E'T')          ) * id $            POP )
    $E'T'           * id $              T' → *FT'

    $E'T'F*         * id $              POP *
    $E'T'F          id $                F → id
    $E'T'id         id $                POP id
    $E'T'           $                   T' → ε
    $E'             $                   E' → ε
    $               $                   ACCEPTED

***********************************************************

Bottom-Up Parsing

SLR / LR(0) / SLR(1)

Rules:
1. Calculate FIRST and FOLLOW for the given grammar.
2. Numbering of productions.
3. Augmentation of the grammar. (Initialization)
4. Construction of the LR(0) item sets.
5. Construction of the LR(0) parsing table.
6. Parsing table entries. (SHIFT, REDUCE, GOTO and ACCEPT)
7. Declaration of the parser by checking for conflicts in the parsing table.
8. SHIFT or GOTO or GOTO-SHIFT graph. (Optional)
9. Parsing of a string, i.e. checking the acceptability of a string.

Question: Construct an SLR parser for the given grammar and check the acceptability of ccdd.
    S → CC
    C → cC/d

Solution:

1. Calculate FIRST and FOLLOW:

    Non-terminal    FIRST    FOLLOW
    S               c, d     $
    C               c, d     $, c, d

2. Numbering of productions:
    S → CC     R1
    C → cC     R2
    C → d      R3

3. Augmentation: The process in which we initialize the starting production variable with an auxiliary variable.
Example: Lighting a matchstick before lighting the gas stove. The lighting of the match is the augmentation.
    S' → .S

Scanning rules:
1) Whenever the . (dot) is in front of a non-terminal, we write all the productions of that non-terminal, each with a . (dot) at the start.
2) Whenever the . (dot) is in front of a terminal, we stop; nothing more is added.

Now:
    I0:  S' → .S
         S → .CC
         C → .cC
         C → .d

Dot scanning rules:
a. Items that scan the same symbol always move together.
b. At a time, only one scanning movement is possible.
c. For a non-terminal we use the GOTO operation and for a terminal we use the SHIFT operation.
d. Whenever a new collection of items is found, declare a new item set name; otherwise refer to the existing name for it.

Note: If
    S' → .S
    S → .SC

then, after one scanning move over S:
    S' → S.
    S → S.C

Now, move on to the question.

4. Construction of the LR(0) item sets.

    I0 ; S Ⱶ GOTO
         S' → S.
         Ⱶ I1

    I0 ; C Ⱶ GOTO
         S → C.C
         C → .cC
         C → .d
         Ⱶ I2

    I0 ; c Ⱶ SHIFT
         C → c.C
         C → .cC
         C → .d
         Ⱶ I3

    I0 ; d Ⱶ SHIFT
         C → d.
         Ⱶ I4

    I2 ; C Ⱶ GOTO
         S → CC.
         Ⱶ I5

    I2 ; c Ⱶ SHIFT Ⱶ I3

    I2 ; d Ⱶ SHIFT Ⱶ I4

    I3 ; C Ⱶ GOTO
         C → cC.
         Ⱶ I6

    I3 ; c Ⱶ SHIFT Ⱶ I3

    I3 ; d Ⱶ SHIFT Ⱶ I4

5. Construction of the parsing table:
a. Arrange all the item sets row-wise.
b. Arrange all the terminals, including $, column-wise under ACTION.
c. Arrange all the non-terminals column-wise under GOTO.
Note: In the following table, S stands for a SHIFT move and R for a REDUCE move.

                  ACTION                    GOTO
    Items    c       d       $             S     C
    I0       S3      S4                    1     2
    I1                       ACCEPT
    I2       S3      S4                          5
    I3       S3      S4                          6
    I4       R3      R3      R3
    I5                       R1
    I6       R2      R2      R2

Declaration: There is no conflict in the table (no cell holds two values). So, it is an SLR parser.
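The parsing table above can drive a small shift-reduce loop. The sketch below hard-codes the ACTION and GOTO entries from the table; for brevity the stack holds only item set numbers rather than symbol and number pairs, so a reduction pops as many entries as the production body has symbols. The names ACTION, GOTO, PRODUCTIONS and slr_accepts are assumptions for this example.

```python
# production number -> (head, number of symbols in the body)
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

ACTION = {
    (0, "c"): ("shift", 3), (0, "d"): ("shift", 4),
    (1, "$"): ("accept", None),
    (2, "c"): ("shift", 3), (2, "d"): ("shift", 4),
    (3, "c"): ("shift", 3), (3, "d"): ("shift", 4),
    (4, "c"): ("reduce", 3), (4, "d"): ("reduce", 3), (4, "$"): ("reduce", 3),
    (5, "$"): ("reduce", 1),
    (6, "c"): ("reduce", 2), (6, "d"): ("reduce", 2), (6, "$"): ("reduce", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 6}

def slr_accepts(text):
    """Return True if the string of terminals is accepted by the table."""
    stack = [0]                    # start in item set I0
    symbols = list(text) + ["$"]
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], symbols[pos]))
        if entry is None:          # empty cell: the string is rejected
            return False
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[-length:]    # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
        else:                      # accept
            return True

print(slr_accepts("ccdd"))
print(slr_accepts("cd"))
```

A string such as cd is rejected because, after reducing to a single C, item set I2 has no entry under $.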

Acceptability of a String by SLR, LR(1) and LALR

Rules:
1. Draw three columns for STACK, INPUT and ACTION, and do the following:
a. Enter the entire input string in the INPUT column, followed by $.
b. Initialize the stack with $ and the initial item set number.
c. Check the top of the stack against the first input symbol and:
   1) If a SHIFT entry is found, PUSH the first input symbol on the stack followed by the shift number, then repeat step (c) for the next input symbol with the new top of the stack.
   2) If a REDUCE entry is found, enter the reduction production in the ACTION column. Look at the derived side of the reduction production and POP twice as many values as it has symbols (each symbol is stored together with its item set number). After the POP operation, PUSH the derivative (left-hand side) of the reduction production on the stack. Then look up the GOTO entry for the previous top of the stack and the new symbol, push that number, and repeat step (c).
   3) Only if an ACCEPT entry is found is the string accepted.

    STACK         INPUT      ACTION
    $0            ccdd$      S3
    $0c3          cdd$       S3
    $0c3c3        dd$        S4
    $0c3c3d4      d$         R3
    $0c3c3C6      d$         R2

    $0c3C6        d$         R2
    $0C2          d$         S4
    $0C2d4        $          R3
    $0C2C5        $          R1
    $0S1          $          ACCEPTED

***********************************************************

LR(1) / CLR / LR

Rules:
1. Numbering of productions.
2. Augmentation.
3. Construction of the canonical collection of LR(1) item sets.
4. Construction of the LR(1) parsing table.
5. Fill out the parsing table entries.
6. Declaration of the parser by checking for conflicts.
7. Construct the graph (SHIFT, GOTO and GOTO-SHIFT).
8. Acceptability of the string.

Look Ahead:
1. It is a collection of values used for reduce entries.
2. $ is the default LOOK AHEAD of the augmentation variable.
3. We calculate LOOK AHEADS for each new production in three possible ways.

Question: Make an LR(1) parser for the following grammar.
    S → CC
    C → cC/d

Solution:

1) Numbering
    1. S → CC     R1
    2. C → cC     R2
    3. C → d      R3

2) Augmentation
    S' → .S

3) Construction of the canonical collection of LR(1) item sets. The look-aheads are written after each item.

    I0:  S' → .S      $
         S → .CC      $
         C → .cC      c/d
         C → .d       c/d

    I0 ; S Ⱶ GOTO
         S' → S.      $
         Ⱶ I1

    I0 ; C Ⱶ GOTO
         S → C.C      $
         C → .cC      $
         C → .d       $
         Ⱶ I2

    I0 ; c Ⱶ SHIFT
         C → c.C      c/d
         C → .cC      c/d
         C → .d       c/d
         Ⱶ I3

    I0 ; d Ⱶ SHIFT
         C → d.       c/d
         Ⱶ I4

    I2 ; C Ⱶ GOTO
         S → CC.      $
         Ⱶ I5

    I2 ; c Ⱶ SHIFT
         C → c.C      $
         C → .cC      $
         C → .d       $
         Ⱶ I6

    I2 ; d Ⱶ SHIFT
         C → d.       $
         Ⱶ I7

    I3 ; C Ⱶ GOTO
         C → cC.      c/d
         Ⱶ I8

    I3 ; c Ⱶ SHIFT Ⱶ I3

    I3 ; d Ⱶ SHIFT Ⱶ I4

    I6 ; C Ⱶ GOTO
         C → cC.      $
         Ⱶ I9

    I6 ; c Ⱶ SHIFT Ⱶ I6

    I6 ; d Ⱶ SHIFT Ⱶ I7

4) Construction of the LR(1) parsing table.

                  ACTION                    GOTO
    Items    c       d       $             S     C
    I0       S3      S4                    1     2
    I1                       ACCEPTED
    I2       S6      S7                          5
    I3       S3      S4                          8
    I4       R3      R3
    I5                       R1
    I6       S6      S7                          9
    I7                       R3
    I8       R2      R2
    I9                       R2

***********************************************************

Question: Check whether the grammar is SLR and LR.
    S → L = R/R
    L → *R/id
    R → L

Solution:

1) Numbering
    1. S → L = R     R1
    2. S → R         R2
    3. L → *R        R3
    4. L → id        R4
    5. R → L         R5

2) Augmentation
    S' → .S

3) Construction of the canonical collection of LR(1) item sets. The look-aheads are written after each item.

    I0:  S' → .S      $
         S → .L=R     $
         S → .R       $

         L → .*R      =/$
         L → .id      =/$
         R → .L       $

    I0 ; S Ⱶ GOTO
         S' → S.      $
         Ⱶ I1

    I0 ; L Ⱶ GOTO
         S → L.=R     $
         R → L.       $
         Ⱶ I2

    I0 ; R Ⱶ GOTO
         S → R.       $
         Ⱶ I3

    I0 ; * Ⱶ SHIFT
         L → *.R      =/$
         R → .L       =/$
         L → .*R      =/$
         L → .id      =/$
         Ⱶ I4

    I0 ; id Ⱶ SHIFT
         L → id.      =/$
         Ⱶ I5

    I2 ; = Ⱶ SHIFT
         S → L=.R     $
         R → .L       $
         L → .*R      $
         L → .id      $

         Ⱶ I6

    I4 ; R Ⱶ GOTO
         L → *R.      =/$
         Ⱶ I7

    I4 ; L Ⱶ GOTO
         R → L.       =/$
         Ⱶ I8

    I4 ; * Ⱶ SHIFT
         L → *.R      =/$
         R → .L       =/$
         L → .*R      =/$
         L → .id      =/$
         Ⱶ I4

    I4 ; id Ⱶ SHIFT Ⱶ I5

    I6 ; R Ⱶ GOTO
         S → L=R.     $
         Ⱶ I9

    I6 ; L Ⱶ GOTO
         R → L.       $
         Ⱶ I10

    I6 ; * Ⱶ SHIFT
         L → *.R      $
         R → .L       $
         L → .*R      $
         L → .id      $
         Ⱶ I11

I6 ; id Ⱶ SHIFT
    L  → id.      $
  Ⱶ I12

I11 ; R Ⱶ GOTO
    L  → *R.      $
  Ⱶ I13

I11 ; L Ⱶ GOTO Ⱶ I10
I11 ; * Ⱶ SHIFT Ⱶ I11
I11 ; id Ⱶ SHIFT Ⱶ I12

4) Construct LR (1) parsing table.

ITEMS   ACTION                                  GOTO
        =         *         id        $         S     L     R
I0                S4        S5                  1     2     3
I1                                    ACCEPTED
I2      S6                            R5
I3                                    R2
I4                S4        S5                        8     7
I5      R4                            R4
I6                S11       S12                       10    9
I7      R3                            R3
I8      R5                            R5
I9                                    R1
I10                                   R5
I11               S11       S12                       10    13
I12                                   R4
I13                                   R3

As no cell of the table holds more than one entry, the grammar is LR (1). It is not SLR: an SLR table would place R5 in row I2 under every terminal in FOLLOW(R), which includes =, and that cell already holds S6, giving a shift/reduce conflict.
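Step 8 of the procedure (acceptability of a string) can be illustrated by driving a stack machine from this table. The sketch below is an assumption-laden illustration rather than the notes' own method: the table is transcribed by hand from the item sets above, and the names `ACTION`, `GOTO`, `PRODUCTIONS` and `parse` are hypothetical.

```python
# Productions: number -> (head, length of right-hand side).
PRODUCTIONS = {
    1: ("S", 3),  # S -> L = R
    2: ("S", 1),  # S -> R
    3: ("L", 2),  # L -> * R
    4: ("L", 1),  # L -> id
    5: ("R", 1),  # R -> L
}

# ACTION[(state, terminal)] is ("s", n), ("r", k) or ("acc",).
ACTION = {
    (0, "*"): ("s", 4), (0, "id"): ("s", 5),
    (1, "$"): ("acc",),
    (2, "="): ("s", 6), (2, "$"): ("r", 5),
    (3, "$"): ("r", 2),
    (4, "*"): ("s", 4), (4, "id"): ("s", 5),
    (5, "="): ("r", 4), (5, "$"): ("r", 4),
    (6, "*"): ("s", 11), (6, "id"): ("s", 12),
    (7, "="): ("r", 3), (7, "$"): ("r", 3),
    (8, "="): ("r", 5), (8, "$"): ("r", 5),
    (9, "$"): ("r", 1),
    (10, "$"): ("r", 5),
    (11, "*"): ("s", 11), (11, "id"): ("s", 12),
    (12, "$"): ("r", 4),
    (13, "$"): ("r", 3),
}

# GOTO[(state, nonterminal)] is the state entered after a reduction.
GOTO = {
    (0, "S"): 1, (0, "L"): 2, (0, "R"): 3,
    (4, "L"): 8, (4, "R"): 7,
    (6, "L"): 10, (6, "R"): 9,
    (11, "L"): 10, (11, "R"): 13,
}


def parse(tokens):
    """Return (accepted, list of production numbers used for reductions)."""
    stack = [0]                      # stack of state numbers
    tokens = list(tokens) + ["$"]    # end marker
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:            # blank cell: syntax error
            return False, reductions
        if entry[0] == "acc":
            return True, reductions
        if entry[0] == "s":          # shift: push state, consume token
            stack.append(entry[1])
            pos += 1
        else:                        # reduce: pop the body, then take GOTO
            head, length = PRODUCTIONS[entry[1]]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(entry[1])


print(parse(["id", "=", "*", "id"]))
print(parse(["id", "="]))
```

Only state numbers are kept on the stack here; the grammar symbols shown in the stack column of a hand trace are implied by the states.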

LALR (Direct Method)

Rules:-
1. Numbering of productions.
2. Augmentation.
3. Construction of LALR item set.
4. LALR parsing table.
5. Fill out the table entries.
6. Declaration of parser after checking conflicts.
7. Construction of graph (SHIFT, GOTO and GOTO SHIFT).
8. Acceptability of string.

Note: - There is also an indirect method. It is used only when the question asks for LR (1) first and then LALR. An example is given below.

Example of INDIRECT method

Question: - Construct the LR (1) and LALR parsers for the following grammar.
S → CC
C → cC / d

Solution:-
For LR (1): see the previous method.
For LALR: in the LR (1) item sets, I3 and I6 contain the same items and differ only in their look-aheads. The same holds for I4 and I7, and for I8 and I9. Each such pair is merged into a single item set whose look-aheads are the union of the two:
I3 = I6 = I3,6
I4 = I7 = I4,7
I8 = I9 = I8,9

Now, construction of LALR table.

ITEMS   ACTION                        GOTO
        c         d         $         S     C
I0      S3,6      S4,7                1     2
I1                          ACCEPTED
I2      S3,6      S4,7                      5
I3,6    S3,6      S4,7                      8,9
I4,7    R3        R3        R3
I5                          R1
I8,9    R2        R2        R2
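The merging step of the indirect method can be sketched as grouping LR (1) item sets by their core (the items with look-aheads ignored) and taking the union of the look-aheads. The following is a minimal illustration under stated assumptions: the item sets are typed in by hand from the LR (1) solution for S → CC, C → cC / d, and `LR1_STATES` and `merge_by_core` are hypothetical names.

```python
# LR(1) item sets: state number -> {item core: set of look-aheads}.
LR1_STATES = {
    0: {"S' -> .S": {"$"}, "S -> .CC": {"$"},
        "C -> .cC": {"c", "d"}, "C -> .d": {"c", "d"}},
    1: {"S' -> S.": {"$"}},
    2: {"S -> C.C": {"$"}, "C -> .cC": {"$"}, "C -> .d": {"$"}},
    3: {"C -> c.C": {"c", "d"}, "C -> .cC": {"c", "d"}, "C -> .d": {"c", "d"}},
    4: {"C -> d.": {"c", "d"}},
    5: {"S -> CC.": {"$"}},
    6: {"C -> c.C": {"$"}, "C -> .cC": {"$"}, "C -> .d": {"$"}},
    7: {"C -> d.": {"$"}},
    8: {"C -> cC.": {"c", "d"}},
    9: {"C -> cC.": {"$"}},
}


def merge_by_core(states):
    """Group states with identical cores and union their look-aheads."""
    groups = {}
    for number in sorted(states):
        core = frozenset(states[number])        # item cores only
        groups.setdefault(core, []).append(number)
    merged = {}
    for numbers in groups.values():
        lookaheads = {}
        for n in numbers:
            for item, las in states[n].items():
                lookaheads.setdefault(item, set()).update(las)
        merged[tuple(numbers)] = lookaheads
    return merged


MERGED = merge_by_core(LR1_STATES)
for numbers, items in MERGED.items():
    name = "I" + ",".join(str(n) for n in numbers)
    for item, las in items.items():
        print(f"{name:6} {item:12} {'/'.join(sorted(las))}")
```

A reduce item in a merged set is then entered under every look-ahead in the union, which is why rows I4,7 and I8,9 carry a reduce entry under c, d and $.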

Question:- Construct LR (1) and LALR for the following grammar.
S → L=R / R
L → *R / id
R → L

Solution:-
Hint
I4 = I11 = I4,11
I5 = I12 = I5,12
I7 = I13 = I7,13
I8 = I10 = I8,10

***********************************************************

Direct Method for LALR

Question: - Check whether the following grammar is LALR or not.
S → CC
C → cC / d

Solution:-
1. Numbering of productions.
S → CC ...1
C → cC ...2
C → d  ...3

2. Augmentation.
S' → S

3. Construction of LALR item set.

I0 ;
    S' → .S       $
    S  → .CC      $
    C  → .cC      c/d
    C  → .d       c/d

I0 ; S Ⱶ GOTO
    S' → S.       $
  Ⱶ I1

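I0 ; C Ⱶ GOTO
    S  → C.C      $
    C  → .cC      $
    C  → .d       $
  Ⱶ I2

I0 ; c Ⱶ SHIFT
    C  → c.C      $/c/d
    C  → .cC      $/c/d
    C  → .d       $/c/d
  Ⱶ I3

I0 ; d Ⱶ SHIFT
    C  → d.       $/c/d
  Ⱶ I4

I2 ; C Ⱶ GOTO
    S  → CC.      $
  Ⱶ I5

The shifts on c and d from I2 give the same items as I3 and I4, differing only in the look-ahead $. Their look-aheads are therefore merged into I3 and I4 (which is why those sets carry $/c/d), and no new item sets are created:

I2 ; c Ⱶ SHIFT Ⱶ I3
I2 ; d Ⱶ SHIFT Ⱶ I4

I3 ; C Ⱶ GOTO
    C  → cC.      $/c/d
  Ⱶ I6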

I3 ; c Ⱶ SHIFT Ⱶ I3
I3 ; d Ⱶ SHIFT Ⱶ I4

4. LALR parsing table with entries.

ITEMS   ACTION                        GOTO
        c         d         $         S     C
I0      S3        S4                  1     2
I1                          ACCEPTED
I2      S3        S4                        5
I3      S3        S4                        6
I4      R3        R3        R3
I5                          R1
I6      R2        R2        R2

Question: - Check whether the following grammar is LALR or not.

Solution:-
1. Numbering of productions.
1. S → L=R    R1
2. S → R      R2
3. L → *R     R3
4. L → id     R4
5. R → L      R5

2. Augmentation
S' → S

3. Construction of canonical collection of LR (1) item set.

                  LOOK AHEADS
I0 ;
    S' → .S       $
    S  → .L=R     $
    S  → .R       $
    L  → .*R      =/$
    L  → .id      =/$
    R  → .L       $

I0 ; S Ⱶ GOTO
    S' → S.       $
  Ⱶ I1

I0 ; L Ⱶ GOTO
    S  → L.=R     $
    R  → L.       $
  Ⱶ I2

I0 ; R Ⱶ GOTO
    S  → R.       $
  Ⱶ I3

I0 ; * Ⱶ SHIFT
    L  → *.R      =/$
    R  → .L       =/$
    L  → .*R      =/$
    L  → .id      =/$
  Ⱶ I4

I0 ; id Ⱶ SHIFT
    L  → id.      =/$
  Ⱶ I5

I2 ; = Ⱶ SHIFT
    S  → L=.R     $
    R  → .L       $
    L  → .*R      $
    L  → .id      $
  Ⱶ I6

I4 ; R Ⱶ GOTO
    L  → *R.      =/$
  Ⱶ I7

I4 ; L Ⱶ GOTO
    R  → L.       =/$
  Ⱶ I8

I4 ; * Ⱶ SHIFT Ⱶ I4
I4 ; id Ⱶ SHIFT Ⱶ I5

I6 ; R Ⱶ GOTO
    S  → L=R.     $
  Ⱶ I9

I6 ; L Ⱶ GOTO Ⱶ I8
I6 ; * Ⱶ SHIFT Ⱶ I4

I6 ; id Ⱶ SHIFT
    L  → id.      $
  Ⱶ I5

4. LALR parsing table with entries.

The rows use the merged LR (1) names: I4 is written I4,11, I5 is I5,12, I7 is I7,13 and I8 is I8,10.

ITEMS   ACTION                                  GOTO
        id        *         =         $         S     L     R
I0      S5,12     S4,11                         1     2     3
I1                                    ACCEPTED
I2                          S6        R5
I3                                    R2
I4,11   S5,12     S4,11                               8,10  7,13
I5,12                       R4        R4
I6      S5,12     S4,11                               8,10  9
I7,13                       R3        R3
I8,10                       R5        R5
I9                                    R1

No cell holds more than one entry, so the grammar is LALR.

***********************************************************
