Syn S t yn a t x a Ana x lysi y s si 1

Similar documents
Principles of Programming Languages

Principles of Programming Languages

Syntax Analysis Part IV

Syntax Analysis Part I

LR Parsing Techniques

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

LALR Parsing. What Yacc and most compilers employ.

CS308 Compiler Principles Syntax Analyzer Li Jiang

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

UNIT-III BOTTOM-UP PARSING

Context-free grammars

Compiler Construction: Parsing

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

LR Parsing Techniques

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntax-Directed Translation

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form


Monday, September 13, Parsers

UNIT III & IV. Bottom up parsing

MODULE 14 SLR PARSER LR(0) ITEMS

CSE302: Compiler Design

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Concepts Introduced in Chapter 4


Wednesday, August 31, Parsers

Syntax Analyzer --- Parser

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

3. Parsing. Oscar Nierstrasz

Syntactic Analysis. Chapter 4. Compiler Construction Syntactic Analysis 1

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Syntactic Analysis. Top-Down Parsing

shift-reduce parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Downloaded from Page 1. LR Parsing

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Lecture 7: Deterministic Bottom-Up Parsing

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Lecture 8: Deterministic Bottom-Up Parsing

Top down vs. bottom up parsing

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Introduction to parsers

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

CSCI Compiler Design

VIVA QUESTIONS WITH ANSWERS

S Y N T A X A N A L Y S I S LR

LR Parsers. Aditi Raste, CCOEW

Compiler Construction 2016/2017 Syntax Analysis

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

COP 3402 Systems Software Syntax Analysis (Parser)

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Syntax Analysis. Chapter 4

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Using an LALR(1) Parser Generator

Compilers. Bottom-up Parsing. (original slides by Sam

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Bottom-Up Parsing. Lecture 11-12

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

4. Lexical and Syntax Analysis

Yacc Yet Another Compiler Compiler

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Bottom-Up Parsing. Lecture 11-12

CSE302: Compiler Design

4. Lexical and Syntax Analysis

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CA Compiler Construction

Parsing How parser works?

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

CS415 Compilers. LR Parsing & Error Recovery

PRACTICAL CLASS: Flex & Bison

TDDD55 - Compilers and Interpreters Lesson 3

Yacc: A Syntactic Analysers Generator

Transcription:

Syntax Analysis 1

Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token, tokenval Get next token Parser and rest of front-end Intermediate representation Lexical error Syntax error Semantic error Symbol Table 2

The Role Of Parser A parser implements a C-F grammar The role of the parser is twofold: 1. To check syntax (= string recognizer) And to report syntax errors accurately 2. To invoke semantic actions For static semantics checking, e.g. type checking of expressions, functions, etc. For syntax-directed translation of the source code to an intermediate representation 3

Syntax-Directed Translation One of the major roles of the parser is to produce an intermediate representation (IR) of the source program using syntax-directed translation methods Possible IR output: Abstract syntax trees (ASTs) Control-flow graphs (CFGs) with triples, three-address code, or register transfer list notation WHIRL (SGI Pro64 compiler) has 5 IR levels! 4

Error Handling A good compiler should assist in identifying and locating errors Lexical errors: important, compiler can easily recover and continue Syntax errors: most important for compiler, can almost always recover Static semantic errors: important, can sometimes recover Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required Logical errors: hard or impossible to detect 5

Viable-Prefix Property The viable-prefix property of LL/LR parsers allows early detection of syntax errors Goal: detection of an error as soon as possible without further consuming unnecessary input How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language Prefix for (;) Error is detected here Prefix Error is detected here DO 10 I = 1;0 6

Error Recovery Strategies Panic mode Discard input until a token in a set of designated synchronizing tokens is found Phrase-level recovery Perform local correction on the input to repair the error Error productions Augment grammar with productions for erroneous constructs Global correction Choose a minimal sequence of changes to obtain a global least-cost correction 7

Grammars (Recap) Context-free grammar is a 4-tuple G = (N, T, P, S) where T is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form where (N T)* N (N T)* and (N T)* S N is a designated start symbol 8

Notational Conventions Used Terminals a,b,c, T specific terminals: 0, 1, id, + Nonterminals A,B,C, N specific nonterminals: expr, term, stmt Grammar symbols X,Y,Z (N T) Strings of terminals u,v,w,x,y,z T* Strings of grammar symbols,, (N T)* 9

Derivations (Recap) The one-step derivation is defined by A where A is a production in the grammar In addition, we define is leftmost lm if does not contain a nonterminal is rightmost rm if does not contain a nonterminal Transitive closure * (zero or more steps) Positive closure + (one or more steps) The language generated by G is defined by L(G) = {w T* S + w} 10

Derivation (Example) Grammar G = ({E}, {+,*,(,),-,id}, P, E) with productions P = E E + E E E * E E ( E ) E - E E id Example derivations: E - E - id E rm E + E rm E + id rm id + id E * E E * id + id E + id * id + id 11

Chomsky Hierarchy: Language Classification A grammar G is said to be Regular if it is right linear where each production is of the form A w B or A w or left linear where each production is of the form A B w or A w Context free if each production is of the form A where A N and (N T)* Context sensitive if each production is of the form A where A N,,, (N T)*, > 0 Unrestricted 12

Chomsky Hierarchy L(regular) L(context free) L(context sensitive) L(unrestricted) Where L(T) = { L(G) G is of type T } That is: the set of all languages generated by grammars G of type T Examples: Every finite language is regular! (construct a FSA for strings in L(G)) L 1 = { a n b n n 1 } is context free L 2 = { a n b n c n n 1 } is context sensitive 13

Parsing Universal (any C-F grammar) Cocke-Younger-Kasimi Earley Top-down (C-F grammar with restrictions) Recursive descent (predictive parsing) LL (Left-to-right, Leftmost derivation) methods Bottom-up (C-F grammar with restrictions) Operator precedence parsing LR (Left-to-right, Rightmost derivation) methods SLR, canonical LR, LALR 14

Top-Down Parsing LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing Grammar: E T + T T ( E ) T - E T id Leftmost derivation: E lm T + T lm id + T lm id + id E E E E T T T T T T + id + id + id 15

Left Recursion (Recap) Productions of the form A A are left recursive When one of the productions in a grammar is left recursive then a predictive parser loops forever on certain inputs 16

General Left Recursion Elimination Method Arrange the nonterminals in some order A 1, A 2,, A n for i = 1,, n do for j = 1,, i-1 do replace each A i A j with A i 1 2 k where enddo A j 1 2 k enddo eliminate the immediate left recursion in A i 17

Immediate Left-Recursion Elimination Method Rewrite every left-recursive production A A A into a right-recursive production: A A R A R A R A R A R 18

Example Left Recursion Elim. A B C a B C A A b C A B C C a Choose arrangement: A, B, C i = 1: i = 2, j = 1: i = 3, j = 1: i = 3, j = 2: nothing to do B C A A b B C A B C b a b (imm) B C A B R a b B R B R C b B R C A B C C a C B C B a B C C a C B C B a B C C a C C A B R C B a b B R C B a B C C a (imm) C a b B R C B C R a B C R a C R C R A B R C B C R C C R 19

Left Factoring When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing Replace productions A 1 2 n with A A R A R 1 2 n 20

Predictive Parsing Eliminate left recursion from grammar Left factor the grammar Compute FIRST and FOLLOW Two variants: Recursive (recursive calls) Non-recursive (table-driven) 21

FIRST (Revisited) FIRST( ) = { the set of terminals that begin all strings derived from } FIRST(a) = {a} if a T FIRST( ) = { } FIRST(A) = A FIRST( ) for A P FIRST(X 1 X 2 X k ) = if for all j = 1,, i-1 : FIRST(X j ) then add non- in FIRST(X i ) to FIRST(X 1 X 2 X k ) if for all j = 1,, k : FIRST(X j ) then add to FIRST(X 1 X 2 X k ) 22

FOLLOW FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A } FOLLOW(A) = for all (B A ) P do add FIRST( )\{ } to FOLLOW(A) for all (B A ) P and FIRST( ) do add FOLLOW(B) to FOLLOW(A) for all (B A) P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol S then add $ to FOLLOW(A) 23

Red : A Blue : First Set (2) G 0 S ase S B B bbe B C C cce C d

Red : A Step 1: Blue : First (S ase) First (S B ) First (B bbe) First Set (2) = First(a) ={a} = First(B) = First(b)={b} First (B C ) = First(C) First (C cce) = First(c) ={c} First (C d ) = First(d)={d} G 0 S ase S B B bbe B C C cce C d

Red : A Step 1: Blue : First (S ase) First (S B ) First (B bbe) First Set (2) = {a} = First(B) = {b} First (B C ) = First(C) First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d}

Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = First(B) = {b} First(C) First (B bbe) = {b} First (B C ) = First(C) First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d}

Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) First (B bbe) = {b} First (B C ) = First(C) First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C)

Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) First (B bbe) = {b} First (B C ) = First(C) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C)

Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) First (B bbe) = {b} First (B C ) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d}

Red : A Step 3: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) = {b} {c, d} First (B bbe) = {b} First (B C ) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d}

Red : A Step 3: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b, c, d} First (B bbe) = {b} First (B C ) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step Step 1 Step 2 First Set S B C a b c d {a} First(B) {b} First(C) {c, d} {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d} Step {a} {b} {c, d} = {a,b,c,d} {b} {c, d} = {b,c,d} {c, d}

Red : A Step 3: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b, c, d} First (B bbe) = {b} First (B C ) = {c, d} First (C cce) First (C d ) Step = {c} = {d} G 0 S ase S B B bbe B C C cce C d If no more change The first set of a terminal symbol is itself First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d} Step 3 {a} {b} {c, d} = {a,b,c,d} {b} {c, d} = {b,c,d} {c, d} {a} {b} {c} {d}

Another Example.

Red : A Blue : First Set (2) G 0 S ABc A a A B b B

Red : A Step 1: Blue : First (S ABc) First (A a ) First (A ) First Set (2) = First(ABc) = First(a) = First( ) First( ) First (B b ) = First(b) First (B ) = First( ) First( ) G 0 S ABc A a A B b B

Red : A Step 1: Blue : First (S ABc) First (A a ) First (A ) First (B b ) = {b} First (B ) = { } First Set (2) = First(ABc) = {a} = { } G 0 S ABc A a A B b B Step First Set S A B a b c Step 1 First(ABc) {a, } {b, }

Red : A Step 2: Blue : First Set (2) First (S ABc) = First(ABc) = {a, } = {a, } - { } First(Bc) = {a} First(Bc) First (A a )) = {a} First (A ) = { } First (B b ) = {b} G 0 S ABc A a A B b B First (B ) = { } Step First Set S A B a b c Step 1 First(ABc) {a, } {b, } Step 2 {a} First(Bc) {a, } {b, }

Red : A Step 3: Blue : First (S ABc) First (A a ) First (A ) First (B b ) First (B ) First Set (2) = {a} First(Bc) = {a} {b, } = {a} {b, } - { } First(c) = {a} {b,c} = {a} = { } = {b} Step First Set = { } G 0 S ABc A a A B b B S A B a b c Step 1 First(ABc) {a, } {b, } Step 2 {a} First(Bc) {a, } {b, } Step 3 {a} {b, c}= {a,b,c} {a, } {b, }

Red : A Step 3: Blue : First (S ABc) First (A a ) First (A ) First Set (2) = {a,b,c} = {a} = { } First (B b ) = {b} First (B ) = { } Step Step 1 Step 2 First Set G 0 S ABc A a A B b B If no more change The first set of a terminal symbol is itself S A B a b c First(ABc) {a, } {b, } {a} First(Bc) {a, } {b, } Step {a} {b, c}= {a,b,c} {a, } {b, } {a} {b} {c}

LL(1) Grammar A grammar G is LL(1) if it is not left recursive and for each collection of productions A 1 2 n for nonterminal A the following holds: 1. FIRST( i ) FIRST( j ) = for all i j 2. if i * then 2.a. j * for all i j 2.b. FIRST( j ) FOLLOW(A) = for all i j 41

Non-LL(1) Examples Grammar S S a a S a S a S a R R S S a R a R S Not LL(1) because: Left recursive FIRST(a S) FIRST(a) For R: S * and * For R: FIRST(S) FOLLOW(R) 42

Recursive Descent Parsing (Recap) Grammar must be LL(1) Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information 43

Using FIRST and FOLLOW to Write a Recursive Descent Parser expr term rest rest + term rest - term rest term id procedure rest(); begin if lookahead in FIRST(+ term rest) then match( + ); term(); rest() else if lookahead in FIRST(- term rest) then match( - ); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; FIRST(+ term rest) = { + } FIRST(+ term rest) = { + } FIRST(- term rest) = { - } FOLLOW(rest) = { $ } 44

Non-Recursive Predictive Parsing: Table-Driven Parsing Given an LL(1) grammar G = (N, T, P, S) construct a table M[A,a] for A N, a T and use a driver program with a stack input a + b $ stack X Y Z $ Predictive parsing program (driver) Parsing table M output 45

Constructing an LL(1) Predictive Parsing Table for each production A do for each a FIRST( ) do add A to M[A,a] enddo if FIRST( ) then for each b FOLLOW(A) do add A to M[A,b] enddo endif enddo Mark each undefined entry in M error 46

Example Table E T E R E R + T E R T F T R T R * F T R F ( E ) id A FIRST( ) FOLLOW(A) E T E R ( id $ ) E R + T E R + $ ) E R $ ) T F T R ( id + $ ) T R * F T R * + $ ) T R + $ ) F ( E ) ( * + $ ) F id id * + $ ) id + * ( ) $ E E T E R E T E R E R E R + T E R E R E R T T F T R T F T R T R T R T R * F T R T R T R F F id F ( E ) 47

LL(1) Grammars are Unambiguous Ambiguous grammar S i E t S S R a S R e S E b Error: duplicate table entry A FIRST( ) FOLLOW(A) S i E t S S R i e $ S a a e $ S R e S e e $ S R e $ E b b t a b e i t $ S S a S i E t S S R S R E E b S R S R e S S R 48

Predictive Parsing Program (Driver) push($) push(s) a := lookahead repeat X := pop() if X is a terminal or X = $ then match(x) // moves to next token and a := lookahead else if M[X,a] = X Y 1 Y 2 Y k then push(y k, Y k-1,, Y 2, Y 1 ) // such that Y 1 is on top invoke actions and/or produce IR output else error() endif until X = $ 49

Example Table-Driven Parsing Stack $E $E R T $E R T R F $E R T R id $E R T R $E R $E R T+ $E R T $E R T R F $E R T R id $E R T R $E R T R F* $E R T R F $E R T R id $E R T R $E R $ Input id+id*id$ id+id*id$ id+id*id$ id+id*id$ +id*id$ +id*id$ +id*id$ id*id$ id*id$ id*id$ *id$ *id$ id$ id$ $ $ $ Production applied E T E R T F T R F id T R E R + T E R T F T R F id T R * F T R F id T R E R R 50

Panic Mode Recovery Add synchronizing actions to undefined entries based on FOLLOW Pro: Cons: Can be automated Error messages are needed FOLLOW(E) = { ) $ } FOLLOW(E R ) = { ) $ } FOLLOW(T) = { + ) $ } FOLLOW(T R ) = { + ) $ } FOLLOW(F) = { + * ) $ } id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R E R T T F T R synch T F T R synch synch T R T R T R * F T R T R T R F F id synch synch F ( E ) synch synch synch: the driver pops current nonterminal A and skips input till synch token or skips input until one of FIRST(A) is found 51

Phrase-Level Recovery Change input stream by inserting missing tokens For example: id id is changed into id * id Pro: Cons: Can be automated Recovery not always intuitive Can then continue here id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R E R T T F T R synch T F T R synch synch T R insert * T R T R * F T R T R T R F F id synch synch F ( E ) synch synch insert *: driver inserts missing * and retries the production 52

Error Productions E T E R E R + T E R T F T R T R * F T R F ( E ) id Add error production : T R F T R to ignore missing *, e.g.: id id Pro: Cons: Powerful recovery method Cannot be automated id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R E R T T F T R synch T F T R synch synch T R T R F T R T R T R * F T R T R T R F F id synch synch F ( E ) synch synch 53

Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation) SLR, Canonical LR, LALR Other special cases: Shift-reduce parsing Operator-precedence parsing 54

Operator-Precedence Parsing Special case of shift-reduce parsing We will not further discuss (you can skip textbook section 4.6) 55

Shift-Reduce Parsing Grammar: S a A B e A A b c b B d These match These match production s right-hand sides Reducing a sentence: a b b c d e a A b c d e a A d e a A B e S Shift-reduce corresponds to a rightmost derivation: S rm a A B e rm a A d e rm a A b c d e rm a b b c d e S A A A A A A B A B a b b c d e a b b c d e a b b c d e a b b c d e 56

Handles A handle is a substring of grammar symbols in a right-sentential form that matches a right-hand side of a production Grammar: S a A B e A A b c b B d a b b c d e a A b c d e a A d e a A B e S Handle a b b c d e a A b c d e a A A e? NOT a handle, because further reductions will fail (result is not a sentential form) 57

Stack Implementation of Shift-Reduce Parsing Grammar: E E + E E E * E E ( E ) E id Find handles to reduce Stack $ $id $E $E+ $E+id $E+E $E+E* $E+E*id $E+E*E $E+E $E Input id+id*id$ +id*id$ +id*id$ id*id$ *id$ *id$ id$ $ $ $ $ Action shift reduce E id shift shift reduce E id shift (or reduce?) shift reduce E id reduce E E * E reduce E E + E accept How to resolve conflicts? 58

Conflicts Shift-reduce and reduce-reduce conflicts are caused by The limitations of the LR parsing method (even when the grammar is unambiguous) Ambiguity of the grammar 59

Shift-Reduce Parsing: Shift-Reduce Conflicts Ambiguous grammar: S if E then S if E then S else S other Stack $ $ if E then S Input $ else $ Action shift or reduce? Resolve in favor Resolve in favor of shift, so else matches closest if 60

Shift-Reduce Parsing: Reduce-Reduce Conflicts Grammar: C A B A a B a Stack $ $a Input aa$ a$ Action shift reduce A a or B a? Resolve in favor Resolve in favor of reduce A a, otherwise we re stuck! 61

LR(k) Parsers: Use a DFA for Shift/Reduce Decisions start 0 C A a Grammar: S C C A B A a B a 1 2 3 B a 4 5 goto(i 0,C) State I 0 : S C C A B A a State I 1 : S C goto(i 0,A) State I 2 : C A B B a State I 4 : C A B goto(i 2,B) goto(i 2,a) Can only reduce A a (not B a) goto(i 0,a) State I 3 : A a State I 5 : B a 62

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,a) The states of the DFA are used to determine if a handle is on top of the stack State I 3 : A a Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 63

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,A) The states of the DFA are used to determine if a handle is on top of the stack State I 2 : C A B B a Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 64

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 2 : C A B B a goto(i 2,a) The states of the DFA are used to determine if a handle is on top of the stack State I 5 : B a Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 65

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 2 : C A B B a goto(i 2,B) The states of the DFA are used to determine if a handle is on top of the stack State I 4 : C A B Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 66

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,C) The states of the DFA are used to determine if a handle is on top of the stack State I 1 : S C Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 67

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,C) The states of the DFA are used to determine if a handle is on top of the stack State I 1 : S C Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 68

Model of an LR Parser stack input a 1... a i... a n $ S m X m S m-1 LR Parsing Algorithm output X m-1.. Action Table Goto Table S 1 X 1 S 0 s t a t e s terminals and $ four different actions s t a t e s non-terminal each item is a state number 69

A Configuration of LR Parsing Algorithm A configuration of a LR parsing is: ( S o X 1 S 1... X m S m, a i a i+1... a n $ ) Stack Rest of Input S m and a i decides the parser action by consulting the parsing action table. (Initial Stack contains just S o ) A configuration of a LR parsing represents the right sentential form: X 1... X m a i a i+1... a n $

Actions of A LR-Parser 1. shift s -- shifts the next input symbol and the state s onto the stack ( S o X 1 S 1... X m S m, a i a i+1... a n $ ) ( S o X 1 S 1... X m S m a i s, a i+1... a n $ ) 2. reduce A (or rn where n is a production number) pop 2 (=r) items from the stack; then push A and s where s=goto[s m-r,a] ( S o X 1 S 1... X m S m, a i a i+1... a n $ ) ( S o X 1 S 1... X m-r S m-r A s, a i... a n $ ) Output is the reducing production reduce A 3. Accept Parsing successfully completed 4. Error -- Parser detected an error (an empty entry in the action table)

Reduce Action pop 2 (=r) items from the stack; let us assume that = Y 1 Y 2...Y r then push A and s where s=goto[s m-r,a] ( S o X 1 S 1... X m-r S m-r Y 1 S m-r+1...y r S m, a i a i+1... a n $ ) ( S o X 1 S 1... X m-r S m-r A s, a i... a n $ ) In fact, Y 1 Y 2...Y r is a handle. X 1... X m-r A a i... a n $ X 1... X m Y 1...Y r a i a i+1... a n $

(SLR) Parsing Tables for Expression 1) E E+T 2) E T 3) T T*F 4) T F 5) F (E) 6) F id Grammar Action Table state id + * ( ) $ E T F 0 s5 s4 1 2 3 1 s6 acc 2 r2 s7 r2 r2 3 r4 r4 r4 r4 Goto Table 4 s5 s4 8 2 3 5 r6 r6 r6 r6 6 s5 s4 9 3 7 s5 s4 10 8 s6 s11 9 r1 s7 r1 r1 10 r3 r3 r3 r3 11 r5 r5 r5 r5

Actions of A (S)LR-Parser -- Example stack input action output 0 id*id+id$ shift 5 0id5 *id+id$ reduce by F id F id 0F3 *id+id$ reduce by T F T F 0T2 *id+id$ shift 7 0T2*7 id+id$ shift 5 0T2*7id5 +id$ reduce by F id F id 0T2*7F10 +id$ reduce by T T*FT T*F 0T2 +id$ reduce by E T E T 0E1 +id$ shift 6 0E1+6 id$ shift 5 0E1+6id5 $ reduce by F id F id 0E1+6F3 $ reduce by T F T F 0E1+6T9$ reduce by E E+T E E+T 0E1 $ accept

SLR Grammars SLR (Simple LR): a simple extension of LR(0) shift-reduce parsing SLR eliminates some conflicts by populating the parsing table with reductions A on symbols in FOLLOW(A) S E E id + E E id State I 0 : S E E id + E E id State I 2 : Shift on + goto(i 0,id) E id + E E id goto(i 3,+) FOLLOW(E)={$} thus reduce on $ 75

SLR Parsing Table Reductions do not fill entire rows Otherwise the same as LR(0) 1. S E 2. E id + E 3. E id Shift on + 0 1 2 3 4 id + $ E s2 s2 s3 ac c r3 r2 1 4 FOLLOW(E)={$} thus reduce on $ 76

SLR Parsing An LR(0) state is a set of LR(0) items An LR(0) item is a production with a (dot) in the right-hand side Build the LR(0) DFA by Closure operation to construct LR(0) items Goto operation to determine transitions Construct the SLR parsing table from the DFA LR parser program uses the SLR parsing table to determine shift/reduce operations 77

LR(0) Items of a Grammar An LR(0) item of a grammar G is a production of G with a at some position of the right-hand side Thus, a production A X Y Z has four items: [A X Y Z] [A X Y Z] [A X Y Z] [A X Y Z ] Note that production A has one item [A ] 78

Constructing the set of LR(0) Items of a Grammar 1. The grammar is augmented with a new start symbol S and production S S 2. Initially, set C = closure({[s S]}) (this is the start state of the DFA) 3. For each set of items I C and each grammar symbol X (N T) such that goto(i,x) C and goto(i,x), add the set of items goto(i,x) to C 4. Repeat 3 until no more sets can be added to C 79

The Closure Operation for LR(0) Items 1. Initially, every LR(0) item in I is added to closure(i) 2. If [A B ] closure(i) then for each production B in the grammar, add the item [B ] to I if not already in I 3. Repeat 2 until no new items can be added 80

The Closure Operation (Example) closure({[e E]}) = { [E E] } Grammar: E E + T T T T * F F F ( E ) F id Add [E ] { [E E] [E E + T] [E T] } Add [T ] { [E E] [E E + T] [E T] [T T * F] [T F] } Add [F ] { [E E] [E E + T] [E T] [T T * F] [T F] [F ( E )] [F id] } 81

The Goto Operation for LR(0) Items 1. For each item [A X ] I, add the set of items closure({[a X ]}) to goto(i,x) if not already there 2. Repeat step 1 until no more items can be added to goto(i,x) 3. Intuitively, goto(i,x) is the set of items that are valid for the viable prefix X when I is the set of items that are valid for 82

The Goto Operation (Example 1) Suppose I = { [E E] [E E + T] [E T] [T T * F] [T F] [F ( E )] [F id] } Then goto(i,e) = closure({[e E, E E + T]}) = { [E E ] [E E + T] } Grammar: E E + T T T T * F F F ( E ) F id 83

The Goto Operation (Example 2) Suppose I = { [E E ], [E E + T] } Then goto(i,+) = closure({[e E + T]}) = { [E E + T] [T T * F] [T F] [F ( E )] [F id] } Grammar: E E + T T T T * F F F ( E ) F id 84

Constructing SLR Parsing Tables 1. Augment the grammar with S S 2. Construct the set C={I 0,I 1,,I n } of LR(0) items 3. If [A a ] I i and goto(i i,a)=i j then set action[i,a]=shift j 4. If [A ] I i then set action[i,a]=reduce A for all a FOLLOW(A) A (apply only if A S ) 5. If [S S ] is in I i then set action[i,$]=accept 6. If goto(i i,a)=i j then set goto[i,a]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item [S S] 85

The Canonical LR(0) Collection -- Example I 0 : E.E I 1 : E E. I 6 : E E+.T I 9 : E E+T. E.E+T E E.+T T.T*F T T.*F E.T T.F T.T*F I 2 : E T. F.(E) I 10 : T T*F. T.F T T.*F F.id F.(E) F.id I 3 : T F. I 7 : T T*.F I 11 : F (E). I 4 : F (.E) E.E+T E.T T.T*F T.F F.(E) F.id F.(E) F.id I 8 : F (E.) E E.+T

Transition Diagram (DFA) of Goto I 0 E I 1 T F id ( I 2 I 3 I 4 I 5 id + * E T F ( I 6 I 7 I 8 to I 2 to I 3 to I 4 Function T ) + F ( F id ( id I 9 to I 3 to I 4 to I 5 I 10 to I 4 to I 5 I 11 to I 6 * to I 7

Example SLR Grammar and LR(0) Items Augmented grammar: 1. C C 2. C A B 3. A a 4. B a I 0 = closure({[c C]}) I 1 = goto(i 0,C) = closure({[c C ]}) goto(i 0,C) State I 1 : C C final State I 4 : C A B start State I 0 : C C C A B A a goto(i 0,A) State I 2 : C A B B a goto(i 2,B) goto(i 2,a) goto(i 0,a) State I 3 : A a State I 5 : B a 88

Example SLR Parsing Table State I 0 : C C C A B A a State I 1 : C C State I 2 : C A B B a State I 3 : A a State I 4 : C A B State I 5 : B a a $ C A B start 0 C a A 1 2 3 B a 4 5 0 1 2 3 4 5 s3 s5 r3 ac c r2 r4 1 2 4 Grammar: 1. C C 2. C A B 3. A a 4. B a 89

SLR and Ambiguity Every SLR grammar is unambiguous, but not every unambiguous grammar is SLR Consider for example the unambiguous grammar S L = R R L * R id R L I 0 : S S S L=R S R L *R L id R L I 1 : S S I 2 : S L =R R L action[2,=]=s6 action[2,=]=r5 I 3 : S R no I 4 : L * R R L L *R L id Has no SLR parsing table I 5 : L id I 6 : S L= R R L L *R L id I 7 : L *R I 8 : R L I 9 : S L=R 90

LR(1) Grammars SLR too simple LR(1) parsing uses lookahead to avoid unnecessary conflicts in parsing table LR(1) item = LR(0) item + lookahead LR(0) item: [A ] LR(1) item: [A, a] 91

SLR Versus LR(1) Split the SLR states by adding LR(1) lookahead Unambiguous grammar 1. S L = R 2. R 3. L * R 4. id 5. R L I 2 : S L =R R L split S L =R R L lookahead=$ action[2,=]=s6 action[2,$]=r5 Should not reduce on =, because no right-sentential form begins with R= 92

LR(1) Items An LR(1) item [A, a] contains a lookahead terminal a, meaning already on top of the stack, expect to see a For items of the form [A, a] the lookahead a is used to reduce A only if the next input is a For items of the form [A, a] with the lookahead has no effect 93

The Closure Operation for LR(1) Items 1. Start with closure(i) = I 2. If [A B, a] closure(i) then for each production B in the grammar and each terminal b FIRST( a), add the item [B, b] to I if not already in I 3. Repeat 2 until no new items can be added 94

The Goto Operation for LR(1) Items 1. For each item [A X, a] I, add the set of items closure({[a X, a]}) to goto(i,x) if not already there 2. Repeat step 1 until no more items can be added to goto(i,x) 95

Constructing the set of LR(1) Items of a Grammar 1. Augment the grammar with a new start symbol S and production S S 2. Initially, set C = closure({[s S, $]}) (this is the start state of the DFA) 3. For each set of items I C and each grammar symbol X (N T) such that goto(i,x) C and goto(i,x), add the set of items goto(i,x) to C 4. Repeat 3 until no more sets can be added to C 96

Example Grammar and LR(1) Items Unambiguous LR(1) grammar: S L = R R L * R id R L Augment with S S LR(1) items (next slide) 97

I 0 : [S S, $] goto(i 0,S)=I 1 [S L=R, $] goto(i 0,L)=I 2 [S R, $] goto(i 0,R)=I 3 [L *R, =/$] goto(i 0,*)=I 4 [L id, =/$] goto(i 0,id)=I 5 [R L, $] goto(i 0,L)=I 2 I 6 : I 7 : [S L= R, $] goto(i 6,R)=I 9 [R L, $] goto(i 6,L)=I 10 [L *R, $] goto(i 6,*)=I 11 [L id, $] goto(i 6,id)=I 12 [L *R, =/$] I 1 : [S S, $] I 8 : [R L, =/$] I 2 : I 3 : I 4 : I 5 : [S L =R, $] goto(i 0,=)=I 6 [R L, $] [S R, $] [L * R, =/$] goto(i 4,R)=I 7 [R L, =/$] goto(i 4,L)=I 8 [L *R, =/$] goto(i 4,*)=I 4 [L id, =/$] goto(i 4,id)=I 5 [L id, =/$] I 9 : I 10 : I 11 : I 12 : I 13 : [S L=R, $] [R L, $] [L * R, $] goto(i 11,R)=I 13 [R L, $] goto(i 11,L)=I 10 [L *R, $] goto(i 11,*)=I 11 [L id, $] goto(i 11,id)=I 12 [L id, $] [L *R, $] 98

Constructing Canonical LR(1) Parsing Tables 1. Augment the grammar with S S 2. Construct the set C={I 0,I 1,,I n } of LR(1) items 3. If [A a, b] I i and goto(i i,a)=i j then set action[i,a]=shift j 4. If [A, a] I i then set action[i,a]=reduce A (apply only if A S ) 5. If [S S, $] is in I i then set action[i,$]=accept 6. If goto(i i,a)=i j then set goto[i,a]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item [S S,$] 99

Example LR(1) Parsing Table 0 id * = $ s5 s4 S L R 1 2 3 Grammar: 1. S S 2. S L = R 3. S R 4. L * R 5. L id 6. R L 1 2 3 4 5 6 7 8 9 1 0 11 1 2 s5 s1 2 s1 2 s4 s1 1 s1 1 s6 r5 r4 r6 ac c r6 r3 r5 r4 r6 r2 r6 8 7 10 4 10 13 100

LALR(1) Grammars LR(1) parsing tables have many states LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce table size Less powerful than LR(1) Will not introduce shift-reduce conflicts, because shifts do not use lookaheads May introduce reduce-reduce conflicts, but seldom do so for grammars of programming languages 101

Constructing LALR(1) Parsing Tables 1. Construct sets of LR(1) items 2. Combine LR(1) sets with sets of items that share the same first part I 4 : I 11 : [L * R, =] [R L, =] [L *R, =] [L id, =] [L * R, $] [R L, $] [L *R, $] [L id, $] [L * R, =/$] [R L, =/$] [L *R, =/$] [L id, =/$] Shorthand for two items in the same set 102

Example LALR(1) Grammar Unambiguous LR(1) grammar: S L = R R L * R id R L Augment with S S LALR(1) items (next slide) 103

I 0 : I 1 : [S S,$] goto(i 0,S)=I 1 [S L=R,$] goto(i 0,L)=I 2 [S R,$] goto(i 0,R)=I 3 [L *R,=/$] goto(i 0,*)=I 4 [L id,=/$] goto(i 0,id)=I 5 [R L,$] goto(i 0,L)=I 2 goto(i 0,=)= I 6 [S S,$] I 6 : I 7 : I 8 : [S L= R, $] goto(i 6,R)=I 8 [R L, $] goto(i 6,L)=I 9 [L *R, $] goto(i 6,*)=I 4 [L id, $] goto(i 6,id)=I 5 [L *R,=/$] [S L=R,$] I 2 : [S L =R,$] [R L,$] I 9 : [R L,=/$] I 3 : I 4 : [S R,$] [L * R,=/$] goto(i 4,R)=I 7 [R L,=/$] goto(i 4,L)=I 9 [L *R,=/$] goto(i 4,*)=I 4 [L id,=/$] goto(i 4,id)=I 5 [R L,=] [R L,$] Shorthand for two items I 5 : [L id,=/$] 104

Example LALR(1) Parsing Table 0 id * = $ s5 s4 S L R 1 2 3 Grammar: 1. S S 2. S L = R 3. S R 4. L * R 5. L id 6. R L 1 2 3 4 5 6 7 8 9 s5 s5 s4 s4 s6 r5 r4 r6 ac c r6 r3 r5 r4 r2 r6 9 7 9 8 105

LL, SLR, LR, LALR Summary LL parse tables computed using FIRST/FOLLOW Nonterminals terminals productions Computed using FIRST/FOLLOW LR parsing tables computed using closure/goto LR states terminals shift/reduce actions LR states nonterminals goto state transitions A grammar is LL(1) if its LL(1) parse table has no conflicts SLR if its SLR parse table has no conflicts LALR(1) if its LALR(1) parse table has no conflicts LR(1) if its LR(1) parse table has no conflicts 106

LL, SLR, LR, LALR Grammars LR(1) LALR(1) LL(1) SLR LR(0) 107

Dealing with Ambiguous Grammars 1. S E 2. E E + E 3. E id 0 1 2 3 4 id + $ s2 s2 s3 r3 s3/r 2 ac c r3 r2 E 1 4 $ 0 stack $ 0 E 1 + 3 E 4 input id+id+id$ +id$ When shifting on +: yields right associativity id+(id+id) Shift/reduce conflict: action[4,+] = shift 4 action[4,+] = reduce E E + E When reducing on +: yields left associativity (id+id)+id 108

Using Associativity and Precedence to Resolve Conflicts Left-associative operators: reduce Right-associative operators: shift Operator of higher precedence on stack: reduce Operator of lower precedence on stack: shift stack $ 0 S E E E + E E E * E E id $ 0 E 1 * 3 E 5 input id*id+id$ +id$ reduce E E * E 109

Error Detection in LR Parsing Canonical LR parser uses full LR(1) parse tables and will never make a single reduction before recognizing the error when a syntax error occurs on the input SLR and LALR may still reduce when a syntax error occurs on the input, but will never shift the erroneous input symbol 110

Error Recovery in LR Parsing Panic mode Pop until state with a goto on a nonterminal A is found, (where A represents a major programming construct), push A Discard input symbols until one is found in the FOLLOW set of A Phrase-level recovery Implement error routines for every error entry in table Error productions Pop until state has error production, then shift on stack Discard input until symbol is encountered that allows parsing to continue 111

ANTLR, Yacc, and Bison ANTLR tool Generates LL(k) parsers Yacc (Yet Another Compiler Compiler) Generates LALR(1) parsers Bison Improved version of Yacc 112

Creating an LALR(1) Parser with Yacc/Bison yacc specification yacc.y Yacc or Bison compiler y.tab.c y.tab.c C compiler a.out input stream a.out output stream 113

Yacc Specification A yacc specification consists of three parts: yacc declarations, and C declarations within %{ %} %% translation rules %% user-defined auxiliary procedures The translation rules are productions with actions: production 1 { semantic action 1 } production 2 { semantic action 2 } production n { semantic action n } 114

Writing a Grammar in Yacc Productions in Yacc are of the form Nonterminal: tokens/nonterminals { action } tokens/nonterminals { action } ; Tokens that are single characters can be used directly within productions, e.g. + Named tokens must be declared first in the declaration part using %token TokenName 115

Synthesized Attributes Semantic actions may refer to values of the synthesized attributes of terminals and nonterminals in a production: X : Y 1 Y 2 Y 3 Y n { action } $$ refers to the value of the attribute of X $i refers to the value of the attribute of Y i For example factor : ( expr ) { $$=$2; } factor.val=x ( expr.val=x ) $$=$2 116

Example 1 Also results in definition of #define DIGIT xxx %{ #include <ctype.h> %} %token DIGIT %% line : expr \n { printf( %d\n, $1); } ; expr : expr + term { $$ = $1 + $3; } term { $$ = $1; } ; term : term * factor { $$ = $1 * $3; } factor { $$ = $1; } ; factor : ( expr ) { $$ = $2; } DIGIT { $$ = $1; } ; %% int yylex() { int c = getchar(); if (isdigit(c)) { yylval = c- 0 ; return DIGIT; } return c; } Attribute of term (parent) Example of a very crude lexical analyzer invoked by the parser Attribute of factor (child) Attribute of token (stored in yylval) 117

Dealing With Ambiguous Grammars By defining operator precedence levels and left/right associativity of the operators, we can specify ambiguous grammars in Yacc, such as E E+E E-E E*E E/E (E) -E num To define precedence levels and associativity in Yacc s declaration part: %left + - %left * / %right UMINUS 118

Example 2 %{ #include <ctype.h> #include <stdio.h> #define YYSTYPE double %} %token NUMBER %left + - %left * / %right UMINUS %% Double type for attributes and yylval lines : lines expr \n { printf( %g\n, $2); } lines \n /* empty */ ; expr : expr + expr { $$ = $1 + $3; } expr - expr { $$ = $1 - $3; } expr * expr { $$ = $1 * $3; } expr / expr { $$ = $1 / $3; } ( expr ) { $$ = $2; } - expr %prec UMINUS { $$ = -$2; } NUMBER ; %% 119

Example 2 (cont d) %% int yylex() { int c; while ((c = getchar()) == ) ; if ((c ==. ) isdigit(c)) { ungetc(c, stdin); scanf( %lf, &yylval); return NUMBER; } return c; } int main() { if (yyparse()!= 0) fprintf(stderr, Abnormal exit\n ); return 0; } int yyerror(char *s) { fprintf(stderr, Error: %s\n, s); } Crude lexical analyzer for fp doubles and arithmetic operators Run the parser Invoked by parser to report parse errors 120

Combining Lex/Flex with Yacc/Bison yacc specification yacc.y Yacc or Bison compiler y.tab.c y.tab.h Lex specification lex.l and token definitions y.tab.h Lex or Flex compiler lex.yy.c lex.yy.c y.tab.c C compiler a.out input stream a.out output stream 121

Lex Specification for Example 2 %option noyywrap %{ #include y.tab.h extern double yylval; %} number [0-9]+\.? [0-9]*\.[0-9]+ %% [ ] { /* skip blanks */ } {number} { sscanf(yytext, %lf, &yylval); return NUMBER; } \n. { return yytext[0]; } Generated by Yacc, contains #define NUMBER xxx Defined in y.tab.c yacc -d example2.y lex example2.l gcc y.tab.c lex.yy.c./a.out bison -d -y example2.y flex example2.l gcc y.tab.c lex.yy.c./a.out 122

Error Recovery in Yacc %{ %} %% lines : lines expr \n { printf( %g\n, $2; } lines \n /* empty */ error \n { yyerror( reenter last line: ); yyerrok; } ; Error production: set error mode and skip input until newline Reset parser to normal mode 123