Lexical and Syntax Analysis

Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing


A string of characters (easy for humans to write and understand) → lexemes identified → a string of tokens → a data structure (easy for programs to transform).

PART 1: SYNTAX OF LANGUAGES Context-Free Grammars Derivations Parse Trees Ambiguity Precedence and Associativity


Syntax The syntax is a set of rules defining valid strings of a language, often specified by a context-free grammar. For example, a grammar E for arithmetic expressions:
e → x | y | e + e | e - e | e * e | ( e )

Context-free grammars Have four components: 1. A set of terminal symbols. 2. A set of non-terminal symbols. 3. A set of productions (or rules) of the form n → X1 X2 … Xn, where n is a non-terminal and X1 X2 … Xn is any sequence of terminals, non-terminals, and ε. 4. The start symbol (one of the non-terminals).
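To make this concrete, in grammar E above the terminals are x, y, +, -, * and the parentheses; the only non-terminal is e; the productions are the six alternatives listed on the previous slide; and the start symbol is e.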

Notation Non-terminals are underlined. Rather than writing e → x, e → e + e we may write: e → x | e + e. (Also, the symbols → and ::= will be used interchangeably.)

Why context-free? Within the hierarchy Unrestricted ⊃ Context-Sensitive ⊃ Context-Free ⊃ Regular, context-free grammars offer a nice balance between expressive power and efficiency of parsing.

Derivations A derivation is a proof that some string conforms to a grammar. For example: e ⇒ e + e ⇒ x + e ⇒ x + ( e ) ⇒ x + ( e * e ) ⇒ x + ( y * e ) ⇒ x + ( y * x )

Derivations Leftmost derivation: always expand the leftmost nonterminal when applying the grammar rules. Rightmost derivation: always expand the rightmost nonterminal, e.g. e ⇒ e + e ⇒ e + ( e ) ⇒ e + ( x ) ⇒ x + ( x )
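For comparison, a rightmost derivation of the string from the earlier example, x + ( y * x ), would be: e ⇒ e + e ⇒ e + ( e ) ⇒ e + ( e * e ) ⇒ e + ( e * x ) ⇒ e + ( y * x ) ⇒ x + ( y * x ).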

Parse tree: motivation Like a derivation: a proof that a given input is valid according to the grammar. But a parse tree: is more concise: we don't write out the sentence every time a non-terminal is expanded; abstracts over the order in which rules are applied.

Parse tree: intuition If non-terminal n has a production n → X Y Z, where X, Y, and Z are terminals or non-terminals, then a parse tree may have an interior node labelled n with three children labelled X, Y, and Z.

Parse tree: definition A parse tree is a tree in which: the root is labelled by the start symbol; each leaf is labelled by a terminal symbol, or ε; each interior node is labelled by a non-terminal; if n is a non-terminal labelling an interior node whose children are X1, X2, …, Xn then there must exist a production n → X1 X2 … Xn.

Example 1 Example input string: x + y * x Resulting parse tree according to grammar E: the root e has children e, +, e; the left e derives x, and the right e has children e, *, e, which derive y and x respectively.

Example 2 The following is not a parse tree according to grammar E: a tree whose root e has children x, +, e, where the e subtree derives y * x. Why? Because e → x + e is not a production in grammar E.

Syntax Analysis String of symbols → parse tree. A parse tree is: 1. A proof that a given input is valid according to the grammar; 2. A structure-rich representation of the input that can be stored in a data structure that is convenient to process. (Syntax analysis may also report that the input string is invalid.)

Ambiguity If there exists more than one parse tree for any string then the grammar is ambiguous. For example, the string x+y*x has two parse trees: one whose root applies e → e + e, so the right operand y * x groups together (x+(y*x)), and one whose root applies e → e * e, so the left operand x + y groups together ((x+y)*x).

Operator precedence Different parse trees often have different meanings, so we usually want unambiguous grammars. Conventionally, * has a higher precedence (binds tighter) than +, so there is only one interpretation of x+y*x, namely x+(y*x).


Operator associativity Even with operator precedence rules, ambiguity remains, e.g. x-x-x. Binary operators are either: left-associative; right-associative; non-associative. Conventionally, - is left-associative, so there is only one interpretation of x-x-x, namely (x-x)-x.


Exercise 1 Recall grammar E: e → x | y | e + e | e - e | e * e | ( e ) Let all operators be left-associative, and let * bind tighter than + and -. Give an unambiguous grammar for expressions, using these rules of associativity and precedence.

Answer: step-by-step Given a non-terminal e which involves operators at n levels of precedence: Step 1: introduce n+1 new nonterminals, e0 … en.

Let op denote an operator with precedence i. Step 2: replace each production e → e op e with ei → ei op ei+1 | ei+1 if op is left-associative, or ei → ei+1 op ei | ei+1 if op is right-associative.

Construct the precedence table:
Operator   Precedence
+, -       0
*          1
Grammar E after step 2 becomes:
e0 → e0 + e1 | e0 - e1 | e1
e1 → e1 * e2 | e2
e → ( e ) | x | y

Step 3: replace each production e → α with en → α. After step 3:
e0 → e0 + e1 | e0 - e1 | e1
e1 → e1 * e2 | e2
e2 → ( e ) | x | y

Step 4: replace all occurrences of e0 with e. After step 4:
e → e + e1 | e - e1 | e1
e1 → e1 * e2 | e2
e2 → ( e ) | x | y

Exercise 2 Consider the following ambiguous grammar for logical propositions.
p → 0 (Zero) | 1 (One) | ~ p (Negation) | p + p (Disjunction) | p * p (Conjunction)
Now let + and * be right-associative and the operators in increasing order of binding strength be: +, *, ~. Give an unambiguous grammar for logical propositions.
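One possible answer, applying the step-by-step method above (this worked answer is mine, not from the slides, and the names p1 and p2 are only illustrative):
p → p1 + p | p1
p1 → p2 * p1 | p2
p2 → ~ p2 | 0 | 1
Here + and * are right-associative because the recursive occurrence sits on the right of the operator, and ~ sits at the highest level of precedence.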

Exercise 3 Which of the following grammars are ambiguous?
b → 0 b 1 | 0 | 1
e → + e e | - e e | x
s → if b then s | if b then s else s | skip

Summary of Part 1 Syntax of a language is often specified by a context-free grammar. Derivations and parse trees are proofs that a string is accepted by a grammar. Construction of unambiguous grammars using rules of precedence and associativity.

PART 2: TOP-DOWN PARSING Recursive-Descent Backtracking Left-Factoring Predictive Parsing Left-Recursion Removal First and Follow Sets Parsing tables and LL(1)


Top-down parsing Top-down: begin with the start symbol and expand non-terminals, succeeding when the input string is matched. A good strategy for writing parsers: 1. Implement a syntax checker to accept or refute input strings. 2. Modify the checker to construct a parse tree: a straightforward change.

RECURSIVE DESCENT A popular top-down parsing technique.


Recursive descent A recursive descent parser consists of a set of functions, one for each non-terminal. The function for non-terminal n returns true if some prefix of the input string can be derived from n, and false otherwise.


Consuming the input We assume a global variable next points to the input string.
char* next;
Consume c from the input if possible:
int eat(char c) {
  if (*next == c) { next++; return 1; }
  return 0;
}

Recursive descent Let parser(X) denote X() if X is a non-terminal, and eat(X) if X is a terminal. For each non-terminal N, introduce:
int N() {
  char* save = next;
  /* for each production N → X1 X2 ... Xn */
  if (parser(X1) && parser(X2) && ... && parser(Xn))
    return 1;
  else
    next = save;   /* backtrack */
  return 0;
}

Exercise 4 Consider the following grammar G with start symbol e.
e → ( e + e ) | ( e * e ) | v
v → x | y
Using recursive descent, write a syntax checker for grammar G.

Answer (part 1)
int e() {
  char* save = next;
  if (eat('(') && e() && eat('+') && e() && eat(')')) return 1; else next = save;
  if (eat('(') && e() && eat('*') && e() && eat(')')) return 1; else next = save;
  if (v()) return 1; else next = save;
  return 0;
}

Answer (part 2)
int v() {
  char* save = next;
  if (eat('x')) return 1; else next = save;
  if (eat('y')) return 1; else next = save;
  return 0;
}
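As a small usage sketch (not part of the original slides): assuming the global next pointer and the functions eat(), e() and v() defined above are in scope, a driver can set next to the input and accept only if the whole string is consumed. The function name check() and the test strings are illustrative.

#include <stdio.h>

extern char* next;   /* defined alongside eat() above */
int e();             /* syntax checker for non-terminal e, as above */

/* Accept the input only if e matches it and nothing is left over. */
int check(char* input) {
    next = input;
    return e() && *next == '\0';
}

int main(void) {
    printf("%d\n", check("(x*x)"));   /* expected: 1 */
    printf("%d\n", check("(x+*y)"));  /* expected: 0 */
    return 0;
}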

Exercise 5 How many function calls are made by the recursive descent parser to parse the following strings? (x*x) ((x*x)*x) (((x*x)*x)*x) (See animation of backtracking.)


Answer Number of calls is quadratic in the length of the input string.
Input string      Length  Calls
(x*x)             5       21
((x*x)*x)         9       53
(((x*x)*x)*x)     13      117
Lesson: backtracking is expensive!

LEFT FACTORING Reducing backtracking!


Left factoring When two productions for a non-terminal share a common prefix, expensive backtracking can be avoided by left-factoring the grammar. Idea: Introduce a new nonterminal that accepts each of the different suffixes.


Example 3 Left-factoring grammar G by introducing non-terminal r:
e → ( e r | v
r → + e ) | * e )
v → x | y
Here ( e is the common prefix, and + e ) and * e ) are the different suffixes.
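A sketch of what the recursive-descent checker might look like after left-factoring (this code is not in the original slides; it reuses the eat() helper and the global next pointer from earlier, and v() is unchanged). The common prefix ( e is now tried only once:

int e() {
    char* save = next;
    if (eat('(') && e() && r()) return 1; else next = save;   /* e → ( e r */
    if (v()) return 1; else next = save;                      /* e → v */
    return 0;
}

int r() {
    char* save = next;
    if (eat('+') && e() && eat(')')) return 1; else next = save;   /* r → + e ) */
    if (eat('*') && e() && eat(')')) return 1; else next = save;   /* r → * e ) */
    return 0;
}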

Exercise 6 How many function calls are made by the recursive descent parser (after left-factoring) to parse the following strings? (x*x) ((x*x)*x) (((x*x)*x)*x)


Answer Number of calls is now linear in the length of the input string.
Input string      Length  Calls
(x*x)             5       13
((x*x)*x)         9       22
(((x*x)*x)*x)     13      31
Lesson: left-factoring a grammar reduces backtracking.

PREDICTIVE PARSING Eliminating backtracking!


Predictive parsing Idea: know which production of a non-terminal to choose based solely on the next input symbol. Advantage: very efficient since it eliminates all backtracking. Disadvantage: not all grammars can be parsed in this way. (But many useful ones can.)


Running example The following grammar H will be used as a running example to demonstrate predictive parsing.
e → e + e | e * e | ( e ) | x | y
Example: x+y*(y+x)

Removing ambiguity Since + and * are left-associative and * binds tighter than +, we can derive an unambiguous variant of H.
e → e + t | t
t → t * f | f
f → ( e ) | x | y

Left recursion Problem: left-recursive grammars cause recursive descent parsers to loop forever.
int e() {
  char* save = next;
  if (e() && eat('+') && t()) return 1;   /* call to self without consuming any input */
  next = save;
  if (t()) return 1;
  next = save;
  return 0;
}

Eliminating left recursion Let α denote any sequence of grammar symbols.
Rule 1: replace n → n α with n' → α n'.
Rule 2: replace n → α, where α does not begin with n, with n → α n'.
Rule 3: introduce the new production n' → ε.

Example 4 Running example, after eliminating left-recursion.
e → t e'
e' → + t e' | ε
t → f t'
t' → * f t' | ε
f → ( e ) | x | y

first and follow sets Predictive parsers are built using the first and follow sets of each non-terminal in a grammar. The first set of a non-terminal n is the set of symbols that can begin a string derived from n. The follow set of a non-terminal n is the set of symbols that can immediately follow n in any step of a derivation.


Definition of first sets Let α denote any sequence of grammar symbols. If α can derive a string beginning with terminal a then a ∈ first(α). If α can derive ε then ε ∈ first(α).

Computing first sets If a is a terminal then a ∈ first(a). If there exists a production n → X1 X2 … Xn and, for some i, a ∈ first(Xi) and ε ∈ first(Xj) for all j < i, then a ∈ first(n). If n → ε then ε ∈ first(n).

Exercise 7 What are the first sets for each non-terminal in the following grammar?
e → t e'
e' → + t e' | ε
t → f t'
t' → * f t' | ε
f → ( e ) | x | y

Answer
first( f ) = { (, x, y }
first( t' ) = { *, ε }
first( t ) = { (, x, y }
first( e' ) = { +, ε }
first( e ) = { (, x, y }

Definition of follow sets Let α and β denote any sequence of grammar symbols. Terminal a ∈ follow(n) if the start symbol of the grammar can derive a string of grammar symbols in which a immediately follows n. The set follow(n) never contains ε.

End markers In predictive parsing, it is useful to mark the end of the input string with a $ symbol. If the start symbol can derive a string of grammar symbols in which n is the rightmost symbol then $ is in follow(n).


Computing follow sets If s is the start symbol of the grammar then $ ∈ follow(s). If n → α x β then everything in first(β) except ε is in follow(x). If n → α x, or n → α x β with ε ∈ first(β), then everything in follow(n) is in follow(x).

Exercise 8 What are the follow sets for each non-terminal in the following grammar?
e → t e'
e' → + t e' | ε
t → f t'
t' → * f t' | ε
f → ( e ) | x | y

Answer
follow( e' ) = { $, ) }
follow( e ) = { $, ) }
follow( t' ) = { +, $, ) }
follow( t ) = { +, $, ) }
follow( f ) = { *, +, ), $ }

Predictive parsing table For each non-terminal n, a parse table T defines which production of n should be chosen, based on the next input symbol.
for each production n → α:
  for each a ∈ first(α): add n → α to T[n, a]
  if ε ∈ first(α) then for each b ∈ follow(n): add n → α to T[n, b]

Exercise 9 Construct a predictive parsing table for the following grammar.
e → t e'
e' → + t e' | ε
t → f t'
t' → * f t' | ε
f → ( e ) | x | y
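One possible answer (this worked table is not in the original slides; cells not listed are empty and signal a syntax error):
T[e, x] = T[e, y] = T[e, (] = e → t e'
T[e', +] = e' → + t e'        T[e', )] = T[e', $] = e' → ε
T[t, x] = T[t, y] = T[t, (] = t → f t'
T[t', *] = t' → * f t'        T[t', +] = T[t', )] = T[t', $] = t' → ε
T[f, x] = f → x        T[f, y] = f → y        T[f, (] = f → ( e )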

LL(1) grammars If each cell in the parse table contains at most one entry then a non-backtracking parser can be constructed and the grammar is said to be LL(1). First L: left-to-right scanning of the input. Second L: a leftmost derivation is constructed. The (1): using one input symbol of look-ahead to decide which grammar production to choose.
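For contrast (an illustrative example, not from the slides): the left-recursive grammar e → e + t | t is not LL(1), because first(e + t) and first(t) both contain x, so the cell T[e, x] would receive both productions and one symbol of look-ahead cannot decide between them.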

Exercise 10 Write a syntax checker for the grammar of Exercise 9, utilising the predictive parsing table. int e() {... } It should return a non-zero value if some prefix of the string pointed to by next conforms to the grammar, otherwise it should return zero.


Answer (part 1)
int e() {
  if (*next == 'x') return t() && e1();
  if (*next == 'y') return t() && e1();
  if (*next == '(') return t() && e1();
  return 0;
}

int e1() {
  if (*next == '+') return eat('+') && t() && e1();
  if (*next == ')') return 1;
  if (*next == '\0') return 1;
  return 0;
}

Answer (part 2)
int t() {
  if (*next == 'x') return f() && t1();
  if (*next == 'y') return f() && t1();
  if (*next == '(') return f() && t1();
  return 0;
}

int t1() {
  if (*next == '+') return 1;
  if (*next == '*') return eat('*') && f() && t1();
  if (*next == ')') return 1;
  if (*next == '\0') return 1;
  return 0;
}

Answer (part 3)
int f() {
  if (*next == 'x') return eat('x');
  if (*next == 'y') return eat('y');
  if (*next == '(') return eat('(') && e() && eat(')');
  return 0;
}
(Notice how backtracking is not required.)

Predictive parsing algorithm Let s be a stack, initially containing the start symbol of the grammar, and let next point to the input string.
while (top(s) != $)
  if (top(s) is a terminal) {
    if (top(s) == *next) { pop(s); next++; }
    else error();
  }
  else if (T[top(s), *next] == X → Y1 ... Yn) {
    pop(s); push(s, Yn ... Y1);   /* Y1 on top */
  }

Exercise 11 Give the steps that a predictive parser takes to parse the following input. x + x * y For each step (loop iteration), show the input stream, the stack, and the parser action.

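A possible worked answer (this trace is not in the original slides; the stack is written with its top on the left and $ marks the end of the input):
Stack          Input     Action
e $            x+x*y$    expand e → t e'
t e' $         x+x*y$    expand t → f t'
f t' e' $      x+x*y$    expand f → x
x t' e' $      x+x*y$    match x
t' e' $        +x*y$     expand t' → ε
e' $           +x*y$     expand e' → + t e'
+ t e' $       +x*y$     match +
t e' $         x*y$      expand t → f t'
f t' e' $      x*y$      expand f → x
x t' e' $      x*y$      match x
t' e' $        *y$       expand t' → * f t'
* f t' e' $    *y$       match *
f t' e' $      y$        expand f → y
y t' e' $      y$        match y
t' e' $        $         expand t' → ε
e' $           $         expand e' → ε
$              $         accept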

Acknowledgements Plus Stanford University lecture notes by Maggie Johnson and Julie Zelenski.


APPENDIX


Chomsky hierarchy Let t range over terminals, x and z over non-terminals, and α, β and γ over sequences of terminals, nonterminals, and ε.
Grammar             Valid productions
Unrestricted        α → β
Context-Sensitive   α x γ → α β γ
Context-Free        x → β
Regular             x → t | t z | ε

Backus-Naur Form BNF is a standard ASCII notation for specification of context-free grammars whose terminals are ASCII characters. For example:
<exp> ::= <exp> "+" <exp> | <exp> "-" <exp> | <var>
<var> ::= "x" | "y"
The BNF notation can itself be specified in BNF.
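As a simplified sketch (my illustration, not from the slides; it leaves the exact lexical form of names and quoted terminals unspecified), BNF described in BNF might look like:
<grammar>      ::= <rule> | <rule> <grammar>
<rule>         ::= <non-terminal> "::=" <alternatives>
<alternatives> ::= <sequence> | <sequence> "|" <alternatives>
<sequence>     ::= <item> | <item> <sequence>
<item>         ::= <non-terminal> | <terminal>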