Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Similar documents
CA Compiler Construction

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

3. Parsing. Oscar Nierstrasz

Table-Driven Parsing

Types of parsing. CMSC 430 Lecture 4, Page 1

Syntactic Analysis. Top-Down Parsing

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Concepts Introduced in Chapter 4


Top down vs. bottom up parsing

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS 314 Principles of Programming Languages

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Syntax Analysis. Prof. James L. Frankel Harvard University. Version of 6:43 PM 6-Feb-2018 Copyright 2018, 2015 James L. Frankel. All rights reserved.

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Introduction to parsers

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

CS 321 Programming Languages and Compilers. VI. Parsing

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

CS502: Compilers & Programming Systems

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Parsing Part II (Top-down parsing, left-recursion removal)

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

CSCI312 Principles of Programming Languages

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

A programming language requires two major definitions A simple one pass compiler

Monday, September 13, Parsers

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Context-free grammars

CS 406/534 Compiler Construction Parsing Part I

Wednesday, August 31, Parsers

Compiler Construction: Parsing

Part 3. Syntax analysis. Syntax analysis 96

CMSC 330: Organization of Programming Languages

Topdown parsing with backtracking

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

Syntax Analysis Check syntax and construct abstract syntax tree

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Introduction to Parsing. Comp 412

4. Lexical and Syntax Analysis

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

VIVA QUESTIONS WITH ANSWERS

Compilers. Predictive Parsing. Alex Aiken

Some Basic Definitions. Some Basic Definitions. Some Basic Definitions. Language Processing Systems. Syntax Analysis (Parsing) Prof.

Lexical and Syntax Analysis. Top-Down Parsing

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

4. Lexical and Syntax Analysis


Chapter 3. Parsing #1

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Syntactic Analysis. Chapter 4. Compiler Construction Syntactic Analysis 1

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Chapter 4. Lexical and Syntax Analysis

Syntax Analysis Part I

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Introduction to Syntax Analysis

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Compiler Construction 2016/2017 Syntax Analysis

Building Compilers with Phoenix

CSE302: Compiler Design

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Table-Driven Top-Down Parsers

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

20. Compilers. source language program. target language program. translator

ICOM 4036 Spring 2004

Introduction to Syntax Analysis. The Second Phase of Front-End

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Parsing II Top-down parsing. Comp 412

COMP3131/9102: Programming Languages and Compilers

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Lexical and Syntax Analysis

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

CS308 Compiler Principles Syntax Analyzer Li Jiang

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Transcription:

Syntax Analysis Martin Sulzmann Martin Sulzmann Syntax Analysis 1 / 38

Syntax Analysis Objective Recognize individual tokens as sentences of a language (beyond regular languages). Example 1 (OK) Program text x := x + 2 yields token stream (ident x ) assign (ident x ) plus (const 2) Example 2 (Not OK) Program text x x := + 2 yields token stream (ident x ) (ident x ) assign plus (const 2) We will discuss later if the OK example shall be semantically valid. Martin Sulzmann Syntax Analysis 2 / 38

Approach: Parsing with CFG Context-Free Grammar (CFG) Language described in terms of a context-free grammar: Command ident assign Expression... Expression const ident Expression plus Expression ident, const, plus being tokens where attributes are omitted. Note: CFG uses uppercase for non-terminal and lowercase for terminal symbols. Our parsing tool OCamlyacc just uses it the other way around! Parsing yields Abstract Syntax Tree (AST) Check if stream of tokens is valid and produce AST. Martin Sulzmann Syntax Analysis 3 / 38

Parsing with CFG Example Token Stream (ident x ) assign (ident x ) plus (const 2) AST assign ident plus x ident plus x 2 Martin Sulzmann Syntax Analysis 4 / 38

Types of Parsing Top-Down Recursive descent From root to leaves ANTRL style Bottom-Up From leaves to root yacc style Others Parser combinators Derivative-based parser... Martin Sulzmann Syntax Analysis 5 / 38

Error Handling Types of Errors lexical misspelled keyword whle cond... syntactic unbalanced parentheses x:= x+1) semantic incompatible types x:= x + True logical infinite loop while True... Provide useful feedback to user. Mostly ignored here. Martin Sulzmann Syntax Analysis 6 / 38

Context Free Grammar (CFG) Definition A CFG G is a 4-tuple (V t,v n,s,p) where 1 V t is the set of terminal symbols (commonly the set of tokens). 2 V n is the set of non-terminal symbols. 3 S is a distinguished non-terminal symbol call the start symbol. 4 P is a finite set of productions (aka rewrite rules) of the form: non-terminal (non-terminal terminal). Note: All sets are finite. Left-hand side (lhs) of production consists of a single non-terminal symbol. Right-hand side (rhs) consists of arbitrary combinations of terminal and non-terminal symbol described by a regular expression. Martin Sulzmann Syntax Analysis 7 / 38

CFG Example and Notation Example E E +T T T T F F F (E) ident Notation Vocabulary V = V t V n a,b,c,... V t A,B,C,... V n α,β,γ,... V u,v,w,... V t Martin Sulzmann Syntax Analysis 8 / 38

CFG as a Rewrite System Derivations by Rewriting If A γ P then αaβ αγβ is a one-step derivation using A γ. We assume that denotes a derivation of 0 steps. In a leftmost derivation (denoted L ) we replace the leftmost non-terminal in each step. In a rightmost derivation (denoted R ) we replace the rightmost non-terminal in each step. If S β then β is a sentential form of G. If S w and w V t then w is a sentence of G. L(G) = {w S w,w V t } denotes the language described by G. Martin Sulzmann Syntax Analysis 9 / 38

Example Derivation CFG E E +T T T T F F F (E) ident Derivation E ident +(ident) E E +T T +T F +T ident +T ident +F ident +(E) ident +(T) ident +(F) ident +(ident) Martin Sulzmann Syntax Analysis 10 / 38

Example Derivation (2) CFG E E +E E E (E) 0... 9 Derivation E 3+7 2 E E +E 3+E 3+E E 3+7 E 3+7 2 Things to think about Multiple derivations for the same input word? How to represent derivations? Martin Sulzmann Syntax Analysis 11 / 38

Parse Trees Purpose Compact representation of a derivation E w. Most often in terms of a tree-like structure (but other formats like bit-encodings are also possible). Different from AST! Parse tree represents concrete input word. AST is an abstract representation of input. Martin Sulzmann Syntax Analysis 12 / 38

Parse Tree Example CFG + Derivation E E +E E E (E) 0... 9 E E +E 3+E 3+E E 3+7 E 3+7 2 Parse Tree E 8 8888888 E + E 3 E E 7 2 Martin Sulzmann Syntax Analysis 13 / 38

AST versus Parse Tree Parse tree E ( E ) E + E 1 2 AST A + 2 2222222 ident ident Parse tree: Maintain all source (concrete) syntax info. AST: Abstract representation. Martin Sulzmann Syntax Analysis 14 / 38

Ambiguity Definition Multiple parse trees for the same grammar and input word. Example E E +E E E (E) 0... 9 E E E E +E E 3+E E 3+7 E 3+7 2 Another parse tree: E E E E + E 2 3 7 Martin Sulzmann Syntax Analysis 15 / 38

Ambiguity Example Ambiguous Grammar E E +E E E (E) ident Equivalent Non-Ambiguous Grammar E E +T T E E +E T T F F E E E F (E) ident E (E) ident Fact Some context-free languages are inherently ambiguous. Martin Sulzmann Syntax Analysis 16 / 38

Ambiguity Example (2) Exercise Consider Stmt if Expr then Stmt if Expr then Stmt else Stmt other Ambiguous? Does an equivalent unambiguous grammar exist? Martin Sulzmann Syntax Analysis 17 / 38

Parsing Approaches: Top-Down versus Bottom-Up Example E E +E E E (E) 0... 9 Top-Down ( ANTLR ) Build left-most derivation (strictly reduce left-most non-terminal symbols). E E +E 3+E 3+E E 3+7 E 3+7 2 Guess which rule to apply from left to right. Bottom-Down ( yacc ) Build right-most derivation. E E +E E +E E E +E 2 E +7 2 3+7 2 Seek for matches of right-hand side, replace by left-hand side. Martin Sulzmann Syntax Analysis 18 / 38

Top-Down Parsing Example E E +T T T T F F F (E) ident Approach Build left-most derivation starting from start symbol, use target string to choose productions. Challenge: Left Recursion E E +T E +T +T... Several possible right-hand sides. Some lead to dead ends. Backtracking? Too inefficient. Martin Sulzmann Syntax Analysis 19 / 38

Removal of Left Recursion Example Consider E E +T T T T F F F (E) ident Remove left recursion, e.g. E TE E +TE ǫ Martin Sulzmann Syntax Analysis 20 / 38

Removal Left Recursion (2) Direct Left Recursion Removal Left recursion can be eliminated as follows: Grammar rules A Aα 1... Aα n β 1... β m are replaced by the equivalent rules A β 1 A... β m A A α 1 A... α n A ǫ Indirect Left Recursion Removal Consider A BC B Aβ Left recursion involves several rules. Can be eliminated (reduced to direct case) via unfolding. Martin Sulzmann Syntax Analysis 21 / 38

Left Factoring Issue Remove common prefix of right hand sides. Left Factoring Left factoring of A αβ 1... αβ n γ 1... γ m (where α ǫ is the longest common prefix) yields Example A αa γ 1... γ m A β 1... β n S iets ietses a E b transformed to S ietss a S ǫ es E b Martin Sulzmann Syntax Analysis 22 / 38

Top-Down Parsing: Short Summary Predictive Top-Down Parsing Eliminate left recursion. Apply left factoring. Deterministic. No backtracking. Lookahead? How far to lookahead in the input string to (deterministically) apply a rule? Observe right-hand sides! Martin Sulzmann Syntax Analysis 23 / 38

First (matching symbols) Intuition Which terminals can start a string matching the non-terminal? Definition First(α) = {a V t {ǫ} α aβ} Inductive Definition {x} if α = xα, x V t First(α) = First(x) if α = xα, ǫ First(x), x V n First(x) First(α ) if α = xα, ǫ First(x), x V n Sufficient to consider First(X) where X is a non-terminal. Martin Sulzmann Syntax Analysis 24 / 38

First Algorithm Algorithm Build First(X) as follows: 1 If X is a terminal then First(X) = {X}. 2 If X ǫ then add ǫ to First(X). 3 If X Y 1...Y k : 1 Put First(Y 1 ) {ǫ} in First(X). 2 i.1 < i <= k, if ǫ First(Y 1 )... First(Y i 1 ) (i.e. Y 1...Y i 1 ǫ) then put First(Y i ) {ǫ} in First(X). 3 If ǫ First(Y 1 )... First(Y k ) then add ǫ to First(X). Repeat until no more additions can be made. Martin Sulzmann Syntax Analysis 25 / 38

First Example Example E TE E +TE E ǫ T FT T FT T ǫ F ident F (E) First(E) = {ident,(} First(E ) = {+,ǫ} First(T) = {ident,(} First(T ) = {,ǫ} First(F) = {ident,(} Martin Sulzmann Syntax Analysis 26 / 38

Follow Issue Which terminals can start a string matching after the nonterminal? Consider ǫ right hand sides. Definition Follow(A) = {w First(β) S waβ} Algorithm Build Follow(A) as follows: 1 Add $ to Follow(S) where S is the start symbol. 2 If A αbβ: 1 Put First(β) {ǫ} in Follow(B) 2 If β = ǫ (i.e. A αb) or ǫ First(β) (i.e. β ǫ) then put Follow(A) in Follow(B). Repeat until no more additions can be made ($ = EOF token). Martin Sulzmann Syntax Analysis 27 / 38

First and Follow Example Example S BAa A ǫ BcA B ǫ b First Follow S a,b,c $ A ǫ,b,c a B ǫ,b a,b,c Martin Sulzmann Syntax Analysis 28 / 38

LL(1) Grammar definition A grammar G is LL(1) iff for each set of productions A α 1... α n : 1 First(α 1 ),...,First(α n ) are all pairwise disjoint. 2 If α i ǫ then First(α j ) Follow(A) = forall 1 <= j <= n,i j. If G is ǫ-free (there are no ǫ productions), first condition is sufficient. Facts 1 No left-recursive grammar is LL(1). 2 No ambiguous grammar is LL(1). 3 Some languages have no LL(1) grammar. Martin Sulzmann Syntax Analysis 29 / 38

LL(1) Grammar (cont d) Example S as a is not LL(1) because First(aS) = First(a) = {a}. However, the following equivalent grammar is LL(1). S as S as ǫ Lemma A grammar is LL(1) iff for each two productions A α and A β we have that Look(A α) Look(A β) = where Look(A α) = { First(α)\{ǫ} if ǫ Follow(α) (First(α)\{ǫ}) Follow(A) otherwise Martin Sulzmann Syntax Analysis 30 / 38

Recursive Descent Parsing Example E TE E +TE E ǫ T FT T FT T ǫ F ident F (E) First(E) = {ident,(} First(E ) = {+,ǫ} First(T) = {ident,(} First(T ) = {,ǫ} First(F) = {ident,(} Recursive Descent Parser Grammar is LL(1). Introduce a function for each non-terminal symbol. Martin Sulzmann Syntax Analysis 31 / 38

Recursive Descent Parser Parser Ignore the parse tree, so strictly speaking we only build a matcher. e() { t() ; e (); if token!= EOF then HALT; } e () { if token == PLUS then get_next_token(); t(); e (); } t() { f(); t (); } t () { if token == TIMES then get_next_token(); f(); t (); } f () { if token == IDENT then get_next_token(); else if token == OPENPAR then get_next_token(); e(); if token==closepar then get_next_token(); else HALT; } Martin Sulzmann Syntax Analysis 32 / 38

Table Driven Parsing Fact A program making use of recursive function can be simulated by a non-recursive program (by making use of a stack). Table Driven Approach Input Stack elements are elements in our voc Input consists of terminal symbols Table encodes LL(1) actions Stack LL(1) parser Parsing table M Martin Sulzmann Syntax Analysis 33 / 38

Parse Table Construction Algorithm Input: Grammar G Output: Parsing table M Method: 1 For each production A α: 1 For each a First(α) add A α to M[A,a]. 2 If ǫ First(α): 1 For each b Follow(A) add A α to M[A,b]. 2 If $ Follow(A), add A α to M[A,$]. 2 Set each undefined entry of M to error (or leave empty). If there are multiple entries then grammar is not LL(1). Martin Sulzmann Syntax Analysis 34 / 38

Table Driven Parsing Algorithm Push $; Push start symbol while Top of stack is not $ { X:= top of stack; a:= input symbol if X is a terminal then if X == a then pop X; advance input else error else if M[X,a] == X->Y1... Yk then pop X; push Yk,..., Y1 else error } Martin Sulzmann Syntax Analysis 35 / 38

Example First and Follow Sets E TE First(E) = {ident,(} Follow(E) = {$,)} E +TE First(E ) = {+,ǫ} Follow(E ) = {$,)} E ǫ T FT First(T) = {ident,(} Follow(T) = {$,+,)} T FT First(T ) = {,ǫ} Follow(T ) = {$,+,)} T ǫ F ident First(F) = {ident,(} Follow(F) = {$,+,,)} F (E) Martin Sulzmann Syntax Analysis 36 / 38

Parse Table + Execution ident + ( ) $ E E TE E TE E E +TE E ǫ E ǫ T T FT T FT T T ǫ T FT T ǫ T ǫ F F ident F (E) Stack Input Rule Stack Input Rule $E id + (id) E TE $E T )E id) E TE $E T id + (id) T FT $E T )E T id) T FT $E T F id + (id) F id $E T )E T F id) F id $E T id id + (id) $E T )E T id id) $E T +(id) T ǫ $E T )E T ) T ǫ $E +(id) E +TE $E T )E ) E ǫ $E T+ +(id) $E T ) ) $E T (id) T FT $E T $ T ǫ $E T F (id) F (E) $E $ E ǫ $E T )E( (id) $ $ Martin Sulzmann Syntax Analysis 37 / 38

Top-Down Parsing Summary Predictive parsing (removal left recursion, left factoring). LL(1) grammars. Recursive descent parsing. Table driven parsing. How to construct parse tree? (later) Error recovery important (but no time!), please consult text book. ANTRL state-of-the art LL-style parser for Java. Martin Sulzmann Syntax Analysis 38 / 38