Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Similar documents
LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Syntax-Directed Translation. Lecture 14

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Parsing II Top-down parsing. Comp 412

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Abstract Syntax Trees & Top-Down Parsing

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Homework & Announcements

Syntax. In Text: Chapter 3

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Top down vs. bottom up parsing

Outline. Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Context-free grammars (CFG s)

CS 314 Principles of Programming Languages

Context-Free Grammars

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Introduction to Parsing. Lecture 8

COP 3402 Systems Software Syntax Analysis (Parser)

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Error Handling Syntax-Directed Translation Recursive Descent Parsing

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Error Handling Syntax-Directed Translation Recursive Descent Parsing

CMSC 330: Organization of Programming Languages

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Lexical and Syntax Analysis. Top-Down Parsing

Context-Free Grammars

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Error Handling Syntax-Directed Translation Recursive Descent Parsing. Lecture 6

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

Principles of Programming Languages COMP251: Syntax and Grammars

Outline. Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Syntax-Directed Translation

Chapter 3. Parsing #1

Introduction to Parsing. Lecture 5

Recursive Descent Parsers

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Introduction to Parsing. Lecture 5

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Syntax Analysis Check syntax and construct abstract syntax tree

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CSE431 Translation of Computer Languages

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

4. Lexical and Syntax Analysis

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

CMSC 330: Organization of Programming Languages

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

4. Lexical and Syntax Analysis

Chapter 3. Describing Syntax and Semantics ISBN

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Syntax Analysis Part I

Topic 3: Syntax Analysis I

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Building Compilers with Phoenix

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

1. Explain the input buffer scheme for scanning the source program. How the use of sentinels can improve its performance? Describe in detail.

Lecture 8: Deterministic Bottom-Up Parsing

CSE 401 Midterm Exam Sample Solution 2/11/15

CSCE 314 Programming Languages

CMSC 330: Organization of Programming Languages

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

EECS 6083 Intro to Parsing Context Free Grammars

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Defining syntax using CFGs

BSCS Fall Mid Term Examination December 2012

Lecture 7: Deterministic Bottom-Up Parsing

CMSC 330: Organization of Programming Languages. Context Free Grammars

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Monday, September 13, Parsers

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

Compiler Design Concepts. Syntax Analysis

3. Parsing. Oscar Nierstrasz

Parsing. source code. while (k<=n) {sum = sum+k; k=k+1;}

Front End. Hwansoo Han

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS 406/534 Compiler Construction Putting It All Together

Ambiguity and Errors Syntax-Directed Translation

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars

Transcription:

Derivations vs Parses Grammar is used to derive string or construct parser Context ree Grammars A derivation is a sequence of applications of rules Starting from the start symbol S......... (sentence) Leftmost and rightmost derivations At each derivation step, a leftmost derivation always replaces the leftmost non-terminal symbol Rightmost derivation always replaces the rightmost one xample ( ) Leftmost derivation:... Rightmost derivation:... Parse Tree Parse tree: Internal nodes are non-terminals Leaves are terminals It filters out the order of replacement and describes the hierarchy The same parse tree results from both the rightmost and leftmost derivations in the previous example: Different Parse Trees While the two derivations could have the same parse tree for there can actually be 3 different trees: Ambiguity A grammar G is ambiguous if there exists a string str L(G) such that more than one parse trees derive str We prefer unambiguous grammars. Ambiguity is the property of a grammar and not the language It is possible to rewrite the grammar to remove ambiguity 1

Removing Ambiguity Method 1: Specify precedence. You can build precedence into the grammar by having a different non-terminal for each precedence level: Lowest level highest in the tree (lowest precedence) Highest level lowest in the tree Same level same precedence or the previous example, ( ) rewrite it to: T T T T ( ) T T T T Removing Ambiguity Method 2: Specify associativity. When recursion is allowed, we need to specify associativity or the previous example, Allows both right and left associativity. We can rewrite it to force it either way: Left associative : T Right associative: T In a programming language, most operators are left associative. Syntax Analysis We ve only discussed grammar from the point of view of derivation. What is syntax analysis? To process an input string for a given grammar, and compose the derivation if the string is in the language Two subtasks: to determine if string in the language or not to construct the parse tree Types of Parsers Universal parser Can parse any CG grammar. (arly s algorithm) Powerful but extremely inefficient Top-down parser It is goal-directed, expands the start symbol to the given sentence Only works for certain class of grammars To start from the root of the parse tree and reach leaves ind leftmost derivation Can be implemented efficiently by hand Is it possible to construct such a parser? Types of Parsers Bottom-up parser It tries to reduce input string to the start symbol Works for wer class of grammars Starts at leaves and build tree in bottom-up fashion ind reverse order of the rightmost derivation Automated tool generates it automatically Parser Output We have a choice of outputs from the parser: A parse tree (concrete syntax tree), or An abstract syntax tree xample Grammar: int ( ) and an input: 5 ( 2 3 ) After lexical analysis, we have a sequence of tokens INT:5 ( INT:2 INT:3 ) 2

Parser Output Summary The parse tree traces the operation of the parser. Captures the nested structure but contains too much information: Parentheses (precedence encoded in tree hierarchy) Single-successor nodes (could be collapsed/omitted) INT:5 ( ) INT:2 INT:3 We specify the syntax structure using CG even if the programming language itself is not context free. A parser can: Answer if an input str L(G) and build a parse tree or build an AST instead and pass it to the rest of compiler. We prefer an Abstract Syntax Tree (AST): AST also captures the nested structure. AST abstracts from the concrete syntax. AST is more compact and easier to use. PLUS 5 PLUS 2 3 Parsing We will study two approaches: Parsing Top-down asier to understand and implement manually Bottom-up More powerful, can be implemented automatically Top Down Parsers Recursive descent Simple to implement, use backtracking Predictive parser Predict the rule based on the 1st m symbols without backtracking Restrictions on the grammar to avo backtracking LL(k) predictive parser for LL(k) grammar Non recursive and only k symbol look ahead Table driven efficient Approach: or a non-terminal in the derivation, productions are tried in some order until A production is found that generates a portion of the input, or No production is found that generates a portion of the input, in which case backtrack to previous non-terminal. Parsing fails if no production for the start symbol generates the entire input. Terminals of the derivation are compared against input. Match advance input, continue parsing Mismatch backtrack, or fail 3

Grammar: T T T int T int ( ) Input string: int int Start symbol: Assume: When there are alternative rules, try right rule first Input Derivation Action int int pick rightmost rule T int int T pick rightmost rule T ( ) int int T ( ) ( does not match int int int T ailure, backtrack one level. int int T int pick next rule T int int int T int int matches input int int int T We have more tokens, so this is failure too. Backtrack. int int T int T Match int xpand T. int int T int T int ( ) pick rightmost rule ( ) int int T int T int ( ) ( does not match input int int int T int T ailure, backtrack one level. int int T int T int int pick next rule T int int int T int T int int Match whole input. Accept. Implementation Create a procedure for each non-terminal: 1. Checks if input symbol matches a terminal symbol in the grammar rule 2. Calls other procedure when non-terminals are part of the rule 3. If end of procedure is reached, success is reported to the caller int ( ) vo () { switch(lexer.yylex()) { case INT: eat(int); break; case LPARN: eat(lparn); (); eat(rparn); break; case???: (); eat(plus); (); break; } } Problems Unclear what to label the last case with. What if we don t label it at all and make it the default? Conser parsing 5 5: We d find INT and be done with the parse with more input to consume. We d want to backtrack, but there s no prior function call to return to. What if we put the call to () prior to the switch/case? Then () would always make a recursive call to () with no end case for the recursion. Left Recursion A production is left recursive if the same nonterminal that appears on the LHS appears first on the RHS of the production. Recursive descent parsers cannot deal with left recursion. However, we can rewrite the grammar to represent the same language without the need for left recursion. Removing Left Recursion In general, we can eliminate all immediate left recursion: A A x y By changing the grammar to: A y A A x A Not all left recursion is immediate may be hden in multiple production rules A BC D B A There is a general approach for removing indirect left recursion, but we ll not worry about if for this course. 4

Recursive Descent Summary Recursive descent is a simple and general parsing strategy Left-recursion must be eliminated first But this can be done automatically It is not popular because of its inefficiency: Backtracking re-parses the string Undoing semantic actions (actions taken upon matching a production much like the actions from our lexer) may be difficult! Techniques used in practice do no backtracking at the cost of restricting the class of grammar 5