Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

Revisit the example

[Figure: NFA with states 0 through 10, start state 0, ε-transitions and transitions on a and b]

ε-closure(0) = {0, 1, 2, 4, 7} = A
Trans(A, a) = {1, 2, 3, 4, 6, 7, 8} = B
Trans(A, b) = {1, 2, 4, 5, 6, 7} = C
Trans(B, a) = {1, 2, 3, 4, 6, 7, 8} = B
Trans(B, b) = {1, 2, 4, 5, 6, 7, 9} = D
Trans(C, a) = {1, 2, 3, 4, 6, 7, 8} = B
Trans(C, b) = {1, 2, 4, 5, 6, 7} = C
Trans(D, a) = {1, 2, 3, 4, 6, 7, 8} = B
Trans(D, b) = {1, 2, 4, 5, 6, 7, 10} = E (mark E as an end state because NFA state 10 is included)
Trans(E, a) = {1, 2, 3, 4, 6, 7, 8} = B
Trans(E, b) = {1, 2, 4, 5, 6, 7} = C

Transformed DFA

[Figure: DFA with states A, B, C, D, E; start state A; end state E]
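The ε-closure and Trans computations above can be sketched in C. This is only an illustration: the NFA edge lists are an assumption (the usual textbook NFA for (a|b)*abb, chosen because it reproduces the state sets listed on the slide), and the names BIT, eps_closure, move_on and trans are hypothetical. State sets are stored as bitmasks, one bit per NFA state.

```c
#include <stdint.h>

#define NUM_STATES 11
#define BIT(n) ((uint16_t)(1u << (n)))

/* ASSUMED edges: the slide's figure is not recoverable, so these follow
   the standard construction for (a|b)*abb. */
static const uint16_t EPS[NUM_STATES] = {
    BIT(1) | BIT(7), BIT(2) | BIT(4), 0, BIT(6), 0,
    BIT(6), BIT(1) | BIT(7), 0, 0, 0, 0
};
static const int ON_A[NUM_STATES] = {-1, -1, 3, -1, -1, -1, -1, 8, -1, -1, -1};
static const int ON_B[NUM_STATES] = {-1, -1, -1, -1, 5, -1, -1, -1, 9, 10, -1};

/* All states reachable from `set` using only ε-transitions. */
uint16_t eps_closure(uint16_t set) {
    uint16_t closure = set, prev;
    do {
        prev = closure;
        for (int s = 0; s < NUM_STATES; s++)
            if (closure & BIT(s)) closure |= EPS[s];
    } while (closure != prev);   /* stop once no new state was added */
    return closure;
}

/* States reachable from `set` by consuming exactly the symbol c. */
uint16_t move_on(uint16_t set, char c) {
    uint16_t out = 0;
    for (int s = 0; s < NUM_STATES; s++) {
        if (!(set & BIT(s))) continue;
        int target = (c == 'a') ? ON_A[s] : (c == 'b') ? ON_B[s] : -1;
        if (target >= 0) out |= BIT(target);
    }
    return out;
}

/* Trans(S, c) as used on the slide: move, then take the ε-closure. */
uint16_t trans(uint16_t set, char c) {
    return eps_closure(move_on(set, c));
}
```

With this encoding, A is the mask with bits 0, 1, 2, 4, 7 set, and applying trans repeatedly from A yields only the five sets A through E.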

Minimizing the DFA

Insight
  Identify sets of equivalent states: states are equivalent if they have the same transitions
  Merge equivalent states into a single new state in the refined DFA

Minimizing the DFA

Initially partition the state set into two groups:
  (1) Group I: the final states, and
  (2) Group II: the non-final states

    for each group G do
        partition G into subgroups such that any two states s and t of G
            stay together only if, for all input symbols, they have
            transitions to states in the same group
        replace G with the identified subgroups

Repeat until no group can be partitioned further.

Revisit the example

    State | a | b
    ------+---+---
    A     | B | C
    B     | B | D
    C     | B | C
    D     | B | E
    E     | B | C

Initially, two partitions: G1 = {E}, G2 = {A, B, C, D}
G1 cannot be further partitioned

Trans(G2, a) = {B}
Trans(G2, b) = {C, D, E}; the resulting set does not belong to a single group
Partition G2 into {A, B, C} = G3, {D} = G4

Trans(G3, a) = {B}
Trans(G3, b) = {C, D}; the resulting set does not belong to a single group
Partition G3 into {A, C} = G5, {B} = G6

Trans(G5, a) = {B} = G6
Trans(G5, b) = {C}, which lies within G5

Therefore, the resulting partition is: {A, C}, {B}, {D}, {E}

Refined DFA

[Figure: refined DFA with states S0, S1, S2, S3; start state S0; transitions on a and b]
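The refinement steps above can be sketched in C as follows. The transition table is the one from the example (states A to E numbered 0 to 4, symbols a and b numbered 0 and 1); the names group_of, same_behaviour and minimize are hypothetical, and the loop is a simple quadratic refinement rather than an optimized algorithm.

```c
#define N_STATES 5    /* A, B, C, D, E */
#define N_SYMBOLS 2   /* a, b */

static const int DELTA[N_STATES][N_SYMBOLS] = {
    {1, 2},   /* A: a -> B, b -> C */
    {1, 3},   /* B: a -> B, b -> D */
    {1, 2},   /* C: a -> B, b -> C */
    {1, 4},   /* D: a -> B, b -> E */
    {1, 2}    /* E: a -> B, b -> C */
};
static const int IS_FINAL[N_STATES] = {0, 0, 0, 0, 1};

int group_of[N_STATES];   /* current group number of each state */

/* s and t stay together if they are in the same group and, for every
   input symbol, move to states that are in the same group. */
static int same_behaviour(int s, int t) {
    if (group_of[s] != group_of[t]) return 0;
    for (int c = 0; c < N_SYMBOLS; c++)
        if (group_of[DELTA[s][c]] != group_of[DELTA[t][c]]) return 0;
    return 1;
}

/* Refines the partition until it is stable; returns the number of groups. */
int minimize(void) {
    int count = 0;
    for (int s = 0; s < N_STATES; s++)
        group_of[s] = IS_FINAL[s];          /* final vs. non-final */
    for (;;) {
        int next[N_STATES], new_count = 0;
        for (int s = 0; s < N_STATES; s++) {
            next[s] = -1;
            for (int t = 0; t < s && next[s] < 0; t++)
                if (same_behaviour(s, t)) next[s] = next[t];
            if (next[s] < 0) next[s] = new_count++;   /* open a new group */
        }
        for (int s = 0; s < N_STATES; s++) group_of[s] = next[s];
        if (new_count == count) return count;         /* nothing was split */
        count = new_count;
    }
}
```

For the example table this ends with four groups, with A and C sharing a group, which matches the partition {A, C}, {B}, {D}, {E}.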

State Transition Diagram (DFA)

[Figure: from Start, a Letter leads to state id, which loops on Letter/Digit; from Start, a Digit leads to state int_lit, which loops on Digit]

Constructing the Lexical Analysis

Convenient utility subprograms:
  getchar - gets the next character of input, puts it in nextchar, determines its class and puts the class in charclass
  addchar - puts the character from nextchar into the place where the token is being accumulated: nexttoken

Implementation Pseudo-code

    static TOKEN nexttoken;
    static CHAR_CLASS charclass;

    void lex() {
        getchar();
        switch (charclass) {
        case LETTER:
            /* identifier: a letter followed by letters or digits */
            addchar();
            getchar();
            while (charclass == LETTER || charclass == DIGIT) {
                addchar();
                getchar();
            }
            return;   /* nexttoken = ID */

Implementation (Cont'd)

        case DIGIT:
            /* integer literal: one or more digits */
            addchar();
            getchar();
            while (charclass == DIGIT) {
                addchar();
                getchar();
            }
            return;   /* nexttoken = INT_LIT */
        default:
            error();  /* report an error */
        } /* End of switch */
    } /* End of function lex */
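A self-contained variant of the pseudo-code above, as a sketch only. It reads from a string instead of an input stream, skips whitespace between tokens (an addition not on the slides), and uses hypothetical names: lex_init, lexeme, and the TOK_ codes. Because each loop keeps consuming characters for as long as they fit the token, it always returns the longest possible token.

```c
#include <ctype.h>
#include <stddef.h>

enum { TOK_EOF, TOK_ID, TOK_INT_LIT, TOK_ERROR };

static const char *cursor = "";   /* next unread input character */
char lexeme[64];                  /* text of the most recent token */

void lex_init(const char *src) { cursor = src; }

/* Appends the current character to lexeme (if it fits) and advances. */
static void add_char(size_t *n) {
    if (*n < sizeof lexeme - 1) lexeme[(*n)++] = *cursor;
    cursor++;
}

int lex(void) {
    size_t n = 0;
    int kind;
    while (isspace((unsigned char)*cursor)) cursor++;
    if (*cursor == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*cursor)) {
        /* identifier: letter followed by letters or digits */
        while (isalnum((unsigned char)*cursor)) add_char(&n);
        kind = TOK_ID;
    } else if (isdigit((unsigned char)*cursor)) {
        /* integer literal: one or more digits */
        while (isdigit((unsigned char)*cursor)) add_char(&n);
        kind = TOK_INT_LIT;
    } else {
        add_char(&n);             /* consume the offending character */
        kind = TOK_ERROR;
    }
    lexeme[n] = '\0';
    return kind;
}
```

Calling lex_init("foobar 42") and then lex() three times would be expected to yield an identifier with lexeme foobar, an integer literal with lexeme 42, and then TOK_EOF.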

Key Points about Scanner

Nearly universal rule
  Always take the longest possible token from the input
  foobar is never scanned as foo or fooba
Regular expressions generate a regular language, while DFAs recognize it

Parser

By analogy to REs and DFAs, a context-free grammar (CFG) is a generator for a context-free language, while a parser is a language recognizer

Responsibilities
  Generate a parse tree, and report syntax errors if any

Two Classes of Grammars

Left-to-right, Leftmost derivation (LL)
Left-to-right, Rightmost derivation (LR)
We can build parsers for these grammars that run in linear time

Grammar Comparison

LL
    E  -> T E'
    E' -> + T E' | ε
    T  -> F T'
    T' -> * F T' | ε
    F  -> id

LR
    E -> E + T | T
    T -> T * F | F
    F -> id

Two Categories of Parsers

LL(1) Parsers
  L: scanning the input from left to right
  L: producing a leftmost derivation
  1: using one input symbol of lookahead at each step to make parsing action decisions
LR(1) Parsers
  L: scanning the input from left to right
  R: producing a rightmost derivation in reverse
  1: the same as above

Two Categories of Parsers

LL(1) parsers (predictive parsers)
  Top down
  Build the parse tree from the root
  Find a leftmost derivation for an input string
LR(1) parsers (shift-reduce parsers)
  Bottom up
  Build the parse tree from the leaves
  Reduce a string to the start symbol of a grammar
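To make the "one symbol of lookahead" idea concrete, here is a hedged sketch of a table-driven LL(1) recognizer for the LL grammar of the Grammar Comparison slide. The encoding is an assumption made for brevity: the token id is written as the character i, the nonterminals E' and T' are written e and t, and the end of the string plays the role of the end marker. The parse table was worked out by hand for this sketch and does not appear in the slides.

```c
#include <stddef.h>
#include <string.h>

/* Parse table: the right-hand side to use for (nonterminal, lookahead),
   or NULL when the entry is empty (a syntax error). "" stands for ε. */
static const char *table(char nt, char la) {
    switch (nt) {
    case 'E': return la == 'i' ? "Te" : NULL;
    case 'e': return la == '+' ? "+Te" : la == '\0' ? "" : NULL;
    case 'T': return la == 'i' ? "Ft" : NULL;
    case 't': return la == '*' ? "*Ft"
                   : (la == '+' || la == '\0') ? "" : NULL;
    case 'F': return la == 'i' ? "i" : NULL;
    }
    return NULL;
}

/* Returns 1 if input (e.g. "i+i*i") is derivable from E, else 0. */
int ll1_accepts(const char *input) {
    char stack[256];
    size_t top = 0;
    stack[top++] = 'E';                       /* start symbol */
    while (top > 0) {
        char x = stack[--top];
        if (strchr("EeTtF", x)) {             /* nonterminal: expand it */
            const char *rhs = table(x, *input);
            if (!rhs) return 0;
            size_t len = strlen(rhs);
            if (top + len > sizeof stack) return 0;
            for (size_t k = len; k > 0; k--)  /* push RHS in reverse order */
                stack[top++] = rhs[k - 1];
        } else {                              /* terminal: must match input */
            if (*input != x) return 0;
            input++;
        }
    }
    return *input == '\0';
}
```

Each decision consults only the symbol on top of the stack and the next input character, so no step is ever undone.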

Top-down parsing algorithms

Recursive predictive parsing
  Recursive-descent parsing
Non-recursive predictive parsing
  LL(1): Table-driven parsing

Motivating Example

Consider the grammar
    S -> c A d
    A -> a b | a
Input string: w = cad
How to build a parse tree top-down?

Recursive-Descent Parsing

Initially create a tree containing a single node S (the start symbol)
Apply the S-rule to see whether the first token matches
If it matches, expand the tree
Apply the A-rule to the leftmost nonterminal A
Since the first token matches both alternatives (A1: A -> a b and A2: A -> a), arbitrarily pick one (e.g., A1) to apply

[Figure: tree S with children c, A, d; then the same tree with A expanded to a b]

Recursive-Descent Parsing

Since the third token d does not match b, report failure and go back to A to try another alternative
Roll back to the state before applying the A1 rule, and then apply the alternative rule
The third token matches, so parsing is successfully done

[Figure: tree S with children c, A, d and A expanded to a b (failed attempt); then tree S with children c, A, d and A expanded to a (successful parse)]
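The try-then-roll-back behaviour described above can be sketched in C for this one grammar (S -> c A d, A -> a b | a). The function and variable names are hypothetical, and the alternatives of A are simply tried in textual order; a general backtracking parser would need more machinery.

```c
#include <string.h>

/* Alternatives of A, tried in order: A1 = "ab", A2 = "a". */
static const char *A_ALTS[] = { "ab", "a" };

/* Index of the A-alternative used by the last successful parse, or -1. */
int last_alternative = -1;

/* Returns 1 if text begins with lit. */
static int starts_with(const char *text, const char *lit) {
    while (*lit)
        if (*text++ != *lit++) return 0;
    return 1;
}

/* Recognizes S -> c A d, backtracking over the alternatives of A. */
int parse_S(const char *w) {
    last_alternative = -1;
    if (w[0] != 'c') return 0;
    for (int alt = 0; alt < 2; alt++) {
        const char *p = w + 1;            /* roll back to just before A */
        if (!starts_with(p, A_ALTS[alt])) continue;
        p += strlen(A_ALTS[alt]);
        if (p[0] == 'd' && p[1] == '\0') {
            last_alternative = alt;
            return 1;
        }
        /* d did not follow: fall through and try the next alternative */
    }
    return 0;
}
```

For w = cad the first alternative fails at the third token, the position is reset, and the second alternative succeeds, mirroring the two tree figures.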

Recursive-Descent Parsing Algorithm

Suppose we have a scanner which generates the next token as needed. Given a string, the parsing process starts with the start symbol rule:

    if there is only one RHS then
        for each terminal in the RHS
            compare it with the next input token
            if they match, then continue
            else report an error
        for each nonterminal in the RHS
            call its corresponding subprogram and try to match
            if no match is found, then report an error
    else  // there is more than one RHS
        choose the RHS based on the next input token (the lookahead)
        for each chosen RHS
            call the corresponding subprogram and try to match
            if no match is found, then report an error

Constructing Parser

Utility program
  match( ) checks that the next token is the one it expects to see, and does the corresponding processing

An Example (one RHS)

    /* Function expr
       Parses strings in the language generated by the rule:
       <expr> -> <term> {(+ | -) <term>} */
    void expr() {
        /* Parse the first term */
        term();
        /* As long as the next token is + or -, call lex to get
           the next token, and parse the next term */
        while (nexttoken == PLUS_CODE || nexttoken == MINUS_CODE) {
            lex();
            term();
        }
    }

Another Example (multiple RHS)

    /* Function factor
       Parses strings in the language generated by the rule:
       <factor> -> id | (<expr>) */
    void factor() {
        if (nexttoken == ID_CODE) {
            lex();
        } else if (nexttoken == LEFT_PAREN_CODE) {
            lex();
            expr();
            if (nexttoken == RIGHT_PAREN_CODE) {
                lex();
            } else
                error();
        } else
            error(); /* Neither RHS matches */
    }
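A runnable sketch that combines the two functions above into a complete recognizer. Several things here are assumptions made to keep it short: each token is a single character (any letter is an id), there is no whitespace handling, error() is replaced by a failed flag, and the rule for term, <term> -> <factor> {(* | /) <factor>}, is not shown on the slides and is supplied in the usual textbook form.

```c
#include <ctype.h>

static const char *tok;   /* current token: one character of the input */
static int failed;        /* set to 1 on the first syntax error */

static void expr(void);

/* Plays the role of lex(): move to the next token. */
static void advance(void) { if (*tok) tok++; }

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (isalpha((unsigned char)*tok)) {
        advance();
    } else if (*tok == '(') {
        advance();
        expr();
        if (*tok == ')') advance();
        else failed = 1;
    } else {
        failed = 1;       /* neither RHS matches */
    }
}

/* ASSUMED rule: <term> -> <factor> {(* | /) <factor>} */
static void term(void) {
    factor();
    while (!failed && (*tok == '*' || *tok == '/')) {
        advance();
        factor();
    }
}

/* <expr> -> <term> {(+ | -) <term>} */
static void expr(void) {
    term();
    while (!failed && (*tok == '+' || *tok == '-')) {
        advance();
        term();
    }
}

/* Returns 1 if src is a complete expression, else 0. */
int accepts(const char *src) {
    tok = src;
    failed = 0;
    expr();
    return !failed && *tok == '\0';
}
```

Every choice is made by inspecting the current token only, so this recognizer never backtracks.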

Key points about recursive-descent parsing

Recursive-descent parsing may require backtracking
LL(1) does not allow backtracking
  By only looking at the next input token, we can always precisely decide which rule to apply
By carefully designing a grammar, i.e., an LL(1) grammar, we can avoid backtracking

Two Obstacles to LL(1)-ness

Left recursion
  E.g.,
      id_list -> id_list_prefix ;
      id_list_prefix -> id_list_prefix , id | id
  When the next token is id, which rule should we apply?
Common prefixes
  E.g.,
      A -> a b | a
  When the next token is a, which rule should we apply?

LL(1) Grammar

A grammar which can be processed with an LL(1) parser
A non-LL grammar can sometimes be converted to an LL(1) grammar via:
  Left-recursion elimination
  Left factoring by extracting common prefixes

Left-Recursion Elimination

Replace left recursion with right recursion

    id_list -> id_list_prefix ;
    id_list_prefix -> id_list_prefix , id | id
=>
    id_list -> id id_list_tail
    id_list_tail -> , id id_list_tail | ;
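After the transformation, each alternative of id_list_tail starts with a different token (a comma or a semicolon), so one token of lookahead is enough. A minimal C sketch, assuming single-character tokens where any letter stands for an id; the function names mirror the nonterminals but are otherwise hypothetical.

```c
#include <ctype.h>

/* id_list_tail -> , id id_list_tail | ;
   Returns 1 if p is a complete id_list_tail, else 0. */
static int id_list_tail(const char *p) {
    if (*p == ';')
        return p[1] == '\0';             /* ';' must end the input */
    if (*p == ',' && isalpha((unsigned char)p[1]))
        return id_list_tail(p + 2);      /* consume ", id" and recurse */
    return 0;
}

/* id_list -> id id_list_tail */
int id_list(const char *p) {
    return isalpha((unsigned char)*p) && id_list_tail(p + 1);
}
```

The original left-recursive rule could not be coded this way: a function for id_list_prefix would call itself before consuming any input and never terminate.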

Left Factoring

Extract the common prefixes, and introduce new nonterminals as needed

    A -> a b | a
=>
    A -> a B
    B -> b | ε

Non-LL Languages

Simply eliminating left recursion and common prefixes is not guaranteed to make a grammar LL(1)
An example in Pascal:

    stmt -> if condition then_clause else_clause | other_stmt
    then_clause -> then stmt
    else_clause -> else stmt | ε

How to parse if C1 then if C2 then S1 else S2?
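Returning to the left-factored rule A -> a B, B -> b | ε from the Left Factoring slide: combined with S -> c A d from the motivating example, it can be parsed with one token of lookahead and no rollback. The sketch below assumes single-character tokens; match follows the utility described earlier, and the other names are hypothetical.

```c
static const char *tok;   /* current token: one character of the input */

/* Consumes the current token if it is the expected one. */
static int match(char expected) {
    if (*tok != expected) return 0;
    tok++;
    return 1;
}

/* B -> b | ε : choose by looking at the next token only. */
static int parse_B(void) {
    if (*tok == 'b') return match('b');
    return 1;                           /* ε: consume nothing */
}

/* A -> a B */
static int parse_A(void) {
    return match('a') && parse_B();
}

/* S -> c A d, followed by end of input. */
int parse_S_ll1(const char *w) {
    tok = w;
    return match('c') && parse_A() && match('d') && *tok == '\0';
}
```

Unlike the backtracking version, the decision between b and ε is postponed until after a has been consumed, which is exactly what left factoring buys.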

Non-LL Languages

Define a disambiguating rule, and use it together with the ambiguous grammar to parse top-down
  E.g., in the case of a conflict between two possible productions, the one to use is the one that occurs first, textually, in the grammar
  Here this pairs each else with the nearest then
A disambiguating rule can also be defined for bottom-up parsing
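A hedged sketch of how such a rule looks in a recursive-descent parser for the if/then/else grammar. The token encoding is an assumption made for brevity: i stands for "if condition then", e for "else", and s for any other statement. The variable else_depth is a hypothetical probe that records the nesting depth of the if that claimed the most recent else, so the effect of the rule can be observed.

```c
static const char *tok;   /* current token: 'i', 'e' or 's' */

/* Nesting depth (1 = outermost) of the `if` that most recently took an
   else clause; 0 if no else was seen. */
int else_depth;

/* stmt -> i stmt else_clause | s
   else_clause -> e stmt | ε   (first alternative preferred) */
static int stmt(int depth) {
    if (*tok == 's') {
        tok++;
        return 1;
    }
    if (*tok == 'i') {
        tok++;
        if (!stmt(depth + 1)) return 0;
        if (*tok == 'e') {
            /* Disambiguating rule: take "else stmt" rather than ε, so
               the innermost open `if` always claims the else. */
            tok++;
            else_depth = depth + 1;
            return stmt(depth + 1);
        }
        return 1;                       /* else_clause -> ε */
    }
    return 0;
}

/* Returns 1 if src is a single complete statement. */
int parse_stmt(const char *src) {
    tok = src;
    else_depth = 0;
    return stmt(0) && *tok == '\0';
}
```

For "iises", the encoding of if C1 then if C2 then S1 else S2, the else is claimed at depth 2, that is, by the inner if.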