A parser is some system capable of constructing the derivation of any sentence in some language L(G) based on a grammar G.

Similar documents
Introduction to Syntax Directed Translation and Top-Down Parsers

Chapter 4. Lexical and Syntax Analysis

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Top down vs. bottom up parsing

COP4020 Programming Assignment 2 - Fall 2016

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Software II: Principles of Programming Languages

COP4020 Programming Assignment 2 Spring 2011

4. Lexical and Syntax Analysis

CS 4120 Introduction to Compilers

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

4. Lexical and Syntax Analysis

CSCI312 Principles of Programming Languages

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Syntax-Directed Translation

Topic 3: Syntax Analysis I

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Abstract Syntax Trees & Top-Down Parsing

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

4. LEXICAL AND SYNTAX ANALYSIS

Left to right design 1

Introduction to Parsing. Lecture 8

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Computer Science 160 Translation of Programming Languages

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Wednesday, September 9, 15. Parsers

Types of parsing. CMSC 430 Lecture 4, Page 1

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Introduction to Parsing. Comp 412

Chapter 3. Describing Syntax and Semantics ISBN

shift-reduce parsing

CSX-lite Example. LL(1) Parse Tables. LL(1) Parser Driver. Example of LL(1) Parsing. An LL(1) parse table, T, is a twodimensional

Monday, September 13, Parsers

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Table-Driven Top-Down Parsers

Syntactic Analysis. Top-Down Parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

LECTURE 7. Lex and Intro to Parsing

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

Lexical and Syntax Analysis (2)

Chapter 3. Parsing #1

Syntax-Directed Translation. Lecture 14

Programming Language Syntax and Analysis

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CS 314 Principles of Programming Languages

LR Parsing LALR Parser Generators

Configuration Sets for CSX- Lite. Parser Action Table

Wednesday, August 31, Parsers

CS 406/534 Compiler Construction Parsing Part I

Lexical and Syntax Analysis. Top-Down Parsing

Parsing Techniques. CS152. Chris Pollett. Sep. 24, 2008.

Homework & Announcements

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Week 2: Syntax Specification, Grammars

Concepts Introduced in Chapter 4

CMSC 330: Organization of Programming Languages

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

CSCI312 Principles of Programming Languages!

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Syntax Analysis Check syntax and construct abstract syntax tree

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

1. Lexical Analysis Phase

Bottom-Up Parsing. Lecture 11-12

LR Parsing LALR Parser Generators

Lexical and Syntax Analysis

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

How do LL(1) Parsers Build Syntax Trees?

Compiler Construction Assignment 1 Spring 2018

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Abstract Syntax Trees Synthetic and Inherited Attributes

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

ICOM 4036 Spring 2004

Compiler Construction Assignment 1 Fall 2009

Functions BCA-105. Few Facts About Functions:

CA Compiler Construction

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

Chapter 4: Syntax Analyzer

Compiler Construction D7011E

Lecture 14 Sections Mon, Mar 2, 2009

LECTURE 3. Compiler Phases

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Transcription:

Top Down Parsing 1

Parsers Formal Definitions of a Parser A parser is some system capable of constructing the derivation of any sentence in some language L(G) based on a grammar G. which talks about a derivation A parser for a grammar G is a program that reads a string w and produces either a parse tree for w (if w is a sentence of G) or an error message. which is about building a parse tree 13

Two Major Classes of Parser Top-Down Parser: a parser that builds a parse tree starting at the root (the start symbol), building downwards to the leaves (tokens) Bottom-Up Parser: a parser that starts at the leaf nodes (bottom), and builds upwards towards the root. Has to match up strings of symbols which match the RHS of productions, and reduce them, replacing them with the non-terminal on the LHS Sometimes called a shift-reduce parser 14

Top-Down Parser Construction Top-down parsers, especially hand-written ones, are easier to understand and visualize the first time around So we re going to invent top-down parsing looking at progressively more interesting grammars. 15

Parsing a Very Simple Grammar Grammar G1: S if ( true ) goto label We need a lexical analyser getoken() which identifies one token at a time and returns a token code in the form of a manifest (or enumerated) constant (to use vanilla C terminology). Something like: #define IF 256 #define TRUE 257 etc. For single-character tokens like the left parenthesis, we can use the ascii code for the character as the code. 16

G1 Parser (continued) We re describing algorithms, and we ll stick to procedural code, mostly in C Here is a useful procedure which will simplify the programming of a parser. Note the use of a lookahead token in the variable tok. int tok; /* Yes, a global variable! */ match(int t) { if (tok == t) tok = getoken(); else error( Syntax error ); 17

The Actual (G1) Parser The Grammar: S if ( true ) goto label main() { tok = getoken( ); /* read first lookahead */ S( ); printf("parse complete!"); S( ) { match(if); match( '(' ); match(true); match( ')' ); match(goto); match(label); 18

Dealing with Non-Terminal Symbols Grammar G2: S if ( E ) goto label E id > id Approach: simply write a new parsing procedure for each non-terminal symbol in the grammar, and instead of matching, if we encounter a non-terminal in some RHS we simply call the parsing procedure for that non-terminal 19

G2 Parser G2: S if ( E ) goto label E id > id Parser: main( ) { /* same as before */ S( ) { match(if); match( '(' ); E( ); match( ')' ); match(goto); match(label); E( ) { match(id); match( '>' ); match(id); 20

Multiple Productions A more realistic grammar would generate more than one sentence (!), and require that the parser actually make choices. For example: G3: S if ( E ) S E goto label id = id id > id 21

Multiple Productions If there is more than one production for some non-terminal, the parser much decide which one to apply We will use the look-ahead symbol to make this decision comparing the look-ahead with the first symbol on the LHS of each production but only works if the first symbol is a terminal! 22

G3 Parser G3: S if ( E ) S goto label id = id E id > id main( ) and e( ) have not changed from the previous version Note the recursion in S which supports nested if s This is a recursive descent parser S( ) { if (tok == IF) { match(if); match( '(' ); E( ); match( ')' ); S( ); else if (tok == GOTO) { match(goto); match(label); else if (tok == ID) { match(id); match( '=' ); match(id); else error("syntax error in rule S ); 23

But Not All Productions Start with Tokens! How about grammar G4?: S IF_ST LOOP ASSIGN IF_ST if ( E ) S LOOP while ( E ) do S do S while ( E ) ASSIGN id = id E id > id How will we decide which productions of S to choose in our parser? 24

When Productions Start With a Nonterminal S IF_ST LOOP ASSIGN IF_ST if ( E ) S LOOP while ( E ) do S do S while ( E ) ASSIGN id = id E id > id We need to know something about what strings a RHS can derive Let A α be a production Define FIRST (α) to be the set of terminals which could be the first token in strings derived from α 25

Example of FIRST( ) A fragment of G4: S LOOP LOOP while ( E ) do S do S while ( E ) What is FIRST (LOOP)? FIRST (LOOP) = { while, do (the set consisting of the token while and the token do ) So select production A α if the look ahead token is in First (α) 26

Choosing the productions of S S IF_ST LOOP IF_ST LOOP ASSIGN if ( E ) S while ( E ) do S do S while ( E ) ASSIGN id = id E id > id We need the FIRST sets for the productions of S FIRST(IF_ST)={ if FIRST(LOOP)={ while, do FIRST(ASSIGN)={ id 27

Parsing Procedure for S S( ) { if (tok == IF) IF_ST( ); else if ((tok == WHILE) (tok == DO)) LOOP( ); else if (tok == ID) ASSIGN( ); else error("syntax error in rule S"); 28

Footnotes and Gotcha's Remember: S LOOP LOOP while ( E ) do S do S while ( E ) While computing FIRST(LOOP) we may find a production for LOOP that starts with a non-terminal: (must go deeper into grammar..)..and If we have two productions A α and A β then FIRST (α) and FIRST (β) must be disjoint 29

Parse Tree Traversal in Recursive Descent Parsers Our parser did not produce a parse tree..but it could have, because it constructed a left-most derivation, which is equivalent In what order does the parser traverse the conceptual tree? Define visit to mean: We have just called match( ) to consume a leaf node, or: We have just called the parsing procedure for a non-terminal (an interior node) and that procedure has just finished matching one of its productions 30

Parse Tree Traversal S IF_ST LOOP ASSIGN IF_ST if ( E ) S LOOP while ( E ) do S do S while ( E ) ASSIGN id = id E id > id S LOOP while id ( E > ) do id S ASSIGN Let s trace: while (x > y) do y = x id = id 31

Recursive Descent is Depth-First! This was depth-first, as we defined earlier Therefore we could perform a translation scheme using such a parser We just insert the actions (code fragments) in the parsing routines, at the points which correspond to visit Let s try it! 32

First Attempt at a Translator E E + T E - T T T T * F T / F F F ( E ) num Let s try to write the parsing routine for E ( ) We need to compute FIRST(E) and FIRST(T) FIRST(E) = { (, num FIRST(T) = { (, num Whoops! (Not only do we have no way to decide between the first two E productions, we can t even distinguish them from the T one.) But it gets worse: if E( ) chooses production E+T, the first thing that happens is it calls E( ) recursively! 33

Left-Recursion and Top-Down Parsers If the first thing a parsing procedure does is call itself recursively without first consuming any input It will make the same decision the next time because the "state" has not changed and call itself again and again and again Bad News: We cannot use a recursive descent parser (or any other top-down parser) for a grammar that contains left-recursive productions So we need a method to transform grammars containing left-recursion into grammars without leftrecursion 34

Elimination of Left Recursion Consider the left-recursive grammar S S α β What language does this generate? S generates all strings starting with a β and followed by any number of α s Can rewrite using right-recursion S β S S α S ϵ 35

Elimination of Left Recursion In general S S α 1 S α n β 1 β m All strings derived from S start with one of β 1,,β m and continue with several instances of any of the α 1,,α n Rewrite as S β 1 S β m S S α 1 S α n S ϵ 36

General Left Recursion The grammar S A α δ A S β is also left-recursive because + S S β α This left-recursion can also be eliminated See book, pp. 157-162 for general algorithm 37

Second Attempt at Translator E E + T { print + E - T { print - T T T * F { print * T / F { print / F F ( E ) num {print num We will treat the actions just like other grammar symbols: Transformed: E T R R + T { print + R - T { print - R ϵ T F Q Q * F { print * Q / F { print / Q ϵ i.e { foo=bar counts as one symbol F ( E ) num {print num 38

Writing a Parser E T R R + T R - T R ϵ T F Q Q * F Q / F Q ϵ F ( E ) num E( ) { T( ); R( ); R( ) { if (tok== + ) { match( + ); T( ); R( ); else if (tok== - ) { match( - ); T( ); R( ); else ; /* accept empty string */ 39

Notes Note how the semantic action has been carried into the middle of production RHS s and code inserted at the appropriate point in the parsing routine Productions for T and Q are almost identical in form: 40

The Rest of the Parser/Translator T( ) { F( ); Q( ); Q( ) { if (tok== * ) { match( * ); F( ); Q( ); else if (tok== / ) { match( / ); F( ); Q( ); else ; /* accept empty string */ F( ) { if (tok == NUM) { printf( %d ", tokenval); match(num); else if (tok == '(') { match('('); E( ); match(')'); else error("syntax error"); 41

We Need a Lexical Analyser We've already seen how straightforward it is to write a "lexer" for a simple grammar such as this. Tokens are: num ( ) + - * / Here s the header stuff: #include <stdio.h> #include <ctype.h> #include "globals.h" #define NONE -1 /* used for tokenval*/ #define NUM 256 #define ID 257 /* not needed yet */ #define DONE 258 int tok= NONE; int lineno = 1; int tokenval = NONE; 42

A Suitable Handwritten Lexer int getoken() { int t; while(1) { t = getchar(); if (t==' ' t=='\t') ; /*strip white space*/ else if (t == '\n') ++lineno; else if (isdigit(t)) { /* num */ tokenval = t - '0'; t = getchar(); while (isdigit(t)) { tokenval = tokenval*10 + t-'0'; t = getchar(); ungetc(t, stdin); return NUM; else if (t == EOF) return DONE; else { tokenval = NONE; return t; /* + - ( ) etc. */ We should really enumerate all the legal one-char tokens and report illegal characters, but if we don t the parser will end up reporting the error anyway... Note the use of one-character lookahead in lexer using a trick from C library 43

A Second Translator For a second example, let s generate code for a stack machine with the following instructions: push x pushcon num add subtract multiply divide push contents of word at address x push integer constant num pop 2 words, add them, push sum pop 2 words, subtract, push result pop 2 words, multiply, push product pop 2 words, divide, push quotient 44

Stack Machine Example x + 22 would translate to: push x pushcon 22 add (Let s keep it simple and assume there are only 26 variables a..z: I.e. variable names are single letters) 45

Translation Scheme for Stack Machine (Note introduction of variables) E T E + T E - T T T * F T / F F F id num ( E ) 46

Remove Left Recursion E T R R + T R - T R ϵ T F Q Q * F Q / F Q ϵ F id num ( E ) 47

Expand the Lexer a Little #define NONE -1 #define NUM 256 #define ID 257 #define DONE 258 int getoken() { int t; while(1) { t = getchar(); if (t==' ' t=='\t') ; /*strip white space*/ else if (t == '\n') ++lineno; else if (isdigit(t)) { tokenval = t - '0'; t = getchar(); while (isdigit(t)) { tokenval = tokenval*10 + t-'0'; t = getchar(); ungetc(t, stdin); return NUM; else if (isalpha(t)) { /* identifier */ tokenval = tolower(t); return ID; else if (t == EOF) return DONE; else { tokenval = NONE; return t; /* + - ( ) etc. */ 48

Add Code to Generate Assembly Code /* Functions to simplify printing assembly instructions and report errors */ /* Each prints a single line, hiding formatting details from parser */ void emit(char * x) { /* 0-address instruction (opcode only) */ printf("\t%s\n",x); void emit1n(char *x, int y) { /* 1-address instruction, numeric parameter */ printf("\t%s %d\n",x,y); void emit1c(char *x, char y) { /* 1-address instruction, char parameter */ printf("\t%s %c\n",(x),(y)); error(m) char *m; { fprintf(stderr,"line %d: %s\n",lineno,m); exit(1); Note that these are simple enough that they could actually be written as C "macros 49