Introduction to Syntax Directed Translation and Top-Down Parsers

Attributes and Semantic Rules

Let's associate attributes with grammar symbols, and semantic rules with productions. This gives us a syntax-directed definition. (Note the simpler grammar.)

    Grammar          Semantic Rules
    E → E1 + T       E.v := E1.v + T.v
    E → T            E.v := T.v
    T → T1 * F       T.v := T1.v * F.v
    T → F            T.v := F.v
    F → ( E )        F.v := E.v
    F → num          F.v := num.v   (value of token)

Annotated ("Decorated") Parse Tree After Executing Semantic Rules

Input string: 3 * 4 + 5 * 2

    E  (E.v = 22)
      E  (E.v = 12)
        T  (T.v = 12)
          T  (T.v = 3)
            F  (F.v = 3)
              3
          *
          F  (F.v = 4)
            4
      +
      T  (T.v = 10)
        T  (T.v = 5)
          F  (F.v = 5)
            5
        *
        F  (F.v = 2)
          2

Infix and Postfix Expressions

    Infix:   3 * 4
    Postfix: 3 4 *

    Infix:   3 * 4 + 5 * 2
    Postfix: 3 4 * 5 2 * +

Let's build a translator from infix to postfix.

Translation as an Attribute

An attribute could be anything, including a translation. Let attribute .t be a character string, and let || be the string concatenation operator.

    Grammar          Semantic Rules
    E → E1 + T       E.t := E1.t || T.t || '+'
    E → T            E.t := T.t
    T → T1 * F       T.t := T1.t || F.t || '*'
    T → F            T.t := F.t
    F → ( E )        F.t := E.t
    F → num          F.t := num.t

Annotated Parse Tree After Executing Semantic Rules

Input string: 3 * 4 + 5 * 2

    E  (E.t = "3 4 * 5 2 * +")
      E  (E.t = "3 4 *")
        T  (T.t = "3 4 *")
          T  (T.t = "3")
            F  (F.t = "3")
              3
          *
          F  (F.t = "4")
            4
      +
      T  (T.t = "5 2 *")
        T  (T.t = "5")
          F  (F.t = "5")
            5
        *
        F  (F.t = "2")
          2

Translation Schemes

What we have seen so far is called a syntax-directed definition. It is simply a specification of the translation to be performed, without necessarily addressing how this might be accomplished in any real implementation. The order of evaluation of the rules is not specified, for example.

If we can develop a plan for how to implement the translation, in the form of particular actions to be performed and the order in which they should be performed, we will have something called a syntax-directed translation scheme.

Translation Schemes

If we embed program fragments called semantic actions within the RHSs of productions, we have a translation scheme. This explicitly shows the order of evaluation of the actions (assuming we know the order in which the parse tree will be traversed).

Here is a scheme which concatenates simply by writing (appending) to a file:

    E → E + T  { print '+' }
    E → T
    T → T * F  { print '*' }
    T → F
    F → ( E )
    F → num    { print num }

Implied traversal order: we must visit the child nodes before we can visit an interior node, and we execute an action when we visit it.

Translation Scheme in Action

    E → E + T  { print '+' }
    E → T
    T → T * F  { print '*' }
    T → F
    F → ( E )
    F → num    { print num.t }

Input string: 3 * 4 + 5 * 2

Parse tree, with each action attached as the last child of its production:

    E
      E
        T
          T
            F
              3
              {print 3}
          *
          F
            4
            {print 4}
          {print '*'}
      +
      T
        T
          F
            5
            {print 5}
        *
        F
          2
          {print 2}
        {print '*'}
      {print '+'}

Output string: 3 4 * 5 2 * +

Depth First Traversal

The translation scheme depends on actions happening in a certain order: it must be bottom-up and left-to-right. A guaranteed way to achieve this is a depth-first traversal of the parse tree:

    procedure visit (node n) {
        for each child m of n, from left to right {
            visit (m);
        }
        evaluate semantic rules at node n;
    }

Depth First Traversal

    procedure visit (node n) {
        for each child m of n, left to right {
            visit (m);
        }
    }

[Figure: the parse tree with leaves 3 * 4 + 5 * 2, showing the order in which a depth-first traversal visits its nodes.]

Our Translation Scheme Re-Visited

    E → E + T  { print '+' }
    E → T
    T → T * F  { print '*' }
    T → F
    F → ( E )
    F → num    { print num.t }

[Figure: the parse tree with leaves 3 * 4 + 5 * 2, with the print actions attached as leaves.]

If we could develop a parsing scheme which visits the parse tree nodes in this order, we can execute our translation scheme while we parse!

Output string: 3 4 * 5 2 * +

Parsers

Formal definitions of a parser:

- A parser is some system capable of constructing the derivation of any sentence in some language L(G) based on a grammar G. (This definition talks about a derivation.)
- A parser for a grammar G is a program that reads a string w and produces either a parse tree for w (if w is a sentence of G) or an error message. (This definition is about building a parse tree.)

Two Major Classes of Parser

- Top-down parser: a parser that builds a parse tree starting at the root (the start symbol), building downwards to the leaves (tokens).
- Bottom-up parser: a parser that starts at the leaf nodes (bottom) and builds upwards towards the root. It has to find strings of symbols which match the RHS of a production and reduce them, replacing them with the non-terminal on the LHS. Sometimes called a shift-reduce parser.

Top-Down Parser Construction

Top-down parsers, especially hand-written ones, are easier to understand and visualize the first time around, so we're going to invent top-down parsing by looking at progressively more interesting grammars.

Parsing a Very Simple Grammar

Grammar G1:

    S → if ( true ) goto label

We need a lexical analyser getoken() which identifies one token at a time and returns a token code in the form of a manifest (or enumerated) constant, to use vanilla C terminology. Something like:

    #define IF   256
    #define TRUE 257
    /* etc. */

For single-character tokens like the left parenthesis, we can use the ASCII code for the character as the token code.

G1 Parser (continued)

We're describing algorithms, and we'll stick to procedural code, mostly in C. Here is a useful procedure which will simplify the programming of a parser. Note the use of a lookahead token in the variable tok.

    int tok;    /* Yes, a global variable! */

    void match(int t)
    {
        if (tok == t)
            tok = getoken();
        else
            error("Syntax error");
    }

The Actual (G1) Parser

The grammar:

    S → if ( true ) goto label

    void S(void);

    int main(void)
    {
        tok = getoken();    /* read first lookahead */
        S();
        printf("parse complete!");
        return 0;
    }

    void S(void)
    {
        match(IF); match('('); match(TRUE); match(')');
        match(GOTO); match(LABEL);
    }

Dealing with Non-Terminal Symbols

Grammar G2:

    S → if ( E ) goto label
    E → id > id

Approach: simply write a new parsing procedure for each non-terminal symbol in the grammar. Instead of matching, whenever we encounter a non-terminal in some RHS we call the parsing procedure for that non-terminal.

G2 Parser

G2:

    S → if ( E ) goto label
    E → id > id

Parser:

    int main(void)
    {
        /* same as before */
    }

    void S(void)
    {
        match(IF); match('('); E(); match(')');
        match(GOTO); match(LABEL);
    }

    void E(void)
    {
        match(ID); match('>'); match(ID);
    }

Multiple Productions

A more realistic grammar would generate more than one sentence (!), and require that the parser actually make choices. For example:

G3:

    S → if ( E ) S
      | goto label
      | id = id
    E → id > id

Multiple Productions

If there is more than one production for some non-terminal, the parser must decide which one to apply. We will use the look-ahead symbol to make this decision, comparing the look-ahead with the first symbol on the RHS of each production. This only works if that first symbol is a terminal!

G3 Parser

G3:

    S → if ( E ) S
      | goto label
      | id = id
    E → id > id

main() and E() have not changed from the previous version. Note the recursion in S, which supports nested ifs. This is a recursive descent parser.

    void S(void)
    {
        if (tok == IF) {
            match(IF); match('('); E(); match(')'); S();
        } else if (tok == GOTO) {
            match(GOTO); match(LABEL);
        } else if (tok == ID) {
            match(ID); match('='); match(ID);
        } else
            error("syntax error in rule S");
    }

But Not All Productions Start with Tokens!

How about grammar G4?

    S      → IF_ST | LOOP | ASSIGN
    IF_ST  → if ( E ) S
    LOOP   → while ( E ) do S
           | do S while ( E )
    ASSIGN → id = id
    E      → id > id

How will we decide which production of S to choose in our parser?

When Productions Start With a Nonterminal

    S      → IF_ST | LOOP | ASSIGN
    IF_ST  → if ( E ) S
    LOOP   → while ( E ) do S
           | do S while ( E )
    ASSIGN → id = id
    E      → id > id

We need to know something about what strings a RHS can derive. Let A → α be a production. Define FIRST(α) to be the set of terminals which could be the first token in strings derived from α.

Example of FIRST( )

A fragment of G4:

    S    → LOOP
    LOOP → while ( E ) do S
         | do S while ( E )

What is FIRST(LOOP)?

    FIRST(LOOP) = { while, do }

(the set consisting of the token while and the token do). So select production A → α if the look-ahead token is in FIRST(α).

Choosing the Productions of S

    S      → IF_ST | LOOP | ASSIGN
    IF_ST  → if ( E ) S
    LOOP   → while ( E ) do S
           | do S while ( E )
    ASSIGN → id = id
    E      → id > id

We need the FIRST sets for the productions of S:

    FIRST(IF_ST)  = { if }
    FIRST(LOOP)   = { while, do }
    FIRST(ASSIGN) = { id }

Parsing Procedure for S

    void S(void)
    {
        if (tok == IF)
            IF_ST();
        else if ((tok == WHILE) || (tok == DO))
            LOOP();
        else if (tok == ID)
            ASSIGN();
        else
            error("syntax error in rule S");
    }

Footnotes and Gotchas

Remember:

    S    → LOOP
    LOOP → while ( E ) do S
         | do S while ( E )

- While computing a FIRST set such as FIRST(LOOP), we may find a production that starts with a non-terminal. In that case we must go deeper into the grammar.
- If we have two productions A → α and A → β, then FIRST(α) and FIRST(β) must be disjoint.

Parse Tree Traversal in Recursive Descent Parsers

Our parser did not produce a parse tree, but it could have, because it constructed a left-most derivation, which is equivalent. In what order does the parser traverse the conceptual tree?

Define "visit" to mean either:

- we have just called match() to consume a leaf node, or
- we have just called the parsing procedure for a non-terminal (an interior node) and that procedure has just finished matching one of its productions.

Parse Tree Traversal

    S      → IF_ST | LOOP | ASSIGN
    IF_ST  → if ( E ) S
    LOOP   → while ( E ) do S
           | do S while ( E )
    ASSIGN → id = id
    E      → id > id

Let's trace: while (x > y) do y = x

    S
      LOOP
        while
        (
        E
          id
          >
          id
        )
        do
        S
          ASSIGN
            id
            =
            id

Recursive Descent is Depth-First!

This traversal was depth-first, as we defined it earlier. Therefore we can perform a translation scheme using such a parser: we just insert the actions (code fragments) in the parsing routines, at the points which correspond to "visit". Let's try it!

First Attempt at a Translator

    E → E + T  { print '+' }
      | E - T  { print '-' }
      | T
    T → T * F  { print '*' }
      | T / F  { print '/' }
      | F
    F → ( E )
      | num    { print num }

Let's try to write the parsing routine for E(). We need to compute FIRST(E) and FIRST(T):

    FIRST(E) = { (, num }
    FIRST(T) = { (, num }

Whoops! Not only do we have no way to decide between the first two E productions, we can't even distinguish them from the T one. But it gets worse: if E() chooses the production E + T, the first thing that happens is that it calls E() recursively!

Left-Recursion and Top-Down Parsers

If the first thing a parsing procedure does is call itself recursively, without first consuming any input, it will make the same decision the next time, because the "state" has not changed, and call itself again and again and again.

Bad news: we cannot use a recursive descent parser (or any other top-down parser) for a grammar that contains left-recursive productions. So we need a method to transform grammars containing left-recursion into grammars without left-recursion.

Elimination of Left Recursion

Consider the left-recursive grammar

    S → S α | β

What language does this generate? S generates all strings starting with a β and followed by any number of αs. We can rewrite it using right-recursion:

    S  → β S'
    S' → α S' | ϵ

Elimination of Left Recursion

In general:

    S → S α1 | … | S αn | β1 | … | βm

All strings derived from S start with one of β1, …, βm and continue with zero or more instances of any of α1, …, αn. Rewrite as:

    S  → β1 S' | … | βm S'
    S' → α1 S' | … | αn S' | ϵ
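As a worked instance of the general rule, consider the expression productions E → E + T | E - T | T used elsewhere in these slides. Matching them against the pattern gives two α's and one β. The name E' for the new non-terminal is a choice made for this illustration.

```latex
\begin{align*}
E  &\to E + T \mid E - T \mid T
   && (\alpha_1 = {+}\,T,\quad \alpha_2 = {-}\,T,\quad \beta_1 = T) \\
E  &\to T\,E' \\
E' &\to {+}\,T\,E' \mid {-}\,T\,E' \mid \epsilon
\end{align*}
```

The same substitution applied to the T productions, with α1 = * F, α2 = / F and β1 = F, removes the remaining left recursion.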

General Left Recursion

The grammar

    S → A α | δ
    A → S β

is also left-recursive, because S ⇒+ S β α. This left-recursion can also be eliminated. See book, pp. 157-162 for the general algorithm.

Second Attempt at Translator

    E → E + T  { print '+' }
      | E - T  { print '-' }
      | T
    T → T * F  { print '*' }
      | T / F  { print '/' }
      | F
    F → ( E )
      | num    { print num }

We will treat the actions just like other grammar symbols, i.e. { foo=bar } counts as one symbol. Transformed:

    E → T R
    R → + T  { print '+' }  R
      | - T  { print '-' }  R
      | ϵ
    T → F Q
    Q → * F  { print '*' }  Q
      | / F  { print '/' }  Q
      | ϵ
    F → ( E )
      | num  { print num }

Writing a Translator (finally..)

    E → T R
    R → + T  { print '+' }  R
      | - T  { print '-' }  R
      | ϵ
    T → F Q
    Q → * F  { print '*' }  Q
      | / F  { print '/' }  Q
      | ϵ
    F → ( E )
      | num  { print num }

    void E(void)
    {
        T(); R();
    }

    void R(void)
    {
        if (tok == '+') {
            match('+'); T(); printf("+ "); R();
        } else if (tok == '-') {
            match('-'); T(); printf("- "); R();
        } else
            ;   /* accept empty string */
    }

Notes

Note how the semantic action has been carried into the middle of the production RHSs, and the code inserted at the corresponding point in the parsing routine. The productions for T and Q are almost identical in form to those for E and R:

The Rest of the Parser/Translator

    void T(void)
    {
        F(); Q();
    }

    void Q(void)
    {
        if (tok == '*') {
            match('*'); F(); printf("* "); Q();
        } else if (tok == '/') {
            match('/'); F(); printf("/ "); Q();
        } else
            ;   /* accept empty string */
    }

    void F(void)
    {
        if (tok == NUM) {
            printf("%d ", tokenval);
            match(NUM);
        } else if (tok == '(') {
            match('('); E(); match(')');
        } else
            error("syntax error");
    }

Question: why did I call printf before calling match(NUM)?

We Need a Lexical Analyser

We've already seen how straightforward it is to write a "lexer" for a simple grammar such as this. The tokens are:

    num ( ) + - * /

Here's the header stuff:

    #include <stdio.h>
    #include <ctype.h>
    #include "globals.h"

    #define NONE -1     /* used for tokenval */
    #define NUM  256
    #define ID   257    /* not needed yet */
    #define DONE 258

    int tok = NONE;
    int lineno = 1;
    int tokenval = NONE;

A Suitable Handwritten Lexer

    int getoken(void)
    {
        int t;
        while (1) {
            t = getchar();
            if (t == ' ' || t == '\t')
                ;                       /* strip white space */
            else if (t == '\n')
                ++lineno;
            else if (isdigit(t)) {      /* num */
                tokenval = t - '0';
                t = getchar();
                while (isdigit(t)) {
                    tokenval = tokenval * 10 + t - '0';
                    t = getchar();
                }
                ungetc(t, stdin);       /* put back the lookahead character */
                return NUM;
            } else if (t == EOF)
                return DONE;
            else {
                tokenval = NONE;
                return t;               /* + - ( ) etc. */
            }
        }
    }

We should really enumerate all the legal one-character tokens and report illegal characters, but if we don't, the parser will end up reporting the error anyway. Note the one-character lookahead in the lexer, implemented with ungetc() from the C library.

A Second Translator

For a second example, let's generate code for a stack machine with the following instructions:

    push x         push contents of word at address x
    pushcon num    push integer constant num
    add            pop 2 words, add them, push sum
    subtract       pop 2 words, subtract, push result
    multiply       pop 2 words, multiply, push product
    divide         pop 2 words, divide, push quotient

Stack Machine Example

x + 22 would translate to:

    push x
    pushcon 22
    add

(Let's keep it simple and assume there are only 26 variables a..z, i.e. variable names are single letters.)

Translation Scheme for Stack Machine

(Note the introduction of variables.)

    E → E + T  { emit add }
      | E - T  { emit subtract }
      | T
    T → T * F  { emit multiply }
      | T / F  { emit divide }
      | F
    F → id     { emit push ID }
      | num    { emit pushcon NUM }
      | ( E )

Remove Left Recursion

    E → T R
    R → + T  { emit add }       R
      | - T  { emit subtract }  R
      | ϵ
    T → F Q
    Q → * F  { emit multiply }  Q
      | / F  { emit divide }    Q
      | ϵ
    F → id   { emit push ID }
      | num  { emit pushcon NUM }
      | ( E )

Expand the Lexer a Little

    #define NONE -1
    #define NUM  256
    #define ID   257
    #define DONE 258

    int getoken(void)
    {
        int t;
        while (1) {
            t = getchar();
            if (t == ' ' || t == '\t')
                ;                       /* strip white space */
            else if (t == '\n')
                ++lineno;
            else if (isdigit(t)) {
                tokenval = t - '0';
                t = getchar();
                while (isdigit(t)) {
                    tokenval = tokenval * 10 + t - '0';
                    t = getchar();
                }
                ungetc(t, stdin);
                return NUM;
            } else if (isalpha(t)) {    /* identifier */
                tokenval = tolower(t);
                return ID;
            } else if (t == EOF)
                return DONE;
            else {
                tokenval = NONE;
                return t;               /* + - ( ) etc. */
            }
        }
    }

Add Code to Generate Assembly Code

    /* Functions to simplify printing assembly instructions and reporting errors. */
    /* Each prints a single line, hiding formatting details from the parser. */

    void emit(char *x)              /* 0-address instruction (opcode only) */
    {
        printf("\t%s\n", x);
    }

    void emit1n(char *x, int y)     /* 1-address instruction, numeric parameter */
    {
        printf("\t%s %d\n", x, y);
    }

    void emit1c(char *x, char y)    /* 1-address instruction, char parameter */
    {
        printf("\t%s %c\n", x, y);
    }

    void error(char *m)
    {
        fprintf(stderr, "line %d: %s\n", lineno, m);
        exit(1);
    }

Note that these are simple enough that they could actually be written as C "macros".

Parser/Translator

    void E(void) { T(); R(); }

    void T(void) { F(); Q(); }

    void R(void)
    {
        if (tok == '+') {
            match('+'); T(); emit("add"); R();
        } else if (tok == '-') {
            match('-'); T(); emit("subtract"); R();
        } else
            ;   /* "absorb" empty string */
    }

    void Q(void)
    {
        if (tok == '*') {
            match('*'); F(); emit("multiply"); Q();
        } else if (tok == '/') {
            match('/'); F(); emit("divide"); Q();
        } else
            ;   /* "absorb" empty string */
    }

Completion of Translator, and Demo

    void F(void)
    {
        if (tok == ID) {
            emit1c("push", tokenval);
            match(ID);
        } else if (tok == NUM) {
            emit1n("pushcon", tokenval);
            match(NUM);
        } else if (tok == '(') {
            match('('); E(); match(')');
        } else
            error("syntax error");
    }

Input:

    (1+y)*34-z

Output:

    pushcon 1
    push y
    add
    pushcon 34
    multiply
    push z
    subtract