Syntax Directed Translation


CS143, Handout 16, Summer 2011, July 6th, 2011
Handout written by Maggie Johnson and revised by Julie Zelenski.

Syntax-directed translation refers to a method of compiler implementation where the source language translation is completely driven by the parser. In other words, the parsing process and parse trees are used to direct semantic analysis and the translation of the source program. This can be a separate phase of a compiler, or we can augment our conventional grammar with information to control the semantic analysis and translation. Such grammars are called attribute grammars.

We augment a grammar by associating attributes with each grammar symbol that describe its properties. An attribute has a name and an associated value: a string, a number, a type, a memory location, an assigned register, whatever information we need. For example, variables may have an attribute "type" (which records the declared type of a variable, useful later in type-checking) or an integer constant may have an attribute "value" (which we will later need to generate code).

With each production in a grammar, we give semantic rules or actions, which describe how to compute the attribute values associated with each grammar symbol in a production. The attribute value for a parse node may depend on information from its child nodes below or from its siblings and parent node above.

Consider this production, augmented with a set of actions that use the "value" attribute for a digit node to store the appropriate numeric value. Below, we use the syntax X.a to refer to the attribute a associated with symbol X.

    digit -> 0          digit.value = 0
           | 1          digit.value = 1
           | 2          digit.value = 2
           ...
           | 9          digit.value = 9

Attributes may be passed up a parse tree to be used by other productions:

    int1 -> digit           int1.value = digit.value
          | int2 digit      int1.value = int2.value*10 + digit.value

(We are using subscripts in this example to clarify which attribute we are referring to, so int1 and int2 are different instances of the same nonterminal symbol.)

There are two types of attributes we might encounter: synthesized or inherited. Synthesized attributes are those attributes that are passed up a parse tree, i.e., the left-side attribute is computed from the right-side attributes. The lexical analyzer usually supplies the attributes of terminals, and the synthesized ones are built up for the nonterminals and passed up the tree.

    X -> Y1 Y2 ... Yn       X.a = f(Y1.a, Y2.a, ..., Yn.a)

For the input 42, the value attribute is synthesized like this:

                    int     value = 4*10 + 2 = 42
                   /   \
    value = 4   int     digit   value = 2
                 |        |
    value = 4  digit      2
                 |
                 4

Inherited attributes are those that are passed down a parse tree, i.e., the right-side attributes are derived from the left-side attributes (or other right-side attributes). These attributes are used for passing information about the context to nodes further down the tree.

    X -> Y1 Y2 ... Yn       Yk.a = f(X.a, Y1.a, Y2.a, ..., Yk-1.a, Yk+1.a, ..., Yn.a)

Consider the following grammar that defines declarations and simple expressions in a Pascal-like syntax:

    P -> D S
    D -> var V; D | ε
    S -> V := E; S | ε
    V -> x | y | z

Now we add two attributes to this grammar, name and dl, for the name of a variable and the list of declarations. Each time a new variable is declared, a synthesized attribute for its name is attached to it. That name is added to a list of variables declared so far in the synthesized attribute dl that is created from the declaration block. The list of variables is then passed as an inherited attribute to the statements following the declarations for use in checking that variables are declared before use.
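As an aside on the synthesized case, the int/digit rules shown earlier amount to a left-to-right fold over the digits. The following is a minimal sketch in C, not part of the handout: the function names digit_value and int_value are hypothetical, and the input is assumed to be a non-empty string of decimal digits.

```c
#include <assert.h>
#include <ctype.h>

/* digit.value: the synthesized attribute of a single digit leaf. */
static int digit_value(char c)
{
    assert(isdigit((unsigned char)c));
    return c - '0';
}

/* int.value: applies the two int productions from left to right.
   Assumes a non-empty string containing only decimal digits. */
static int int_value(const char *digits)
{
    assert(*digits != '\0');
    int value = digit_value(*digits++);      /* int1 -> digit      */
    while (*digits != '\0')                  /* int1 -> int2 digit */
        value = value * 10 + digit_value(*digits++);
    return value;
}
```

Calling int_value("42") follows the same steps as the tree for 42: the first digit yields 4, and the second step computes 4*10 + 2.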

    P  -> D S               S.dl = D.dl
    D1 -> var V; D2         D1.dl = addlist(V.name, D2.dl)
        | ε                 D1.dl = NULL
    S1 -> V := E; S2        check(V.name, S1.dl); S2.dl = S1.dl
        | ε
    V  -> x                 V.name = 'x'
        | y                 V.name = 'y'
        | z                 V.name = 'z'

If we were to parse the following code, what would the attribute structure look like?

    var x;
    var y;
    x := ...;
    y := ...;

[Figure: parse tree for the code above. P has children D and S. D expands to var V ; D with V -> x; the nested D expands to var V ; D with V -> y, followed by D -> ε. S expands to V := E ; S with V -> x.]

Typically the way we handle attributes is to associate with each symbol some sort of structure with all the necessary attributes, and we can pick out the attribute of interest by field name.

    struct attribute {
        char *name;
        struct attribute *list;
    };

    P -> D S                $2.list = $1.list
    D -> var V; D           $$.list = add_to_list($2.name, $4.list)
       | ε                  $$.list = NULL
    S -> V := E; S          check($1.name, $$.list); $5.list = $$.list
       | ε
    V -> x                  $$.name = 'x'
       | y                  $$.name = 'y'
       | z                  $$.name = 'z'
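The handout does not define addlist or check, so the following is only a sketch of what they might look like in C. The struct decl type is hypothetical, and having check return a boolean (rather than reporting an error itself) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One node of the declaration list carried in the dl attribute
   (hypothetical layout, not taken from the handout). */
struct decl {
    const char *name;
    struct decl *next;
};

/* D1.dl = addlist(V.name, D2.dl): prepend a name to the list built so far. */
static struct decl *addlist(const char *name, struct decl *rest)
{
    struct decl *d = malloc(sizeof *d);
    assert(d != NULL);              /* sketch: no recovery from out-of-memory */
    d->name = name;
    d->next = rest;
    return d;
}

/* check(V.name, S.dl): true if name appears in the inherited list. */
static bool check(const char *name, const struct decl *dl)
{
    for (; dl != NULL; dl = dl->next)
        if (strcmp(dl->name, name) == 0)
            return true;
    return false;
}
```

For the sample program, the declaration block would build addlist("x", addlist("y", NULL)), and each assignment statement would then call check with the inherited list before passing it on unchanged to the next statement.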

Top-Down SDT

We can implement syntax-directed translation in either a top-down or a bottom-up parser, and we'll briefly investigate each approach. First, let's look at adding attribute information to a hand-constructed top-down recursive-descent parser. Our example will be a very simple FTP client, where the parser accepts user commands and uses a syntax-directed translation to act upon those requests. Here is the grammar we'll use, already in LL(1) form:

    Session     -> CommandList T_QUIT
    CommandList -> Command CommandList | ε
    Command     -> Login | Get | Logout
    Login       -> User Pass
    User        -> T_USER T_IDENT
    Pass        -> T_PASS T_IDENT
    Get         -> T_GET T_IDENT MoreFiles
    MoreFiles   -> T_IDENT MoreFiles | ε
    Logout      -> T_LOGOUT

Now, let's see how the attributes, such as the username, filename, and connection, can be passed around during parsing. This recursive-descent parser uses the lookahead/token-matching utility functions from the top-down parsing handout.

    static void ParseCommands()
    {
        Connection *conn = NULL;
        int next;

        while ((next = GetLookahead()) != T_QUIT) {
            switch (next) {
                case T_USER:
                    conn = ParseLogin(); break;
                case T_GET:
                    ParseGet(conn); break;
                case T_LOGOUT:
                    ParseLogout(conn); conn = NULL; break;
                default:
                    ReportParseFailure("command expected", yytext);
            }
        }
    }

    static Connection *ParseLogin()     // returns attribute of opened connection
    {
        char *user = ParseUser();
        char *pass = ParsePassword();
        return Login(user, pass);       // uses attributes passed up from below
    }

    static char *ParseUser()            // returns attribute of username given
    {
        MatchToken(T_USER);
        char *str = ParseIdentifier();  // gets name from identifier child node
        MatchToken('\n');
        return str;
    }

    static char *ParsePassword()        // similar to username
    {
        MatchToken(T_PASS);
        char *str = ParseIdentifier();
        MatchToken('\n');
        return str;
    }

    static char *ParseIdentifier()
    {
        char *str = NULL;
        if (GetLookahead() == T_IDENT)
            str = strdup(yytext);
        MatchToken(T_IDENT);
        return str;                     // returns NULL on error
    }

    static void ParseGet(Connection *conn)      // receives conn from above
    {
        MatchToken(T_GET);
        char *filename = ParseIdentifier();
        MatchToken('\n');
        Transfer(conn, filename);       // reports error if conn NULL
    }

    static void ParseLogout(Connection *conn)   // receives conn from above
    {
        MatchToken(T_LOGOUT);
        MatchToken('\n');
        Logout(conn);                   // reports error if conn NULL
    }

During processing of the Login command, the parser gathers the username and password returned from the child nodes and uses that information to create a new connection attribute to pass up the tree. In this situation the username, password, and connection are all acting as synthesized attributes, working their way from the leaves upward to the parent. That open connection is saved and then later passed downward into other commands. The connection is being used as an inherited attribute when processing Get and Logout; those commands are receiving information from the parent about parts of the parse that have already happened (their left siblings).
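The ParseGet shown above consumes a single filename, while the grammar also allows a MoreFiles tail. The following self-contained sketch shows one way the inherited connection could be threaded down into MoreFiles. Everything in it is an assumption made for illustration: the token array and cursor replace the real scanner, the Connection struct merely records filenames, and this ParseGet takes its input explicitly and returns a count, unlike the handout's version.

```c
#include <assert.h>
#include <stddef.h>

enum { T_GET, T_IDENT, T_EOL };

struct token { int kind; const char *text; };

/* Hypothetical stand-in for the handout's Connection: it only
   records which files were requested. */
typedef struct {
    const char *files[8];
    size_t count;
} Connection;

static const struct token *cursor;      /* current lookahead token */

static int GetLookahead(void) { return cursor->kind; }

/* Consume a T_IDENT token and return its text (synthesized attribute). */
static const char *MatchIdent(void)
{
    assert(cursor->kind == T_IDENT);
    return (cursor++)->text;
}

static void Transfer(Connection *conn, const char *filename)
{
    assert(conn != NULL && conn->count < 8);
    conn->files[conn->count++] = filename;
}

/* MoreFiles -> T_IDENT MoreFiles | ε
   conn is an inherited attribute handed down from Get. */
static void ParseMoreFiles(Connection *conn)
{
    if (GetLookahead() == T_IDENT) {
        Transfer(conn, MatchIdent());
        ParseMoreFiles(conn);
    }
}

/* Get -> T_GET T_IDENT MoreFiles
   Returns the number of files transferred (a synthesized attribute). */
static size_t ParseGet(Connection *conn, const struct token *input)
{
    cursor = input;
    assert(GetLookahead() == T_GET);
    cursor++;
    size_t before = conn->count;
    Transfer(conn, MatchIdent());
    ParseMoreFiles(conn);
    return conn->count - before;
}
```

The parameter conn plays the role of the inherited attribute and the return values play the role of synthesized ones, which is the usual mapping when attributes are carried by a recursive-descent parser.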

Bottom-Up SDT

Here is a simple expression grammar that has associativity and precedence already built in.

    E' -> E
    E  -> T | E A T
    T  -> F | T M F
    F  -> (E) | int
    A  -> + | -
    M  -> * | /

During the bottom-up parse, as we push symbols onto the parse stack, we will associate with each operand/expression symbol (E, T, F, etc.) an integer value. For each operator (A, M) we will store the operator code. When performing a reduction, we will synthesize the attribute for the left-side nonterminal from the attributes of the right-side symbols, the handle that is currently on top of the stack. See how the associated actions below will evaluate the arithmetic expression during parsing?

    E' -> E          { printf("%d\n", E.val); }
    E  -> T          { E.val = T.val; }
        | E A T      { switch (A.op) {
                         case ADD: E.val = E.val + T.val; break;
                         case SUB: E.val = E.val - T.val; break;
                       }
                     }
    T  -> F          { T.val = F.val; }
        | T M F      { switch (M.op) {
                         case MUL: T.val = T.val * F.val; break;
                         case DIV: T.val = T.val / F.val; break;
                       }
                     }
    F  -> (E)        { F.val = E.val; }
        | int        { F.val = int.val; }
    A  -> +          { A.op = ADD; }
        | -          { A.op = SUB; }
    M  -> *          { M.op = MUL; }
        | /          { M.op = DIV; }

The attribute values of terminals are assumed to have been assigned by the scanner. The attribute values for nonterminals are explicitly assigned in the parser action code. The symbol attributes can usually be stored along with the symbol itself in the bottom-up parse stack, which is mighty convenient. However, given the way a bottom-up parser constructs the leaves first and works its way up to the parent, it is trivial to support synthesized attributes (like those needed in the expression parser), but more awkward to allow for inherited attributes.
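To illustrate how attributes can ride along on the parse stack, here is a minimal sketch of just the attribute side of a shift-reduce parse. It is not a parser: the caller is assumed to decide when to shift and reduce, and the names union attr, shift_int, shift_op, and reduce_binary are hypothetical. Unit reductions such as E -> T or A -> + simply copy the attribute, so they leave the stack entry untouched and are not modelled.

```c
#include <assert.h>

enum op { ADD, SUB, MUL, DIV };

/* Attribute record stored alongside each symbol on the parse stack. */
union attr { int val; enum op op; };

static union attr stack[64];
static int top = 0;                 /* number of entries in use */

/* Shift: push the attribute that the scanner attached to a token. */
static void shift_int(int v)    { assert(top < 64); stack[top++].val = v; }
static void shift_op(enum op o) { assert(top < 64); stack[top++].op = o; }

/* Reduce by E -> E A T or T -> T M F. The handle is the top three
   entries: pop them and push the synthesized value for the left side.
   Division by zero is not checked in this sketch. */
static void reduce_binary(void)
{
    assert(top >= 3);
    int right = stack[top - 1].val;
    enum op o = stack[top - 2].op;
    int left  = stack[top - 3].val;
    int result = 0;

    switch (o) {
        case ADD: result = left + right; break;
        case SUB: result = left - right; break;
        case MUL: result = left * right; break;
        case DIV: result = left / right; break;
    }
    top -= 3;
    stack[top++].val = result;
}
```

For the input 4 + 2 * 3, an LR parser for the grammar above would shift 4, +, 2, *, 3, then reduce T -> T M F (leaving 4, +, 6 on the stack) and finally reduce E -> E A T, leaving the single value 10.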

Attributes and bison

It doesn't take much to transform the above abstract example into legitimate bison code. $1, $2, etc. are used to access the attribute of the nth symbol on the right side of the production. The global variable yylval is set by the scanner, and that value is saved with the token when it is placed on the parse stack. When a rule is reduced, a new state is placed on the stack. The default behavior is to just copy the attribute of $1 for that new state; this can be controlled by assigning to $$ in the action for the rule. By default the attribute type is an integer, but it can be extended to store diverse types using the %union specification in the bison input file.

Here's a simple infix calculator that shows all these techniques in conjunction. It parses ordinary arithmetic expressions, uses a symbol table for setting and retrieving values of simple variables, and even has some primitive error-handling.

    %{
    #include <stdio.h>
    static int Lookup(const char *name);
    static void Store(const char *name, int val);
    %}

    %union {
        int intval;
        char name[32];
    }

    %token <intval> T_Int
    %token <name> T_Identifier
    %type <intval> E

    %left '+' '-'
    %left '*' '/'
    %right U_minus

    %%

    S    : Stmt S
         | Stmt
         ;

    Stmt : T_Identifier '=' E '\n'  { Store($1, $3); printf("%d\n", $3); }
         | E '\n'                   { printf("%d\n", $1); }
         | error '\n'               { printf("discarding malformed expression.\n"); }
         ;

    E    : E '+' E                  { $$ = $1 + $3; }
         | E '-' E                  { $$ = $1 - $3; }
         | E '*' E                  { $$ = $1 * $3; }
         | E '/' E                  { if ($3 == 0) {
                                          printf("divide by zero\n");
                                          $$ = 0;
                                      } else
                                          $$ = $1 / $3;
                                    }
         | '(' E ')'                { $$ = $2; }
         | '-' E %prec U_minus      { $$ = -$2; }
         | T_Int                    { $$ = $1; }
         | T_Identifier             { $$ = Lookup($1); }
         ;

    %%

    /* Store/Lookup functions omitted for clarity.
       They are just ordinary hash table operations. */

Here is the scanner to go with it:

    %{
    #include <stdlib.h>     /* atoi */
    #include <string.h>     /* strncpy */
    #include "y.tab.h"
    %}

    %%

    [0-9]+          { yylval.intval = atoi(yytext); return T_Int; }
    [a-zA-Z]+       { /* copy at most 31 characters and always terminate */
                      strncpy(yylval.name, yytext, sizeof(yylval.name) - 1);
                      yylval.name[sizeof(yylval.name) - 1] = '\0';
                      return T_Identifier; }
    [-+*/()\n=]     { return yytext[0]; }
    [ \t]*          /* ignore whitespace */

Check out the online manual for more details about these and any other features of yacc/bison.
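The handout leaves Store and Lookup out. One possible sketch, matching the two prototypes declared in the bison file, is a small fixed-size hash table with linear probing. The table size, the hash function, the Find helper, and the choice to return 0 for an undefined variable are all assumptions made for illustration, not the handout's implementation.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 64       /* assumed capacity; the sketch never resizes */
#define NAME_LEN   32       /* matches char name[32] in the %union */

struct entry {
    char name[NAME_LEN];
    int val;
    int used;
};

static struct entry table[TABLE_SIZE];  /* zero-initialized: all slots free */

static unsigned Hash(const char *name)
{
    unsigned h = 5381;
    while (*name != '\0')
        h = h * 33 + (unsigned char)*name++;
    return h % TABLE_SIZE;
}

/* Returns the slot holding name, or the free slot where it would be
   inserted, or NULL if the table is full and name is absent. */
static struct entry *Find(const char *name)
{
    unsigned start = Hash(name);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &table[(start + i) % TABLE_SIZE];
        if (!e->used || strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}

static void Store(const char *name, int val)
{
    assert(strlen(name) < NAME_LEN);
    struct entry *e = Find(name);
    assert(e != NULL);                  /* sketch: no handling of a full table */
    if (!e->used) {
        strcpy(e->name, name);          /* safe: length checked above */
        e->used = 1;
    }
    e->val = val;
}

static int Lookup(const char *name)
{
    const struct entry *e = Find(name);
    /* Assumption: reading an undefined variable yields 0. */
    return (e != NULL && e->used) ? e->val : 0;
}
```

Because entries are never deleted, linear probing needs no tombstones here. A real calculator might prefer to report an error for an undefined variable instead of silently returning 0.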