Syntax-Directed Translation

Similar documents
Syntax Analysis Part IV

Using an LALR(1) Parser Generator

Syntax-Directed Translation. Introduction

Syntax Analyzer --- Parser

CSCI Compiler Design

Syntax-Directed Translation

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Syntax-Directed Translation Part I

LR Parsing LALR Parser Generators

Syntax-Directed Translation Part II

Context-free grammars

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

Yacc Yet Another Compiler Compiler

Syntax-Directed Translation. Concepts Introduced in Chapter 5. Syntax-Directed Definitions

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Compiler Construction: Parsing

Yacc: A Syntactic Analysers Generator

Chapter 2: Syntax Directed Translation and YACC

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

LR Parsing LALR Parser Generators

Principles of Programming Languages

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

Lexical Analyzer Scanner

CSE302: Compiler Design

Lexical Analyzer Scanner

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

LECTURE 11. Semantic Analysis and Yacc

[Syntax Directed Translation] Bikash Balami

Summary: Semantic Analysis

TDDD55 - Compilers and Interpreters Lesson 3

Syntax Directed Translation

Compiler Construction

Parsing How parser works?

Syn S t yn a t x a Ana x lysi y s si 1

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

Compiler Construction

COP5621 Exam 3 - Spring 2005

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

TDDD55- Compilers and Interpreters Lesson 3

Compiler Lab. Introduction to tools Lex and Yacc

Semantic Analysis Attribute Grammars

Bottom-Up Parsing. Lecture 11-12

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

UNIT III & IV. Bottom up parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Bottom-Up Parsing. Lecture 11-12

Syntax-Directed Translation

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Conflicts in LR Parsing and More LR Parsing Types

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Syntax Directed Translation

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

CS 406: Syntax Directed Translation

Chapter 4. Action Routines

Introduction to Parsing Ambiguity and Syntax Errors

Compilers. Bottom-up Parsing. (original slides by Sam

Chapter 4 :: Semantic Analysis

Building a Parser Part III

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Lecture 8: Deterministic Bottom-Up Parsing

Introduction to Parsing Ambiguity and Syntax Errors

PRACTICAL CLASS: Flex & Bison

Principles of Programming Languages

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

COMPILER CONSTRUCTION Seminar 02 TDDB44

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Lecture 7: Deterministic Bottom-Up Parsing

Lexical and Syntax Analysis

Yacc. Generator of LALR(1) parsers. YACC = Yet Another Compiler Compiler symptom of two facts: Compiler. Compiler. Parser

An Introduction to LEX and YACC. SYSC Programming Languages

shift-reduce parsing

Hyacc comes under the GNU General Public License (Except the hyaccpar file, which comes under BSD License)

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Parser Tools: lex and yacc-style Parsing

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

Syntactic Analysis. Syntactic analysis, or parsing, is the second phase of compilation: The token file is converted to an abstract syntax tree.

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

S Y N T A X A N A L Y S I S LR

Configuration Sets for CSX- Lite. Parser Action Table

Context-free grammars (CFG s)

Concepts Introduced in Chapter 4

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

LECTURE 7. Lex and Intro to Parsing

Compiler Construction Assignment 3 Spring 2018

Syntax-Directed Translation. CS Compiler Design. SDD and SDT scheme. Example: SDD vs SDT scheme infix to postfix trans

10/18/18. Outline. Semantic Analysis. Two types of semantic rules. Syntax vs. Semantics. Static Semantics. Static Semantics.

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser


Compilers. 5. Attributed Grammars. Laszlo Böszörmenyi Compilers Attributed Grammars - 1

Introduction to Compiler Construction

A programming language requires two major definitions A simple one pass compiler

Monday, September 13, Parsers

LR Parsing Techniques

Transcription:

Syntax-Directed Translation ALSU Textbook Chapter 5.1 5.4, 4.8, 4.9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1

What is syntax-directed translation? Definition: The compilation process is driven by the syntax. The semantic routines perform interpretation based on the syntax structure. Attaching attributes to the grammar symbols. Values for attributes are computed by semantic actions associated with the grammar productions. Compiler notes #4, 20070517, Tsan-sheng Hsu 2

Example: Syntax-directed translation Example in a parse tree: Annotate the parse tree by attaching semantic attributes to the nodes of the parse tree. Generate code by visiting nodes in the parse tree in a given order. Input: y := 3 x + z := := id const * + id id id (y) + * const id (3) (x) id (z) parse tree annotated parse tree Compiler notes #4, 20070517, Tsan-sheng Hsu 3

Syntax-directed definitions Each grammar symbol is associated with a set of attributes. Synthesized attribute : value computed from its children or associated with the meaning of the tokens. Inherited attribute : value computed from parent and/or siblings. General attribute : value can be depended on the attributes of any nodes. Compiler notes #4, 20070517, Tsan-sheng Hsu 4

Format for writing syntax-directed definitions Production L E E E 1 + T E T T T 1 F T F F (E) F digit Semantic actions print(e.val) E.val := E 1.val + T.val E.val := T.val T.val := T 1.val F.val T.val := F.val F.val := E.val F.val := digit.lexval E.val is one of the attributes of E. To avoid confusion, recursively defined nonterminals are numbered on the RHS. Semantic actions are performed when this production is used. Compiler notes #4, 20070517, Tsan-sheng Hsu 5

Order of evaluation (1/2) Order of evaluating attributes is important. General rule for ordering: Dependency graph : If attribute b needs attributes a and c, then a and c must be evaluated before b. Represented as a directed graph without cycles. Topologically order nodes in the dependency graph as n 1, n 2,..., n k such that there is no path from n i to n j with i > j. := := id (y) + * const id (3) (x) id (z) id (y) + * const id (3) (x) id (z) Compiler notes #4, 20070517, Tsan-sheng Hsu 6

Order of evaluation (2/2) It is always possible to rewrite syntax-directed definitions using only synthesized attributes, but the one with inherited attributes is easier to understand. Use inherited attributes to keep track of the type of a list of variable declarations. Example: int i, j Grammar 1: using inherited attributes D T L T int char L L, id id Grammar 2: using only synthesized attributes D L id L L id, T T int char D D T L L j int L, j L i, i T int Compiler notes #4, 20070517, Tsan-sheng Hsu 7

Attribute grammars Attribute grammar: a grammar with syntax-directed definitions and having no side effects. Side effect: change values of others not related to the return values of functions themselves. Tradeoffs: Synthesized attributes are easy to compute, but are sometimes difficult to be used to express semantics. Inherited and general attributes are difficult to compute, but are sometimes easy to express the semantics. The dependence graph for computing some inherited and general attributes may contain cycles and thus not be computable. A restricted form of inherited attributes is invented. L-attributes. Compiler notes #4, 20070517, Tsan-sheng Hsu 8

S-attributed definition Definition: a syntax-directed definition that uses synthesized attributed only. A parse tree can be represented using a directed graph. A post-order traverse of the parse tree can properly evaluate grammars with S-attributed definitions. Goes naturally with LR parsers. Example of an S-attributed definition: 3 5 + 4 return 15 L 13 14 E.val = 19 return 8 9 12 E.val = 15 + T.val = 4 7 T.val = 15 11 3 6 F.val = 4 4 T.val = 3 * F.val = 5 10 2 digit.lexval = 4 5 F.val = 3 digit.lexval = 5 1 digit.lexval = 3 Compiler notes #4, 20070517, Tsan-sheng Hsu 9

Definitions of L-attributed definitions Each grammar symbol can have many attributes. each attribute must be either a synthesized attribute, or However, an inherited attribute with the following constraints. Assume there is a production A X 1 X 2 X n and the inherited attribute is associated with X i. Then this inherited attribute depends only on the inherited attributes of its parent node A; either inherited or synthesized attributes from its elder siblings X 1, X 2,..., X i 1 ; inherited or synthesized attributed associated from itself X i, but only in such a way that there are no cycles in a dependency graph formed by the attributes of this X i. Every S-attributed definition is an L-attributed definition. Compiler notes #4, 20070517, Tsan-sheng Hsu 10

Evaluations of L-attributed definitions For grammars with L-attributed definitions, special evaluation algorithms must be designed. L-attributes are always computable. Similar arguments as the one used in discussing Algorithm 4.19 for removing left recursion. Evaluation of L-attributed grammars. Goes together naturally with LL parsers. Parse tree generate by recursive descent parsing corresponds naturally to a top-down tree traversal using DFS by visiting the sibling nodes from left to right. High level ideas for tree traversal. Visit a node v first. Compute inherited attributes for v if they do not depend on synthesized attributes of v. Recursively visit each children of v one by one from left to right. Visit the node v again. Compute synthesized attributes for v. Compute inherited attributes for v if they depend on synthesized attributes of v. Compiler notes #4, 20070517, Tsan-sheng Hsu 11

Example: L-attributed definitions D T {L.in := T.type} L T int {T.type := integer} T real {T.type := real} L {L 1.in := L.in} L 1, id {addtype(id.entry, L.in)} L id {addtype(id.entry, L.in)} Parsing and dependency graph: STACK input production used int p, q, r D int p, q, r L T int p, q, r D T L type L int int p, q, r T int 2,5 T L p, q, r id, L p, q, r L L, id integer 3,4 int id, id, L p, q, r L L, id 8,11 id, id, id p, q, r L id in L id, id q, r id q 9,10 p 1,22 D 6,21 in L 19,20 in L 7,16, 17,18 r, q 14,15 12,13 Compiler notes #4, 20070517, Tsan-sheng Hsu 12

Problems with L-attributed definitions Comparisons: L-attributed definitions go naturally with LL parsers. S-attributed definitions go naturally with LR parsers. L-attributed definitions are more flexible than S-attributed definitions. LR parsers are more powerful than LL parsers. Some cases of L-attributed definitions cannot be in-cooperated into LR parsers. Assume the next handle to take care is A X 1 X 2 X i X k, and X 1,..., X i is already on the top of the STACK. Attribute values of X 1,..., X i 1 can be found on the STACK at this moment. No information about A can be found anywhere at this moment. Thus the attribute values of X i cannot be depended on the value of A. L -attributed definitions: Same as L-attributed definitions, but do not depend on the inherited attributes of parent nodes, or any attributes associated with itself. Can be handled by LR parsers. Compiler notes #4, 20070517, Tsan-sheng Hsu 13

Using ambiguous grammars ambiguous grammars unambiguous grammars LR(1) Ambiguous grammars often provide a shorter, more natural specification than their equivalent unambiguous grammars. Sometimes need ambiguous grammars to specify important language constructs. Example: declare a variable before its usage. var xyz : integer begin... xyz := 3;... Use symbol tables to create side effects. Compiler notes #4, 20070517, Tsan-sheng Hsu 14

Ambiguity from precedence and associativity Precedence and associativity are important language constructs. Example: G 1 : E E + E E E (E) id Ambiguous, but easy to understand and maintain! G 2 : E E + T T T T F F F (E) id Unambiguous, but difficult to understand and maintain! Input: 1+2*3 E E + E E E * E E E + T 1 E * E 2 3 Parse tree#1: G 1 Parse tree#2: G 1 E 1 + E 2 3 T T * F F F 3 1 2 Parse tree: G 2 Compiler notes #4, 20070517, Tsan-sheng Hsu 15

Deal with precedence and associativity When parsing the following input for G 1 : id + id id. Assume the input parsed so far is id + id. We now see *. We can either shift or perform reduce by E E + E. When there is a conflict, say in LALR(1) parsing, we use precedence and associativity information to resolve conflicts. Here we need to shift because of seeing a higher precedence operator. Need a mechanism to let user specify what to do when a conflict is seen based on the viable prefix on the STACK so far and the token currently encountered. Compiler notes #4, 20070517, Tsan-sheng Hsu 16

Ambiguity from dangling-else Grammar: Statement Other Statement if Condition then Statement if Condition then Statement else Statement When seeing if C then S else S there is a shift/reduce conflict, we always favor a shift. Intuition: favor a longer match. Need a mechanism to let user specify the default conflicthandling rule when there is a shift/reduce conflict. Compiler notes #4, 20070517, Tsan-sheng Hsu 17

Special cases Ambiguity from special-case productions: Sometime a very rare happened special case causes ambiguity. It is too costly to revise the grammar. We can resolve the conflicts by using special rules. Example: E E sub E sup E E E sub E E E sup E E {E} character Meanings: W sub U: W U. W sup U: W U. W sub U sup V is W V U, not W U V. Resolve by semantic and special rules. Pick the right one when there is a reduce/reduce conflict. Reduce the production listed earlier. Need a mechanism to let user specify the default conflict-handling rule when there is a reduce/reduce conflict. Compiler notes #4, 20070517, Tsan-sheng Hsu 18

Implementation Passing of synthesized attributes is best. Without using global variables. Cannot use information from its younger siblings because of the limitation of LR parsing. During parsing, the STACK contains information about the elder siblings. It is difficult and usually impossible to pass information from its parent node. May be possible to use the state information to pass some information. Some possible choices: Build a parse tree first, then evaluate its semantics. Parse and evaluate the semantic actions on the fly. YACC, an LALR(1) parser generator, can be used to implement L -attributed definitions. Use top of STACK information to pass synthesized attributes. Use global variables and internal STACK information to pass the inherited values from its elder siblings. Cannot process inherited values from its parent. Compiler notes #4, 20070517, Tsan-sheng Hsu 19

YACC Yet Another Compiler Compiler [Johnson 1975]: A UNIX utility for generating LALR(1) parsing tables. Convert your YACC code into C programs. file.y yacc file.y y.tab.c y.tab.c cc y.tab.c -ly -ll a.out Format: declarations %% grammars and semantic actions. %% supporting C-routines. Libraries: Assume the lexical analyzer routine is yylex(). Need to include the scanner routines. There is a parser routine yyparse() generated in y.tab.c. Default main routines both in LEX and YACC libraries. Need to search YACC library first. Compiler notes #4, 20070517, Tsan-sheng Hsu 20

YACC code example (1/2) %{ #include <stdio.h> #include <ctype.h> #include <math.h> #define YYSTYPE int /* integer type for YACC stack */ %} %token NUMBER ERROR ( ) %left + - %left * / %right UMINUS %% Compiler notes #4, 20070517, Tsan-sheng Hsu 21

YACC code example (2/2) lines : lines expr \n {printf("%d\n", $2);} lines \n /* empty, i.e., epsilon */ lines error \n {yyerror("please reenter:");yyerrok;} ; expr : expr + expr { $$ = $1 + $3; } expr - expr { $$ = $1 - $3; } expr * expr { $$ = $1 * $3; } expr / expr { $$ = $1 / $3; } ( expr ) { $$ = $2; } - expr %prec UMINUS { $$ = - $2; } NUMBER { $$ = atoi(yytext);} ; %% #include "lex.yy.c" Compiler notes #4, 20070517, Tsan-sheng Hsu 22

Included LEX program %{ %} Digit [0-9] IntLit {Digit}+ %% [ \t] {/* skip white spaces */} [\n] {return( \n );} {IntLit} {return(number);} "+" {return( + );} "-" {return( - );} "*" {return( * );} "/" {return( / );} "(" {return( ( );} ")" {return( ) );}. {printf("error token <%s>\n",yytext); return(error);} %% Compiler notes #4, 20070517, Tsan-sheng Hsu 23

YACC: Declarations System used and C language declarations. %{ %} to enclose C declarations. Type of attributes associated with each grammar symbol on the STACK: YYSTYPE declaration. This area will not be translated by YACC. Tokens with associativity and precedence assignments. In increasing precedence from top to the bottom. %left, %right or %token (non-associativity): e.g., dot products of vectors has no associativity. Other declarations. %type %union Compiler notes #4, 20070517, Tsan-sheng Hsu 24

YACC: Productions and semantic actions Format: for productions P with a common LHS <common LHS of P>: <RHS 1 of P> { semantic actions # 1} <RHS 2 of P> { semantic actions # 2} The semantic actions are performed, i.e., C routines are executed, when this production is reduced. Special symbols and usages. Accessing attributes associated with grammar symbols: $$: the return value of this production if it is reduced. $i: the returned value of the ith symbol in the RHS of the production. %prec declaration. When there are ambiguities: reduce/reduce conflict: favor the one listed first. shift/reduce conflict: favor shift, i.e., longer match. Q: How to implement this? Compiler notes #4, 20070517, Tsan-sheng Hsu 25

YACC: Error handling Example: lines: error \n {...} When there is an error, skip until newline is seen. error: special nonterminal. A production with error is inserted or processed only when it is in the reject state. It matches any sequence on the STACK as if the handle error is seen. Use a special token to immediately follow error for the purpose of skipping until something special is seen. Q: How to implement this? Use error to implement statement terminators in language designs. The token after error is a synchronizing token for panic mode recovery. Difficult to implement statement separators using error. yyerrok: a macro to reset error flags and make error invisible again. yyerror(string): pre-defined routine for printing error messages. Compiler notes #4, 20070517, Tsan-sheng Hsu 26

In-production actions Actions can be inserted in the middle of a production, each such action is treated as a nonterminal. Example: expr : expr {actions} + expr {$$ = $1 + $4; } is translated into expr : expr $ACT + expr {$$ = $1 + $4;} $ACT : {actions} Split a production into two. Create a nonterminal $ACT and an ɛ-production. Avoid in-production actions. An ɛ-production, e.g., A ɛ, can easily generate conflicts. A reduce by A for states including this item. Split the production yourself. May generate some conflicts. May be difficult to specify precedence and associativity. May change the parse tree and thus the semantic. expr : exprhead exptail {$$ = $1 + $2;} exphead : expr { perform some semantic actions; $$ = $1;} exptail : + expr {$$ = $2;} Compiler notes #4, 20070517, Tsan-sheng Hsu 27

Some useful YACC programming styles Keep the RHS of a production short, but not too short. Better to have 3 to 4 symbols. Language issues. Avoiding using names starting with $. YACC auto-generated variable names. Watch out C-language rules. goto Some C-language reserved words are used by YACC. union Some YACC pre-defined routines are macros, not procedures. yyerrok Rewrite the productions with L-attributed definitions to productions with S-attributed definitions. Grammar 1: Array id [ Elist ] Grammar 2: Array Aelist ] Aelist Aelist, id Ahead Ahead id [ id Compiler notes #4, 20070517, Tsan-sheng Hsu 28

Limitations of syntax-directed translation Limitation of syntax-directed definitions: Without using global data to create side effects, some of the semantic actions cannot be performed. Examples: Checking whether a variable is defined before its usage. Checking the type and storage address of a variable. Checking whether a variable is used or not. Need to use a symbol table : global data to create controlled side effects of semantic actions. Common approaches in using global variables: A program with too many global variables is difficult to understand and maintain. Restrict the usage of global variables to essential ones and use them as objects. Symbol table. Labels for GOTO s. Forwarded declarations. Tradeoff between ease of coding and ease of maintaining. Compiler notes #4, 20070517, Tsan-sheng Hsu 29