Semantic actions for declarations and expressions


Semantic actions
Semantic actions are routines called as productions (or parts of productions) are recognized. The actions work together to build up intermediate representations. Conceptually, think of it as follows: every non-terminal has some information associated with it (code, declared variables, etc.), and each child of a non-terminal can pass the information it has to its parent, which uses the information from its children to build up more information. We call these pieces of information semantic records.

Semantic records
Data structures produced by semantic actions, associated with both non-terminals (code structures) and terminals (tokens/symbols). The standard organization is a semantic stack:
- When you process a terminal (a leaf in the parse tree), push its semantic record onto the stack.
- When you process a non-terminal, pop all the records associated with its children, generate a record for the non-terminal, and push that record onto the stack.

How do we manipulate the stack?
- Action-controlled: the actions directly manipulate the stack (call push and pop).
- Parser-controlled: the parser automatically manipulates the stack.

LR-parser controlled
Shift operations push semantic records onto the stack (describing the token). Reduce operations pop the semantic records associated with the production's symbols off the stack and replace them with the semantic record associated with the production. The action routines do not see the stack; they refer to the popped-off records using handles (e.g., in yacc/bison, $1, $2, etc.).

Example of semantic actions
Consider the following grammar:

    assign → ident := expr
    expr   → term addop term
    term   → ident | LIT
    ident  → ID
    addop  → + | -
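The action-controlled stack discipline above can be made concrete with a small simulation. This is a sketch, not from the lecture: the record shapes (nested tuples) and helper names are invented, and the parse of a := b + 1 is driven by hand rather than by a real parser.

```python
# Action-controlled semantic stack: terminals push records, non-terminal
# actions pop their children's records and push a combined record.
stack = []

def shift(token_record):
    # processing a terminal: push its semantic record
    stack.append(token_record)

def reduce(nonterminal, n_children, combine):
    # processing a non-terminal: pop the children's records (oldest first),
    # build the parent's record, push it back
    children = [stack.pop() for _ in range(n_children)][::-1]
    stack.append((nonterminal, combine(children)))

# Recognize a := b + 1 bottom-up; records are just nested tuples.
shift(("ID", "a"))
reduce("ident", 1, lambda c: c[0][1])
shift((":=", ":="))
shift(("ID", "b"))
reduce("ident", 1, lambda c: c[0][1])
reduce("term", 1, lambda c: c[0][1])
shift(("+", "+"))
reduce("addop", 1, lambda c: "ADD_OP")
shift(("LIT", "1"))
reduce("term", 1, lambda c: c[0][1])
# expr → term addop term: pop three records, combine into one
reduce("expr", 3, lambda c: (c[1][1], c[0][1], c[2][1]))
# assign → ident := expr: the ":=" token has a record on the stack too
reduce("assign", 3, lambda c: ("ASSIGN", c[0][1], c[2][1]))
```

After the last reduce, the single record left on the stack represents the whole assignment, built purely from the children's records.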

Example of semantic actions
In Bison (note that the lexer returns a value for each token through yylval):

    assign → ident := expr       {$$ = generateassign($1, $3);}
    expr   → term addop term     {$$ = generateexpr($1, $2, $3);}
    term   → ident               {$$ = generateterm($1);}
           | LIT                 {$$ = generateterm($1);}
    ident  → ID                  {$$ = $1;}
    addop  → +                   {$$ = ADD_OP;}
           | -                   {$$ = SUB_OP;}

Example of semantic stack
Consider a := b + 1;

LL-controlled
Even though LL parsers are not bottom-up, the semantic stack operates in basically the same way. LL parsers take semantic actions as they encounter them while matching a production. Add the semantic action for a non-terminal at the end of its production: the action for the whole production then gets processed after all of the intermediate parts of the production have been processed. In practice, this looks just like the LR-controlled stack.

Example of semantic actions
In ANTLR:

    assign returns [Code c]
        : ident ':=' expr        {$c = generatecode($ident.name, $expr.c);}
        ;
    expr returns [Code c]
        : t1=term addop t2=term  {$c = generatecode($t1.t, $t2.t, $addop.optype);}
        ;
    term returns [Term t]
        : ident                  {$t = generateterm($ident.s);}
        | LIT                    {$t = generateterm($LIT.text);}
        ;
    ident returns [String s]
        : ID                     {$s = $ID.text;}
        ;
    addop returns [OpType optype]
        : '+'                    {$optype = ADD_OP;}
        | '-'                    {$optype = SUB_OP;}
        ;

Overview of declarations
- Symbol tables
- Action routines for simple declarations
- Action routines for advanced features: constants, enumerations, arrays, structs, pointers

Symbol tables
A symbol table is a table of declarations associated with each scope. It is an internal structure used by the compiler; it does not become code. There is one entry for each variable declared, storing the declaration's attributes (e.g., name and type) — we will discuss these in a few slides. The table must be dynamic (why?). Possible implementations:
- Linear list (easy to implement, but only good for small programs)
- Binary search tree (better for large programs, but can still be slow)
- Hash table (best solution)
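The parser-controlled variant can be sketched the same way. Here a hypothetical on_reduce helper plays the role of the LR parser's reduce step, handing the popped records to the action as $1..$n; the helper names and record shapes are invented for illustration.

```python
# Parser-controlled semantic stack: a reduce by X → Y1 ... Yn pops n records
# and passes them to the action; the action itself never touches the stack.
semantic_stack = []

def on_shift(token_value):
    semantic_stack.append(token_value)

def on_reduce(rhs_len, action):
    # pop the records for the right-hand side, oldest first
    args = semantic_stack[len(semantic_stack) - rhs_len:]
    del semantic_stack[len(semantic_stack) - rhs_len:]
    semantic_stack.append(action(*args))   # $$ = action($1, ..., $n)

# Mirror the grammar fragment above for the input b + 1:
on_shift("b")                                   # ID
on_reduce(1, lambda d1: d1)                     # ident → ID   {$$ = $1}
on_reduce(1, lambda d1: ("term", d1))           # term → ident
on_shift("+")
on_reduce(1, lambda d1: "ADD_OP")               # addop → +
on_shift("1")                                   # LIT
on_reduce(1, lambda d1: ("term", d1))           # term → LIT
on_reduce(3, lambda d1, d2, d3: (d2, d1, d3))   # expr → term addop term
```

The $1/$2/$3 handles in yacc/bison are exactly the positional arguments the parser hands to the action here.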

Handling declarations
Declarations of variables, arrays, functions, etc.:
- Create an entry in the symbol table.
- Allocate space in the activation record. The activation record stores the information for a particular function call (arguments, return value, local variables, etc.), so it needs space for all of this information. Activation records are stored on the program stack; we will discuss them in more detail when we get to functions.

Simple declarations
Declarations of simple types:

    INT x;
    FLOAT f;

The semantic action should:
- Get the type and name of the declared identifier.
- Check whether the identifier is already in the symbol table.
- If it isn't, add it; if it is, report an error.

Simple declarations (cont.)
How do we get the type and name of an identifier? Where do we put the semantic actions?

    var_decl → var_type id ;
    var_type → INT | FLOAT
    id       → IDENTIFIER

Pass up the type, pass up the variable name, and use both to create a symbol table entry:

    var_decl → var_type id ;    {currtable.add($1, $2)}
    var_type → INT              {$$ = INT}
             | FLOAT            {$$ = FLOAT}
    id       → IDENTIFIER       {$$ = $1}

Managing symbol tables
- Maintain a list of all symbol tables.
- Maintain a stack marking the current symbol table.
- Whenever you see a program block that allows declarations, create a new symbol table and push it onto the stack as the current symbol table.
- When you see a declaration, add it to the current symbol table.
- When you exit a program block, pop the current symbol table off the stack.

Managing symbol tables (cont.)
How do we manage multiple symbol tables? Where do we put the semantic actions?

    func_decls → func_decl func_decls | empty
    func_decl  → any_type id BEGIN decl statement_list END
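The scheme above can be sketched in a few lines; the class and variable names here are invented, not from the lecture.

```python
# One symbol table per scope, plus a stack marking the current table.
class SymbolTable:
    def __init__(self):
        self.entries = {}              # name -> attributes (e.g., type)
    def add(self, name, typ):
        if name in self.entries:       # redeclaration in the same scope
            raise KeyError(f"redeclaration of {name}")
        self.entries[name] = typ

all_tables = []                        # list of all symbol tables
table_stack = []                       # enclosing scopes
curr_table = SymbolTable()             # global scope
all_tables.append(curr_table)

def enter_block():                     # a block that allows declarations
    global curr_table
    table_stack.append(curr_table)
    curr_table = SymbolTable()
    all_tables.append(curr_table)

def exit_block():
    global curr_table
    curr_table = table_stack.pop()

def lookup(name):
    # search the current scope, then the enclosing scopes
    for table in [curr_table] + table_stack[::-1]:
        if name in table.entries:
            return table.entries[name]
    return None

curr_table.add("x", "INT")
enter_block()                          # e.g., entering a function body
curr_table.add("f", "FLOAT")
assert lookup("f") == "FLOAT" and lookup("x") == "INT"
exit_block()                           # f is no longer visible
```

Keeping the list of all tables (not just the stack) lets later phases, such as the code generator, revisit declarations after their scope has been closed.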

Managing symbol tables (cont.)
Put one action after BEGIN to open the new scope, and one before END to close it:

    func_decls → func_decl func_decls | empty
    func_decl  → any_type id BEGIN
                   {symboltablestack.push(currtable);
                    currtable = new symboltable();}
                 decl statement_list
                   {currtable = symboltablestack.pop();}
                 END

Constants
The symbol table needs a field to store the constant's value. In general, the value may not be known until runtime (static final int i = 2 + j;). At compile time, we create code that allows the initialization expression to assign to the variable, then evaluate the expression at run-time.

Arrays
Fixed-size (static) arrays: int A[10];
- Store the type and length of the array.
- When creating the activation record, allocate enough space on the stack for the array.

What about variable-size arrays? int A[M][N]
- Store the information for a dope vector, which tracks the dimensionality, size, and location of the array.
- The activation record stores the dope vector.
- At runtime, allocate the array at the top of the stack and fill in the dope vector.

Structs/classes
- Can have variables/methods declared inside, so they need an extra symbol table.
- Need to store the visibility of members.
- Complication: you can create multiple instances of a struct or class! Need to store the offset of each member within the struct.

Pointers
- Need to store the type information and length of what the pointer points to; this is needed for pointer arithmetic:

    int * a = &y;
    z = *(a + 1);

- Need to worry about forward declarations — the thing being pointed to may not have been declared yet:

    Class Foo;
    Foo * head;
    Class Foo { ... };

Abstract syntax trees
A tree representing the structure of the program, built by semantic actions. (Some compilers skip this.)

AST nodes
Represent a program construct and store important information about that construct (e.g., a literal node stores its string representation, such as "10").
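A sketch of how a dope vector supports address computation for a variable-size array like int A[M][N]. Row-major layout and the field names are assumptions for illustration; the compiler cannot know M and N, so it records them at runtime and computes element addresses from the vector.

```python
# A dope vector records the base address, element size, and dimensions of
# an array whose size is only known at runtime.
def make_dope_vector(base_addr, elem_size, dims):
    return {"base": base_addr, "elem_size": elem_size, "dims": dims}

def element_address(dv, indices):
    # row-major: addr = base + ((i0*d1 + i1)*d2 + ...) * elem_size
    offset = 0
    for idx, dim in zip(indices, dv["dims"]):
        offset = offset * dim + idx
    return dv["base"] + offset * dv["elem_size"]

# int A[3][4] with 4-byte ints, allocated at (say) address 1000 on the stack
dv = make_dope_vector(1000, 4, [3, 4])
assert element_address(dv, [0, 0]) == 1000
assert element_address(dv, [1, 2]) == 1000 + (1 * 4 + 2) * 4   # = 1024
```

Note the loop multiplies by each dimension except the first, which is why the first dimension is not needed for addressing (only for bounds checking).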

Referencing identifiers
An identifier behaves differently when used in a declaration vs. in an expression:
- If used in a declaration, treat it as before.
- If used in an expression, we need to check that the identifier is in the symbol table, then create a new AST node with a pointer to the symbol table entry.
- Note: we may want to store type information directly in the AST (or we could look it up in the symbol table each time).

Referencing literals
What if we see a literal?

    primary → INTLITERAL | FLOATLITERAL

Create an AST node for the literal and store its string representation (155, 2.45, etc.). At some point, this will be converted into the actual representation of the value. For integers, we may want to convert early (to do constant folding); for floats, we may want to wait (for compilation to different machines). Why?

More complex references
- Arrays: A[i][j] is equivalent to A + i*dim_1 + j. Extract dim_1 from the symbol table or the dope vector.
- Structs: A.f is equivalent to &A + offset(f). Find offset(f) in the symbol table for the declaration of the record.
- Strings: complicated — depends on the language.

Expressions
Three semantic actions are needed:
- eval_binary (processes binary expressions): create an AST node with two children, pointing to the AST nodes created for the left and right sides.
- eval_unary (processes unary expressions): create an AST node with one child.
- process_op (determines the type of operation): store the operator in the AST node.

Example: x + y + 5
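A sketch of the three actions building the AST for x + y + 5, i.e. (x + y) + 5. The action names come from the slides, but the ASTNode class and the exact signatures are invented.

```python
class ASTNode:
    def __init__(self, kind, value=None, children=()):
        self.kind, self.value, self.children = kind, value, list(children)

def process_op(op_token):
    return ASTNode("op", op_token)          # store the operator in the node

def eval_binary(op_node, left, right):
    op_node.children = [left, right]        # two children: left and right sides
    return op_node

def eval_unary(op_node, child):
    op_node.children = [child]              # one child
    return op_node

# identifier nodes would point at symbol table entries; literals keep their
# string representation
x = ASTNode("ident", "x")
y = ASTNode("ident", "y")
five = ASTNode("intliteral", "5")

tree = eval_binary(process_op("+"), eval_binary(process_op("+"), x, y), five)
assert tree.value == "+" and tree.children[1].value == "5"
assert [n.value for n in tree.children[0].children] == ["x", "y"]
```

Left associativity is what makes the inner + node the left child of the outer one.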

(The slides step through building the AST for x + y + 5 node by node.)

Generating three-address code
For the project, you will need to generate three-address code:

    op A, B, C    // C = A op B

You can do this directly, or after building an AST.
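One way a post-order walk can emit this form for x + y + 5. The tuple-based AST and the '$'-prefixed temporary names are assumptions for illustration, not the project's actual representation.

```python
# Generate three-address code ("op A, B, C" meaning C = A op B) by walking
# a tiny AST in post-order: children first, then the node itself.
temp_count = 0
def new_temp():
    global temp_count
    temp_count += 1
    return f"$t{temp_count}"       # '$' cannot appear in source identifiers

def gen(node, code):
    # leaves are just names/literals; interior nodes are (op, left, right)
    if isinstance(node, str):
        return node
    op, left, right = node
    a = gen(left, code)            # generate code for children first
    b = gen(right, code)
    c = new_temp()                 # fresh temporary holds the result
    code.append(f"ADD {a}, {b}, {c}" if op == "+" else f"SUB {a}, {b}, {c}")
    return c

ast = ("+", ("+", "x", "y"), "5")  # (x + y) + 5
code = []
result = gen(ast, code)
assert code == ["ADD x, y, $t1", "ADD $t1, 5, $t2"]
assert result == "$t2"
```

Each node returns the name of the temporary holding its value, which is exactly the information a semantic record would pass up the stack.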

Generating code from an AST
Generating code directly: do a post-order walk of the AST to generate code, passing the generated code up.

    data_object generate_code() {
        // pre-processing code
        data_object lcode = left.generate_code();
        data_object rcode = right.generate_code();
        return generate_self(lcode, rcode);
    }

Important things to note:
- A node generates code for its children before generating code for itself.
- A data object can contain code or other information.
- Code generation is context free. What does this mean?

Generating code directly using semantic routines is very similar to generating code from the AST. Why? Because a post-order traversal is essentially what happens when you evaluate semantic actions as you pop them off the stack: AST nodes are just semantic records. To generate code directly, your semantic records should contain structures to hold the code as it is being built.

Data objects
A data object records various important pieces of information:
- The temporary storing the result of the current expression
- Flags describing the value in the temporary: constant, L-value, R-value
- The code for the expression

L-values vs. R-values
- L-values: addresses which can be stored to or loaded from.
- R-values: data (often loaded from addresses).
Expressions operate on R-values; assignment statements have the form L-value := R-value. Consider the statement a := a:
- The a on the LHS refers to the memory location referred to by a, and we store to that location.
- The a on the RHS refers to the data stored in the memory location referred to by a, so we load from that location to produce the R-value.

Temporaries
- Can be thought of as an unlimited pool of registers (with memory to be allocated at a later time).
- Need to be declared like variables; the name should be something that cannot appear in the program (e.g., use an illegal character as a prefix).
- Memory must be allocated if the address of a temporary can be taken (e.g., a := &b).
- Temporaries can hold either L-values or R-values.

Simple cases
Generating code for constants/literals:
- Store the constant in a temporary.
- Optional: pass up a flag specifying that this is a constant.

Generating code for identifiers:
- The generated code depends on whether the identifier is used as an L-value or an R-value. Is this an address? Or data?
- One solution: just pass it up to the next level, marked as an L-value (it is not yet data!), and generate code once we see how the variable is used.
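The "pass it up as an L-value" idea can be sketched as follows. The field names, the DataObject class, and the LOAD/STORE instruction names are invented for illustration.

```python
# Identifiers are passed up marked as L-values; a load into a fresh
# temporary is emitted only when the value is actually used as an R-value.
temp_count = 0
def new_temp():
    global temp_count
    temp_count += 1
    return f"$t{temp_count}"

class DataObject:
    def __init__(self, temp, is_lvalue, code):
        self.temp = temp               # where the result lives
        self.is_lvalue = is_lvalue     # address (True) vs data (False)
        self.code = code               # code built so far

def identifier(name):
    # not yet data! mark as L-value, decide later how it is used
    return DataObject(name, True, [])

def as_rvalue(obj):
    if not obj.is_lvalue:
        return obj                     # already data; nothing to do
    t = new_temp()
    return DataObject(t, False, obj.code + [f"LOAD {obj.temp}, {t}"])

def assign(lhs, rhs):
    rhs = as_rvalue(rhs)               # the RHS must be data
    return lhs.code + rhs.code + [f"STORE {rhs.temp}, {lhs.temp}"]

# a := a — load through a's location for the RHS, store back to it for the LHS
code = assign(identifier("a"), identifier("a"))
assert code == ["LOAD a, $t1", "STORE $t1, a"]
```

The same identifier produces a load in one position and no code at all in the other, which is exactly why the decision must be deferred until the use is known.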

Generating code for expressions
- Create a new temporary for the result of the expression.
- Examine the data objects from the subtrees:
  - If the temporaries are L-values, load the data from them into new temporaries. (In the project, there is no need to explicitly load.)
  - If the temporaries are constants, the operation can be performed immediately — no code generation needed!
- Generate code to perform the operation, storing the result in the new temporary. Is this an L-value or an R-value?
- Return the code for the entire expression.

Generating code for assignment
Store the value of the temporary from the RHS into the address specified by the temporary from the LHS. Why does this work? Because the temporary for the LHS holds an address:
- If the LHS is an identifier, we just stored its address in the temporary.
- If the LHS is a complex expression, such as

    int *p = &x;
    *(p + 1) = 7;

it still holds an address, even though the address was computed by an expression.

Pointer operations
So what do pointer operations do? They mess with L-values and R-values:
- & (address-of operator): take an L-value and treat it as an R-value (without loading from it): x = &a + 1;
- * (dereference operator): take an R-value and treat it as an L-value (an address): *x = 7;
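The flag-flipping view of & and * can be sketched directly; the class and function names are invented, and no loads or stores are modeled here, only the reinterpretation of the flag.

```python
# & reinterprets an L-value as an R-value without emitting a load;
# * reinterprets an R-value as an L-value (an address to load from/store to).
class DataObject:
    def __init__(self, temp, is_lvalue):
        self.temp, self.is_lvalue = temp, is_lvalue

def address_of(obj):
    assert obj.is_lvalue, "& needs an addressable operand"
    return DataObject(obj.temp, False)   # same temporary, now plain data

def dereference(obj):
    assert not obj.is_lvalue, "* consumes an R-value"
    return DataObject(obj.temp, True)    # same temporary, now an address

a = DataObject("a", True)                # a variable: an L-value
p = address_of(a)                        # &a: an R-value, no load emitted
assert not p.is_lvalue and p.temp == "a"
target = dereference(p)                  # *(&a): an L-value again
assert target.is_lvalue
```

Neither operator generates code by itself: both only change how the surrounding context is allowed to use the temporary.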