Semantic actions for declarations and expressions


Semantic actions

Semantic actions are routines called as productions (or parts of productions) are recognized. The actions work together to build up intermediate representations. Conceptually, think of it as follows:
- Every non-terminal has some information associated with it (code, declared variables, etc.)
- Each child of a non-terminal passes the information it has to its parent non-terminal, which uses the information from its children to build up more information
We call these pieces of information semantic records.

Semantic Records

Semantic records are the data structures produced by semantic actions. They are associated with both non-terminals (code structures) and terminals (tokens/symbols). The standard organization is a semantic stack:
- When you process a terminal (a leaf in the parse tree), push its semantic record onto the stack
- When you process a non-terminal: pop all the records associated with its children, generate a record for the non-terminal, and push that record onto the stack

How do we manipulate the stack?
- Action-controlled: the action routines directly manipulate the stack (call push and pop)
- Parser-controlled: the parser automatically manipulates the stack

LR-parser controlled
- Shift operations push a semantic record (describing the token) onto the stack
- Reduce operations pop the semantic records associated with the production's symbols off the stack and replace them with a semantic record for the production
- Action routines do not see the stack; they refer to the popped-off records using handles (e.g., in yacc/bison, use $1, $2, etc.)

Example of semantic actions

Consider the following grammar:

assign → ident := expr
expr → term addop term
term → ident | LIT
ident → ID
addop → + | -

Example of semantic actions

In Bison (note that the lexer returns a value for each token through yylval):

assign → ident := expr   {$$ = generateassign($1, $3);}
expr → term addop term   {$$ = generateexpr($1, $2, $3);}
term → ident             {$$ = generateterm($1);}
     | LIT               {$$ = generateterm($1);}
ident → ID               {$$ = $1;}
addop → +                {$$ = ADD_OP;}
      | -                {$$ = SUB_OP;}

Example of semantic stack Consider a := b + 1;

LL-controlled

Even though LL parsers are not bottom-up, the semantic stack operates in basically the same way:
- LL parsers take semantic actions as they encounter them while matching a production
- The semantic action for a non-terminal goes at the end of its production, so the action for the whole production is processed after all of the intermediate parts of the production have been processed
In practice, this looks just like the LR-controlled stack.

Example of semantic actions

In ANTLR:

assign returns [Code c]
  : ident ':=' expr { $c = generatecode($ident.name, $expr.c); } ;
expr returns [Code c]
  : t1=term addop t2=term { $c = generatecode($t1.t, $t2.t, $addop.optype); } ;
term returns [Term t]
  : ident { $t = generateterm($ident.s); }
  | LIT   { $t = generateterm($lit.text); } ;
ident returns [String s]
  : ID { $s = $ID.text; } ;
addop returns [OpType optype]
  : '+' { $optype = ADD_OP; }
  | '-' { $optype = SUB_OP; } ;

Overview of declarations
- Symbol tables
- Action routines for simple declarations
- Action routines for advanced features: constants, enumerations, arrays, structs, pointers

Symbol Tables

A symbol table is a table of declarations, associated with each scope. It is an internal structure used by the compiler; it does not become code. There is one entry for each variable declared, storing the declaration's attributes (e.g., name and type; we will discuss these in a few slides). The table must be dynamic (why?). Possible implementations:
- Linear list: easy to implement, but only good for small programs
- Binary search tree: better for large programs, but can still be slow
- Hash table: the best solution
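As a sketch of the hash-table option (Python dicts are hash tables; the entry layout and method names here are assumptions for illustration, not a prescribed design):

```python
# Sketch: a hash-table symbol table. One entry per declared name,
# storing that declaration's attributes (here just a type).

class SymbolTable:
    def __init__(self):
        self.entries = {}            # name -> attribute dict (hash table)

    def add(self, name, var_type):
        if name in self.entries:     # redeclaration in the same scope
            raise KeyError(f"'{name}' already declared")
        self.entries[name] = {"type": var_type}

    def lookup(self, name):
        # Returns the entry, or None if the name was never declared.
        return self.entries.get(name)

table = SymbolTable()
table.add("x", "INT")
table.add("f", "FLOAT")
```

Because the table is a dict, both add and lookup run in expected constant time, which is why hash tables win over linear lists and search trees as programs grow.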

Handling declarations

Declarations of variables, arrays, functions, etc.:
- Create an entry in the symbol table
- Allocate space in the activation record
The activation record stores the information for a particular function call (arguments, return value, local variables, etc.), so it needs to have space for all of this information. Activation records are stored on the program stack. We will discuss these in more detail when we get to functions.

Simple declarations

Declarations of simple types:

INT x;
FLOAT f;

The semantic action should:
- Get the type and name of the identifier
- Check whether the identifier is already in the symbol table
- If it isn't, add it; if it is, report an error

Simple declarations (cont.)

How do we get the type and name of an identifier?

var_decl → var_type id ;
var_type → INT | FLOAT
id → IDENTIFIER

Where do we put the semantic actions?

Simple declarations (cont.)

How do we get the type and name of an identifier?

var_decl → var_type id ;  {currtable.add($1, $2);}
var_type → INT            {$$ = INT;}
         | FLOAT          {$$ = FLOAT;}
id → IDENTIFIER           {$$ = $1;}

Where do we put the semantic actions?
- Pass up the type
- Pass up the variable name
- Use both to create a symbol table entry

Managing symbol tables
- Maintain a list of all symbol tables
- Maintain a stack marking the current symbol table
- Whenever you see a program block that allows declarations, create a new symbol table and push it onto the stack as the current symbol table
- When you see a declaration, add it to the current symbol table
- When you exit a program block, pop the current symbol table off the stack

Managing symbol tables

How do we manage multiple symbol tables?

func_decls → func_decl func_decls | empty
func_decl → any_type id BEGIN decl statement_list END

Where do we put the semantic actions?

Managing symbol tables

How do we manage multiple symbol tables?

func_decls → func_decl func_decls | empty
func_decl → any_type id
            BEGIN {symboltablestack.push(currtable);
                   currtable = new symboltable();}
            decl statement_list
            {currtable = symboltablestack.pop();}
            END
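The push/pop discipline in those actions can be sketched in Python; the names current_table and table_stack are illustrative, and plain dicts stand in for symbol tables:

```python
# Sketch: managing nested scopes with a stack of symbol tables.
# Entering a block saves the enclosing table and starts a fresh one;
# exiting restores the enclosing table.

table_stack = []
current_table = {"g": "INT"}           # outermost (global) scope

def enter_block():
    global current_table
    table_stack.append(current_table)  # push enclosing scope
    current_table = {}                 # new table for the inner block

def exit_block():
    global current_table
    current_table = table_stack.pop()  # restore enclosing scope

enter_block()
current_table["local"] = "FLOAT"       # declaration inside the block
assert "local" in current_table        # visible while inside
exit_block()                           # "local" is gone; "g" is back
```

Declarations always go into whichever table is current, so inner declarations can shadow outer ones without disturbing the enclosing scope's entries.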

Constants
- The symbol table needs a field to store the constant's value
- In general, the constant value may not be known until runtime (static final int i = 2 + j;)
- At compile time, we create code that allows the initialization expression to assign to the variable; the expression is then evaluated at run-time

Arrays

Fixed-size (static) arrays: int A[10];
- Store the type and length of the array
- When creating the activation record, allocate enough space on the stack for the array

What about variable-size arrays? int A[M][N]
- Store the information needed for a dope vector, which tracks the dimensionality, size, and location of the array
- The activation record stores the dope vector
- At runtime, allocate the array at the top of the stack and fill in the dope vector
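A dope-vector address calculation might look like the following sketch; row-major layout is assumed, and the dope vector's field layout and the concrete numbers are illustrative:

```python
# Sketch: computing an element address from a dope vector for a
# 2-D array stored in row-major order.

def element_address(dope, i, j):
    # dope: (base address, element size, per-dimension extents)
    base, elem_size, (dim0, dim1) = dope
    assert 0 <= i < dim0 and 0 <= j < dim1, "index out of bounds"
    # Row-major: skip i full rows of dim1 elements, then j more.
    return base + (i * dim1 + j) * elem_size

# int A[M][N] with M=3, N=4, 4-byte ints, allocated (say) at address 1000.
dope_A = (1000, 4, (3, 4))
addr = element_address(dope_A, 2, 1)   # address of A[2][1]
```

Because the extents live in the dope vector rather than in the generated code, the same indexing code works no matter what M and N turn out to be at run time.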

Structs/classes
- Can have variables/methods declared inside, so they need an extra symbol table
- Need to store the visibility of members
- Complication: a program can create multiple instances of a struct or class!
- Need to store the offset of each member within the struct

Pointers
- Need to store the type information and length of what the pointer points to; this is needed for pointer arithmetic:

int *a = &y;
z = *(a + 1);

- Need to worry about forward declarations: the thing being pointed to may not have been declared yet

class Foo;
Foo *head;
class Foo { ... };

Abstract syntax trees

An abstract syntax tree (AST) is a tree representing the structure of the program, built by semantic actions (some compilers skip this). AST nodes:
- Represent a program construct
- Store important information about the construct
Example nodes: identifier "x", binary_op with operator +, literal "10".

ASTs for References

Referencing identifiers

An identifier behaves differently depending on whether it is used in a declaration or in an expression:
- If used in a declaration, treat it as before
- If used in an expression, we need to check that it is in the symbol table, then create a new AST node with a pointer to its symbol table entry
Note: we may want to store the type information directly in the AST (or we could look it up in the symbol table each time).

Referencing Literals

What if we see a literal?

primary → INTLITERAL | FLOATLITERAL

Create an AST node for the literal (155, 2.45, etc.) and store the string representation of the literal. At some point, this will be converted into the actual representation of the literal:
- For integers, we may want to convert early (to enable constant folding)
- For floats, we may want to wait (to support compilation to different machines). Why?

More complex references
- Arrays: A[i][j] is equivalent to A + i*dim_1 + j. Extract dim_1 from the symbol table or the dope vector.
- Structs: A.f is equivalent to &A + offset(f). Find offset(f) in the symbol table for the declaration of the record.
- Strings: complicated; depends on the language.
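The struct case can be sketched with a hypothetical layout helper; it ignores padding and alignment, which a real compiler must account for:

```python
# Sketch: resolving A.f as base address + offset(f), with per-field
# offsets computed from the declaration order and kept where the
# struct's symbol table entry would hold them.

def layout(fields):
    # fields: list of (name, size_in_bytes) in declaration order
    offsets, offset = {}, 0
    for name, size in fields:
        offsets[name] = offset      # field starts where the previous ended
        offset += size
    return offsets, offset          # per-field offsets, total struct size

# struct { int x; int y; char tag; }  (illustrative sizes, no padding)
offsets, total_size = layout([("x", 4), ("y", 4), ("tag", 1)])

def field_address(base, field):
    # A.f: address of the struct instance plus the field's offset
    return base + offsets[field]
```

Storing offsets with the declaration is what makes multiple instances cheap: every instance reuses the same offsets, with only the base address changing.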

Expressions

Three semantic actions are needed:
- eval_binary (processes binary expressions): create an AST node with two children, pointing to the AST nodes created for the left and right sides
- eval_unary (processes unary expressions): create an AST node with one child
- process_op (determines the type of operation): store the operator in the AST node

Expressions example x + y + 5

Expressions example

x + y + 5 builds a tree with a binary_op (operator: +) at the root; its children are another binary_op (operator: +) over identifier "x" and identifier "y", and the literal "5".

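The tree the actions build for x + y + 5 can be sketched with hypothetical node classes; since addition is left-associative, the expression parses as (x + y) + 5:

```python
# Sketch: AST nodes that eval_binary and process_op might build for
# x + y + 5. The class definitions are illustrative.

class Identifier:
    def __init__(self, name):
        self.name = name            # would also point at a symbol table entry

class Literal:
    def __init__(self, text):
        self.text = text            # string form; converted to a value later

class BinaryOp:
    def __init__(self, op, left, right):
        self.op = op                # stored by process_op
        self.left = left            # children built before the parent
        self.right = right

# eval_binary fires twice, innermost expression first:
inner = BinaryOp("+", Identifier("x"), Identifier("y"))   # x + y
root = BinaryOp("+", inner, Literal("5"))                 # (x + y) + 5
```

Note that the children exist before their parent is created, which is exactly the bottom-up order in which the semantic actions fire.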

Generating three-address code

For the project, you will need to generate three-address code:

op A, B, C   // C = A op B

You can do this directly or after building an AST.

Generating code from an AST

Do a post-order walk of the AST to generate code, passing the generated code up:

data_object generate_code() {
    // pre-processing code
    data_object lcode = left.generate_code();
    data_object rcode = right.generate_code();
    return generate_self(lcode, rcode);
}

Important things to note:
- A node generates code for its children before generating code for itself
- A data object can contain code or other information
- Code generation is context-free. What does this mean?
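Here is a runnable sketch of that walk emitting three-address code in the op A, B, C form; the tuple-based node encoding and the !T temporary prefix (an illegal character, so the names cannot clash with source identifiers) are illustrative assumptions:

```python
# Sketch: post-order code generation over a tuple-encoded AST,
# returning (place holding the result, list of instructions).

temp_count = 0
def new_temp():
    global temp_count
    name = f"!T{temp_count}"        # '!' cannot appear in source names
    temp_count += 1
    return name

def generate_code(node):
    # node: ("id", name) | ("lit", text) | ("binop", op, left, right)
    kind = node[0]
    if kind in ("id", "lit"):
        return node[1], []                    # leaves emit no code here
    _, op, left, right = node
    lplace, lcode = generate_code(left)       # children first (post-order)
    rplace, rcode = generate_code(right)
    result = new_temp()
    instr = f"{op} {lplace}, {rplace}, {result}"   # op A, B, C  ; C = A op B
    return result, lcode + rcode + [instr]

# x + y + 5, parsed left-associatively as (x + y) + 5
tree = ("binop", "ADD",
        ("binop", "ADD", ("id", "x"), ("id", "y")),
        ("lit", "5"))
place, code = generate_code(tree)
```

Each node only ever sees the places returned by its children, never where they came from, which is one concrete reading of "code generation is context free".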

Generating code directly

Generating code directly using semantic routines is very similar to generating code from the AST. Why? Because a post-order traversal is essentially what happens when you evaluate semantic actions as you pop them off the stack; AST nodes are just semantic records. To generate code directly, your semantic records should contain structures to hold the code as it is being built.

Data objects

A data object records various important pieces of information:
- The temporary storing the result of the current expression
- Flags describing the value in the temporary (constant, L-value, R-value)
- The code for the expression

L-values vs. R-values
- L-values: addresses which can be stored to or loaded from
- R-values: data (often loaded from addresses)
Expressions operate on R-values; assignment statements have the form L-value := R-value. Consider the statement a := a:
- The a on the LHS refers to the memory location referred to by a, and we store to that location
- The a on the RHS refers to the data stored in the memory location referred to by a, so we load from that location to produce the R-value
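The two roles of a in a := a can be sketched as the load and store a compiler would emit; memory and temporaries are modeled as dicts, and the mnemonics in the comments are illustrative:

```python
# Sketch: evaluating "a := a". The same name plays two roles:
# an R-value on the right, an L-value on the left.

memory = {"a": 41}    # address/location of a, modeled by its name
temps = {}

# RHS "a" is an R-value: LOAD from a's location into a temporary.
temps["!T0"] = memory["a"]       # LOAD a, !T0

# LHS "a" is an L-value: STORE the temporary to a's location.
memory["a"] = temps["!T0"]       # STORE !T0, a
```

The statement is a no-op, but the point is that two different instructions are emitted for the same identifier: a load for the R-value use and a store for the L-value use.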

Temporaries
- Can be thought of as an unlimited pool of registers (with memory to be allocated at a later time)
- Need to be declared like variables; the name should be something that cannot appear in the program (e.g., use an illegal character as a prefix)
- Memory must be allocated if the address of a temporary can be taken (e.g., a := &b)
- Can hold either L-values or R-values

Simple cases

Generating code for constants/literals:
- Store the constant in a temporary
- Optional: pass up a flag specifying that this is a constant

Generating code for identifiers:
- The generated code depends on whether the identifier is used as an L-value or an R-value. Is this an address, or data?
- One solution: just pass the identifier up to the next level, marked as an L-value (it's not yet data!), and generate code once we see how the variable is used

Generating code for expressions
- Create a new temporary for the result of the expression
- Examine the data objects from the subtrees: if the temporaries are L-values, load the data from them into new temporaries, then generate code to perform the operation (in the project, there is no need to explicitly load)
- If the temporaries are constant, the operation can be performed immediately; no need to perform code generation!
- Store the result in the new temporary. Is this an L-value or an R-value?
- Return the code for the entire expression

Generating code for assignment

Store the value of the temporary from the RHS into the address specified by the temporary from the LHS. Why does this work? Because the temporary for the LHS holds an address:
- If the LHS is an identifier, we just stored its address in the temporary
- If the LHS is a complex expression, e.g.

int *p = &x;
*(p + 1) = 7;

it still holds an address, even though the address was computed by an expression.

Pointer operations

So what do pointer operations do? They mess with L-values and R-values:
- & (address-of operator): take an L-value and treat it as an R-value (without loading from it): x = &a + 1;
- * (dereference operator): take an R-value and treat it as an L-value (an address): *x = 7;