Semantic actions for declarations and expressions

Similar documents
Semantic actions for declarations and expressions

Semantic actions for declarations and expressions. Monday, September 28, 15

Semantic actions for expressions

Syntax Errors; Static Semantics

Syntax-Directed Translation. Lecture 14

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CMPT 379 Compilers. Parse trees

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Programming Languages Third Edition. Chapter 7 Basic Semantics

Using an LALR(1) Parser Generator

Intermediate Code Generation

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Time : 1 Hour Max Marks : 30

The Compiler So Far. Lexical analysis Detects inputs with illegal tokens. Overview of Semantic Analysis

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lexical and Syntax Analysis

Project Compiler. CS031 TA Help Session November 28, 2011

Compiler Theory. (Semantic Analysis and Run-Time Environments)

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

Wednesday, August 31, Parsers

The role of semantic analysis in a compiler

Alternatives for semantic processing

3.5 Practical Issues PRACTICAL ISSUES Error Recovery

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Compilers Project 3: Semantic Analyzer

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSE 12 Abstract Syntax Trees

Yacc: A Syntactic Analysers Generator

Context-free grammars

CS 6353 Compiler Construction Project Assignments

Compiler Construction: Parsing

Syntactic Directed Translation

CIT Lecture 5 Context-Free Grammars and Parsing 4/2/2003 1

Context-free grammars (CFG s)

Summary: Direct Code Generation

Lecture 16: Static Semantics Overview 1

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

The Structure of a Syntax-Directed Compiler

Chapter 4 :: Semantic Analysis

CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Compilers. Compiler Construction Tutorial The Front-end

Chapter 6 part 1. Data Types. (updated based on 11th edition) ISBN

Syntax-Directed Translation. Introduction

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

COMPILER CONSTRUCTION Seminar 03 TDDB

1 Lexical Considerations

COMPILER CONSTRUCTION Seminar 02 TDDB44

Monday, September 13, Parsers

The SPL Programming Language Reference Manual

Undergraduate Compilers in a Day

Object-Oriented Principles and Practice / C++

CSE 401 Final Exam. December 16, 2010

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

QUIZ. What is wrong with this code that uses default arguments?

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

CSC 467 Lecture 13-14: Semantic Analysis

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

CS2210: Compiler Construction. Code Generation

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Working of the Compilers

Concepts Introduced in Chapter 6

Principles of Compiler Design

CS5363 Final Review. cs5363 1

UNIT-4 (COMPILER DESIGN)

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Semantic Analysis. Compiler Architecture

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Parsing III. (Top-down parsing: recursive descent & LL(1) )

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Single-pass Static Semantic Check for Efficient Translation in YAPL

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

A Short Summary of Javali

Chapter 2 A Quick Tour

Writing an Interpreter Thoughts on Assignment 6

CS 432 Fall Mike Lam, Professor. Code Generation

ECE220: Computer Systems and Programming Spring 2018 Honors Section due: Saturday 14 April at 11:59:59 p.m. Code Generation for an LC-3 Compiler

Intermediate Representations

Syntax Analysis Part I

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

When do We Run a Compiler?

// the current object. functioninvocation expression. identifier (expressionlist ) // call of an inner function

Static Semantics. Winter /3/ Hal Perkins & UW CSE I-1

Symbol Tables. ASU Textbook Chapter 7.6, 6.5 and 6.3. Tsan-sheng Hsu.

How do LL(1) Parsers Build Syntax Trees?

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Compiler construction in4020 lecture 5

Concepts Introduced in Chapter 6

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Parsing II Top-down parsing. Comp 412

Chapter 4. Lexical and Syntax Analysis

Transcription:

Semantic actions for declarations and expressions Semantic actions Semantic actions are routines called as productions (or parts of productions) are recognized Actions work together to build up intermediate representations <if-stmt> IF <expr> #startif THEN <stmts> END #endif Semantic action for #startif needs to pass a semantic record to #endif Semantic Records Data structures produced by semantic actions Associated with both non-terminals (code structures) and terminals (tokens/symbols) Do not have to exist (e.g., no action associated with ; ) Control statements often require multiple actions (see <ifstmt> example on previous slide) Typically: semantic records are produced by actions associated with terminals, and are passed to actions associated with non-terminals, which combine them to produce new semantic records Standard organization: semantic stack How do we manipulate stack? Action-controlled: actions directly manipulate stack (call push and pop) Parser-controlled: parser automatically manipulates stack LR-parser controlled Shift operations push semantic records onto stack (describing the token) Reduce operations pop semantic records associated with symbols off stack, replace with semantic record associated with production Action routines do not see stack. Can refer to popped off records using handles e.g., in yacc/bison, use $1, $2 etc. to refer to popped off records Example of semantic actions Consider following grammar: assign! ident := expr expr! term addop term term! ident LIT ident! ID addop! +

Example of semantic actions In Bison (note that lexer returns values for each token through yylval) Example of semantic stack Consider a := b + 1; assign! ident := expr {$$ = generateassign($1, $3);} expr! term addop term {$$ = generateexpr($1, $2, $3);} term! ident {$$ = generateterm($1);}!!! LIT {$$ = generateterm($1);} ident! ID {$$ = $1;} addop! + {$$ = ADD_OP;} {$$ = SUB_OP;} LL-controlled Parse stack contains predicted productions, not matched productions Push empty semantic records onto stack when production is predicted Fill in records as symbols are matched When non-terminal is matched, pop off records associated with RHS, use to fill in the record associated with LHS (leave LHS record on stack) Example of semantic actions In ANTLR: assign returns [Code c]!! ident := expr {$c = generatecode($ident.name, $expr.c);} expr returns [Code c]! t1=term addop t2=term {!! $c = generatecode($t1.t, $t2.t, $addop.optype);!! } term returns [Term t]!! ident {$t = generateterm($ident.s);}! LIT {$t = generateterm($lit.text);} ident returns [String s] ID {$s = $ID.text;} addop returns [OpType optype]! + {$optype = ADD_OP;} {$optype = SUB_OP;} Overview of declarations Symbol tables Action routines for simple declarations Action routines for advanced features Constants Enumerations Arrays Structs Pointers Symbol Tables Table of declarations, associated with each scope Internal structure used by compiler does not become code One entry for each variable declared Store declaration attributes (e.g., name and type) will discuss this in a few slides Table must be dynamic (why?) Possible implementations Linear list (easy to implement, only good for small programs) Binary search trees (better for large programs, but can still be slow) Hash tables (best solution)

Managing symbol tables Maintain list of all symbol tables Maintain stack marking current symbol table Whenever you see a program block that allows declarations, create a new symbol table Push onto stack as current symbol table When you see declaration, add to current symbol table When you exit a program block, pop current symbol table off stack Handling declarations Declarations of variables, arrays, functions, etc. Create entry in symbol table Allocate space in activation record Activation record stores information for a particular function call (arguments, return value, local variables, etc.) Need to have space for all of this information Activation record stored on program stack We will discuss these in more detail when we get to functions Simple declarations Declarations of simple types INT x; FLOAT f; Semantic action should Get the type and name of Check to see if is already in the symbol table If it isn t, add it, if it is, error Simple declarations (cont.) How do we get the type and name of an? var_decl! var_type id; var_type! INT FLOAT id! IDENTIFIER Where do we put the semantic actions? Simple declarations (cont.) How do we get the type and name of an? var_decl! var_type id; #decl_id var_type! INT #int_type FLOAT #float_type id! IDENTIFIER #id Where do we put the semantic actions? When we process #int_type and #id, can store the type and name and pass them to #decl_id When creating activation record, allocate space based on type (why?) Constants Constants Symbol table needs a field to store constant value In general, the constant value may not be known until runtime (static final int i = 2 + j;) At compile time, we create code that allows the initialization expression to assign to the variable, then evaluate the expression at run-time

Fixed size (static) arrays Arrays int A[10]; Store type and length of array When creating activation record, allocate enough space on stack for array What about variable size arrays? int A[M][N] Store information for a dope vector Tracks dimensionality of array, size, location Activation record stores dope vector At runtime, allocate array at top of stack, fill in dope vector Defining new types Some declarations define new types! Enums, structs, classes This information must be stored in the symbol table, too (Why?) Enums Enumeration types: enum days {mon, tue, wed, thu, fri, sat, sun}; Create an entry for the enumeration type itself, and an entry for each member of the enumeration Entries are usually linked Processing enum declaration sets the enum counter to lower bound (usually 0) Each new member seen is assigned the next value and the counter is incremented In some languages (e.g., C), enum members may be assigned particular values. Should ensure that enum value isn t reused Structs/classes Can have variables/methods declared inside, need extra symbol table Need to store visibility of members Complication: can create multiple instances of a struct or class! Need to store offset of each member in struct Pointers Need to store type information and length of what it points to Needed for pointer arithmetic int * a = &y; z = *(a + 1); Need to worry about forward declarations The thing being pointed to may not have been declared yet Class Foo; Foo * head; Class Foo {... }; Abstract syntax trees Tree representing structure of the program Built by semantic actions Some compilers skip this AST nodes Represent program construct Store important information about construct "10"

Referencing s ASTs for References Different behavior if is used in a declaration vs. expression If used in declaration, treat as before If in expression, need to: Check if it is symbol table Create new AST node with pointer to symbol table entry Note: may want to directly store type information in AST (or could look up in symbol table each time) Referencing Literals What about if we see a? primary INTLITERAL FLOATLITERAL Create AST node for Store string representation of 155, 2.45 etc. At some point, this will be converted into actual representation of For integers, may want to convert early (to do constant folding) For floats, may want to wait (for compilation to different machines). Why? More complex references Arrays A[i][j] is equivalent to A + i*dim_1 + j Extract dim_1 from symbol table or dope vector Structs A.f is equivalent to &A + offset(f) Find offset(f) in symbol table for declaration of record Strings Complicated depends on language Expressions Three semantic actions needed eval_binary (processes binary expressions) Create AST node with two children, point to AST nodes created for left and right sides eval_unary (processes unary expressions) Create AST node with one child process_op (determines type of operation) Store operator in AST node x + y + 5

x + y + 5 x + y + 5 x + y + 5 x + y + 5 Generating three-address code x + y + 5 For project, will need to generate three-address code op A, B, C //C = A op B Can do this directly or after building AST

Generating code from an AST Generating code directly Do a post-order walk of AST to generate code, pass generated code up data_object generate_code() { //pre-processing code data_object lcode = left.generate_code(); data_object rcode = right.generate_code(); return generate_self(lcode, rcode); } Important things to note: A node generates code for its children before generating code for itself Data object can contain code or other information Code generation is context free What does this mean? Generating code directly using semantic routines is very similar to generating code from the AST Why? Because post-order traversal is essentially what happens when you evaluate semantic actions as you pop them off stack AST nodes are just semantic records To generate code directly, your semantic records should contain structures to hold the code as it s being built Data objects Records various important info The temporary storing the result of the current expression Flags describing value in temporary Constant, L-value, R-value Code for expression L-values vs. R-values L-values: addresses which can be stored to or loaded from R-values: data (often loaded from addresses) Expressions operate on R-values Assignment statements: L-value := R-value Consider the statement a := a the a on LHS refers to the memory location referred to by a and we store to that location the a on RHS refers to data stored in memory location referred to by a so we will load from that location to produce the R-value Temporaries Can be thought of as an unlimited pool of registers (with memory to be allocated at a later time) Need to declare them like variables Name should be something that cannot appear in the program (e.g., use illegal character as prefix) Memory must be allocated if address of temporary can be taken (e.g. a := &b) Temporaries can hold either L-values or R-values Simple cases Generating code for constants/s Store constant in temporary Generating code for s Optional: pass up flag specifying this is a constant Generated code depends on whether is used as L- value or R-value Is this an address? Or data? One solution: just pass up to next level Mark it as an L-value (it s not yet data!) Generate code once we see how variable is used

Generating code for expressions Create a new temporary for result of expression Examine data-objects from subtrees If temporaries are L-values, load data from them into new temporaries Generate code to perform operation In project, no need to explicitly load If temporaries are constant, can perform operation immediately No need to perform code generation! Store result in new temporary Is this an L-value or an R-value? Return code for entire expression Generating code for assignment Store value of temporary from RHS into address specified by temporary from LHS Why does this work? Because temporary for LHS holds an address If LHS is an, we just stored the address of it in temporary If LHS is complex expression int *p = &x *(p + 1) = 7; it still holds an address, even though the address was computed by an expression Pointer operations So what do pointer operations do? Mess with L and R values & (address of operator): take L-value, and treat it as an R- value (without loading from it)! x = &a + 1; * (dereference operator): take R-value, and treat it as an L- value (an address)! *x = 7;