Semantic analysis. Semantic analysis. What we ll cover. Type checking. Symbol tables. Symbol table entries

Similar documents
CSE 401: Introduction to Compiler Construction. Course Outline. Grading. Project

Static Semantics. Winter /3/ Hal Perkins & UW CSE I-1

Compiler Passes. Semantic Analysis. Semantic Analysis/Checking. Symbol Tables. An Example

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

The role of semantic analysis in a compiler

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

CSE 431S Type Checking. Washington University Spring 2013

Type Bindings. Static Type Binding

Semantic Processing. Semantic Errors. Semantics - Part 1. Semantics - Part 1

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

The Compiler So Far. Lexical analysis Detects inputs with illegal tokens. Overview of Semantic Analysis

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Informatica 3 Syntax and Semantics

CSE 307: Principles of Programming Languages

Motivation for typed languages

CSCE 314 Programming Languages. Type System

Names, Scopes, and Bindings II. Hwansoo Han

Symbol Tables. For compile-time efficiency, compilers often use a symbol table: associates lexical names (symbols) with their attributes

Informal Semantics of Data. semantic specification names (identifiers) attributes binding declarations scope rules visibility

Concepts Introduced in Chapter 7

Symbol Table Information. Symbol Tables. Symbol table organization. Hash Tables. What kind of information might the compiler need?

Imperative Programming

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

Semantic actions for expressions

Semantic Analysis. How to Ensure Type-Safety. What Are Types? Static vs. Dynamic Typing. Type Checking. Last time: CS412/CS413

CSC 533: Organization of Programming Languages. Spring 2005

CSE P 501 Compilers. Static Semantics Hal Perkins Winter /22/ Hal Perkins & UW CSE I-1

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1

Semantic actions for declarations and expressions

Compilers CS S-05 Semantic Analysis

The basic operations defined on a symbol table include: free to remove all entries and free the storage of a symbol table

Syntax Errors; Static Semantics

Semantic actions for declarations and expressions. Monday, September 28, 15

The SPL Programming Language Reference Manual

Lexical Considerations

1 Lexical Considerations

Intermediate Code Generation

About this exam review

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 17: Types and Type-Checking 25 Feb 08

Concepts Introduced in Chapter 6

Note 3. Types. Yunheung Paek. Associate Professor Software Optimizations and Restructuring Lab. Seoul National University

Chapter 5. Variables. Topics. Imperative Paradigm. Von Neumann Architecture

Context-free grammars (CFG s)

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

1 Terminology. 2 Environments and Static Scoping. P. N. Hilfinger. Fall Static Analysis: Scope and Types

Lexical Considerations

Chapter 3:: Names, Scopes, and Bindings (cont.)

Chapter 3:: Names, Scopes, and Bindings (cont.)

Compiler construction

CS558 Programming Languages

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Acknowledgement. CS Compiler Design. Semantic Processing. Alternatives for semantic processing. Intro to Semantic Analysis. V.

Symbol Tables. ASU Textbook Chapter 7.6, 6.5 and 6.3. Tsan-sheng Hsu.

Static Checking and Type Systems

Lecture 16: Static Semantics Overview 1

Pierce Ch. 3, 8, 11, 15. Type Systems

Concepts Introduced in Chapter 6

The PCAT Programming Language Reference Manual

Attributes, Bindings, and Semantic Functions Declarations, Blocks, Scope, and the Symbol Table Name Resolution and Overloading Allocation, Lifetimes,

Programming Languages, Summary CSC419; Odelia Schwartz

CS S-06 Semantic Analysis 1

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

Short Notes of CS201

Introduction to Programming Using Java (98-388)

Fundamentals of Programming Languages

G Programming Languages - Fall 2012

CS201 - Introduction to Programming Glossary By

COMP520 - GoLite Type Checking Specification

Review of the C Programming Language

SE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms

COMP520 - GoLite Type Checking Specification

CSE 452: Programming Languages. Outline of Today s Lecture. Expressions. Expressions and Control Flow

Type systems. Static typing

Type Checking and Type Inference

Expressions and Assignment

Today's Topics. CISC 458 Winter J.R. Cordy

Chapter 5 Names, Binding, Type Checking and Scopes

Chapter 5. Names, Bindings, and Scopes

Context Analysis. Mooly Sagiv. html://

9/7/17. Outline. Name, Scope and Binding. Names. Introduction. Names (continued) Names (continued) In Text: Chapter 5

Visitors. Move functionality

Programming Languages Third Edition. Chapter 7 Basic Semantics

Semantic actions for declarations and expressions

Connecting Definition and Use? Tiger Semantic Analysis. Symbol Tables. Symbol Tables (cont d)

Data Types. Every program uses data, either explicitly or implicitly to arrive at a result.

Types and Type Inference

G Programming Languages - Fall 2012

Lecture 9: Parameter Passing, Generics and Polymorphism, Exceptions

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

Programmiersprachen (Programming Languages)

Lecture 4 Memory Management

Programming Languages

UNIT-4 (COMPILER DESIGN)

Topic IV. Parameters. Chapter 5 of Programming languages: Concepts & constructs by R. Sethi (2ND EDITION). Addison-Wesley, 1996.

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Topic IV. Block-structured procedural languages Algol and Pascal. References:

Answer: Early binding generally leads to greater efficiency (compilation approach) Late binding general leads to greater flexibility

Transcription:

Susan Eggers 1 CSE 401 Semantic analysis Semantic analysis Final part of analysis half of compilation lexical analysis syntactic analysis semantic analysis Afterwards comes synthesis half of compilation Input: AST Output: AST Purpose of semantic analysis: perform final checking of legality of input program, not done by lexical and syntactic checking Checks: type checking relate assignments to & references of particular variables with declarations uniqueness checks: e.g., labels, declarations same name at beginning and end of a module/procedure declaration flow of control checking: break Susan Eggers 2 CSE 401 Type checking What we ll cover Type of a construct is correct for its context Examples of type checking: operands compatible with operator function call has correct number & type of arguments result of an operation is the correct type only index arrays only dereference pointers Symbol tables: creation & use Types & type checking Type equivalence & conversion Compiling vs. interpreting Susan Eggers 3 CSE 401 Susan Eggers 4 CSE 401 Symbol tables Symbol table entries Key data structure during semantic analysis, code generation Stores information about names used in a program keywords can be hardcoded entries declarations: add entries to symbol table uses of name: look up symbol table entry need an organization that grows dynamically need an organization that reflects scope var x integer location (later) var y array[20] of bool location (later) const pair integer 2 proc foo int, float:void location (later) formal a integer: by value label target location (later) keyword if var ptr pointer to integer Susan Eggers 5 CSE 401 Susan Eggers 6 CSE 401

Susan Eggers 7 CSE 401 Implementation strategies Implementation strategies, cont d. Option 1: linked list of key/attribute pairs enter time: O(1) (assumes no checking to see if already entered) lookup time: O(n) (n entries in table) space cost: O(n) Option 4: hash table requires keys that can be hashed well requires good guess of size of table (k) enter time: O(1) expected, O(n) worst case lookup time: O(1) expected, O(n) worst case space cost: O(k+n) Option 2: sorted binary search tree requires keys that can be ordered enter time: O(log n) expected, O(n) worst case lookup time: O(log n) expected, O(n) worst case space cost: O(n) Summary: use hash tables for big mappings, binary tree or linked list for small mappings ideal: self-reorganizing data structure Option 3: balanced search tree (AVL, red-black, splay, ) like 2, but O(log n) worst (or amortized) case Susan Eggers 8 CSE 401 Scope Nested scopes In a statically (aka lexically) scoped, block structured language, the Scope of a declared name is the region of the program text within which uses of that name all refer to the same variable. E.g., in C the scope of a declaration is - the entire block in which it appears, - (or whole file, if not in a block) - excluding nested blocks in which the name is re-declared Algol, Pascal, Ada (& PL/0) are similar, but can nest procedure declarations, too. How to handle nested scopes? procedure foo(x:int, q:int); var z:bool; const y:bool = true; procedure bar(x:array[5] of bool); var y:int; begin x[y] := z; end bar; begin while z do var z:int, y:int; y := z * x; end; output := x + y; end foo; Scope and lifetime related, but independent - Locals usually stack-allocated, lifetime = lifetime of call - but, e.g., C static variables can have local scope, same lifetime as globals Susan Eggers 9 CSE 401 Susan Eggers 10 CSE 401 Nested scopes Nested scope operations Want references to use closest textually-enclosing declaration static/lexical scoping, block structure When enter new scope during semantic analysis/type-checking: create a new, empty scope push it on top of scope stack Simple solution: keep stack (linked list) of scopes stack represents static nesting structure of program top of stack = most closely nested When encounter declaration: add entry to scope on top of stack check for duplicates in that scope only Used in PL/0 statically a tree of scopes each SymTabScope points to enclosing SymTabScope (_parent) maintains down links, too (_children) used like a stack during semantic analysis When encounter use: search scopes for declaration, beginning with top of stack can find name in any scope When exit scope: pop top scope off stack Susan Eggers 11 CSE 401 Susan Eggers 12 CSE 401

Susan Eggers 13 CSE 401 Symbol table interface in PL/0 Symbol table entries class SymTabScope { public: SymTabScope(SymTabScope* enclosingscope) {; // routines to add & lookup ST entries: void enter(symtabentry* newsymbol); SymTabEntry* lookup(char* name); SymTabEntry* lookup(char* name, SymTabScope*& retscope); class SymTabEntry { public: char* name(); Type* type(); virtual bool isconstant(); virtual bool isvariable(); virtual bool isformal(); virtual bool isprocedure(); // space allocation routines: void allocatespace(); int allocatelocal(int size); int allocateformal(int size); ; // space allocation routine: virtual void allocatespace(symtabscope* s); // constants only: virtual int value(); // variables only: virtual int offset(symtabscope* s); ; class VarSTE : public SymTabEntry { ; class FormalSTE: public VarSTE { ; class ConstSTE : public SymTabEntry { ; class ProcSTE : public SymTabEntry { ; Susan Eggers 14 CSE 401 Creating symbol table entries Types void VarDeclItem::typecheck(SymTabScope* s) { VarSTE* varste = new VarSTE(_name, t); s->enter(varste); Types are abstractions of values that share common properties Type checking uses types to compute whether operations on values will be legal void ConstDeclItem::typecheck(SymTabScope* s) { ConstSTE* constste = new ConstSTE (_name, t, constant_value); s->enter(constste); Susan Eggers 15 CSE 401 Susan Eggers 16 CSE 401 Taxonomy of types Representing types in PL/0 Basic types: int, bool, char, real, string, void user-defined types: SymTabScope, Type constructors: ptr (type) array (index-range, element-type) record (name 1 :type 1,, name n :type n ) union (type 1,, type n ) function (arg-types, result-type) class Type { virtual bool same(type* t); bool different(type* t) { return!same(t); ; class IntegerType : public Type {; class BooleanType : public Type {; class ProcedureType : public Type { TypeArray* _formaltypes; ; IntegerType* integertype; BooleanType* booleantype; Susan Eggers 17 CSE 401 Susan Eggers 18 CSE 401

Susan Eggers 19 CSE 401 Type checking terminology Bottom-up type checking Static vs. dynamic typing static: checking done at compile time dynamic: checking done during execution Strong vs. weak typing strong: guarantees no illegal operations performed weak: can t make guarantees static dynamic strong Ada Lisp Smalltalk weak C Fortran Traverse AST graph from leaves up At each node: recursively type check subnodes (if any) check legality of current node, given types of subnodes compute & return result type of current node (if any) Needs info from enclosing context, too need to know types of variables referenced pass down symbol table during traversal legality of e.g., break, return statements pass down whether in loop, result type of function Caveats: hybrids are common mistaken usages are common strong for static untyped, typeless could mean dynamic or weak Susan Eggers 20 CSE 401 Type checking expressions Type checking expressions Type* IntegerLiteral::typecheck(SymTabScope* s) { // return result type return integertype; Type* VarRef::typecheck(SymTabScope* s) { SymTabEntry* ste = s->lookup(_ident); // check for errors if (ste == NULL) { Plzero->typeError( undeclared var ); if (! ste->isconstant() &&! ste->isvariable()) { Plzero->typeError( not a var or const ); // return result type return ste->type(); Type* BinOp::typecheck(SymTabScope* s) { // check & compute types of subexpressions Type* left = _left->typecheck(s); Type* right = _right->typecheck(s); // check the types of the operands switch(_op) { case PLUS: case MINUS: case LEQ: if ( left->different(integertype) right->different(integertype)) { Plzero->typeError( args not ints ); break; case EQL: case NEQ: if (left->different(right)) { Plzero->typeError( args not same type ); break; // return result type switch (_op) { case PLUS: case MINUS: case MUL: case DIVIDE: return integertype; case EQL: case NEQ: return booleantype; Susan Eggers 21 CSE 401 Susan Eggers 22 CSE 401 Type checking statements Type checking statements void AssignStmt::typecheck(SymTabScope* s) { // check & compute types of subexpressions Type* lhs = _lvalue->typecheck_lvalue(s); Type* rhs = _expr->typecheck(s); // check legality of subexpression types if (lhs->different(rhs)) { Plzero->typeError( lhs & rhs types differ ); void IfStmt::typecheck(SymTabScope* s) { // check & compute types of subexpressions Type* test = _test->typecheck(s); // check legality of subexpression types if (test->different(booleantype)) { Plzero->typeError( test not a boolean ); // check nested statements for (int i = 0; i < _then_stmts->length(); i++) { _then_stmts->fetch(i)->typecheck(s); Susan Eggers 23 CSE 401 Susan Eggers 24 CSE 401

Susan Eggers 25 CSE 401 Type checking statements Type checking declarations void CallStmt::typecheck(SymTabScope* s) { // type check arguments, accumulate list of // argument types TypeArray* argtypes = new TypeArray; for (int i = 0; i < _args->length(); i++) { Type* argtype = _args->fetch(i)-> typecheck(s); argtypes->add(argtype); ProcType* proctype = new ProcType(argTypes); void VarDecl::typecheck(SymTabScope* s) { for (int i = 0; i < _items->length(); i++) { _items->fetch(i)->typecheck(s); void VarDeclItem::typecheck(SymTabScope* s) { Type* t = _type->typecheck(s); VarSTE* entry = new VarSTE(_name, t); s->enter(entry); // check callee procedure SymTabEntry* ste = s->lookup(_ident); if (ste == NULL) { Plzero->typeError( undeclared procedure ); Type* proctype2 = ste->type(); // check compatibility of actuals & formals if (proctype2->different(proctype)) { Plzero->typeError( wrong arg types ); Susan Eggers 26 CSE 401 Type checking declarations Type checking declarations void ConstDecl::typecheck(SymTabScope* s) { for (int i = 0; i < _items->length(); i++) { _items->fetch(i)->typecheck(s); void ConstDeclItem::typecheck(SymTabScope* s) { Type* t = _type->typecheck(s); // type check initializer Type* exprtype = _expr->typecheck(s); // make sure initializer is constant expr int value = _expr->resolve_constant(s); // make sure rhs matches declared type if (t->different(exprtype)) { Plzero->typeError( init of wrong type ); ConstSTE* entry = new ConstSTE(_name, t, value); s->enter(entry); void ProcDecl::typecheck(SymTabScope* s) { // create scope for body of procedure SymTabScope* body_scope = new SymTabScope(s); // enter formals into nested scope TypeArray* formaltypes = new TypeArray; for (int i = 0; i < _formals->length();i++) { FormalDecl* formal = _formals->fetch(i); Type* t = formal->typecheck(s, body_scope); formaltypes->add(t); // construct procedure s type ProcType* proctype = new ProcType(formalTypes); // add entry for procedure in enclosing scope ProcSTE* entry = new ProcSTE(_name, proctype); s->enter(procste); // type check procedure body _block->typecheck(body_scope); Susan Eggers 27 CSE 401 Susan Eggers 28 CSE 401 Starting out Extensions int TPlzero::main2(int argc, char** argv) { typecheckphase(); void TPlzero::typeCheckPhase() { module->typecheck(null); // no enclosing symbol table void ModuleDecl::typecheck(SymTabScope* s) { // create new scope for body of module SymTabScope* body_scope = new SymTabScope(s); // type check body of module _block->typecheck(body_scope); void Block::typecheck(SymTabScope* s) { for (int i = 0; i < _decls->length(); i++) { _decls->fetch(i)->typecheck(_scope); for ( i = 0; i < _stmts->length(); i++) { _stmts->fetch(i)->typecheck(_scope); 1) adding else to if 2) adding for statements 3) adding break statements 4) adding constant expressions 5) adding arrays 6) adding call-by-reference 7) adding return statements Consult the project description! Susan Eggers 29 CSE 401 Susan Eggers 30 CSE 401

Susan Eggers 31 CSE 401 Type checking records. An implementation For the type record : represent record type & fields of record represent public vs. private nature of fields type R = record begin public x:int; public a:array[10] of bool; private m:char; end record; Represent record type using a symbol table for fields class RecordType: public Type { SymTabScope* _fields; ; Add RecordSTE symbol table entries for user-defined types (R) Need to be able to: give names to user-defined record types access fields of record values var r:r; r.x Add FieldSTE symbol table entries for fields (r.x) For public vs. private, add boolean flag to SymTabEntry class FieldSTE : public SymTabEntry { public: FieldSTE(char* name, Type* t, bool p) : SymTabEntry(name, t, p) { ; Susan Eggers 32 CSE 401 An implementation Type equivalence To type check r.x: type check r check it s a record lookup x in r s symbol table check that it s public, or that current scope is nested in record (private) extract & return type of x var r record pointer to ST for fields field x integer public location (later) field a array[10] of bool public location (later) field m char private location (later) Type checking often involves knowing when two types are equal implemented in PL/0 with Type::same function When is one type equal to another? Obvious for basic types like int, char, string What about type constructors like arrays? (pl0) var a1 :array[10] of int; var a2,a3 :array[10] of int; var a4 :array[20] of int; var a5 :array[10] of bool; Susan Eggers 33 CSE 401 Susan Eggers 34 CSE 401 Structural vs. name equivalence Structural vs. name equivalence Structural equivalence: two types are equal if they have same structure basic types type constructors: same constructor structurally equivalent arguments to constructor, recursively example (Pascal) type ar1 = array [1..10] of integer; type ar2 = array [1..10] of integer; var myarray : ar1; var yourarray : ar2; myarray := yourarray; implement with recursive implementation of same Name equivalence: two types are equal if they came from the same textual occurrence of a type constructor example (Ada) type LIST_10 is array (1..10) of integer; C, D : LIST_10 E : LIST_10 requires that all types have a name defined names type celsius is FLOAT; type fahrenheit is FLOAT; anonymous types A: array (Integer range 1..10) of integer; (type Anonymous1 is array (Integer range 1..10) of Integer;) each declaration has a different name B: array (Integer range 1..10) of integer; A: array (Integer range 1..10) of integer; Susan Eggers 35 CSE 401 Susan Eggers 36 CSE 401

Susan Eggers 37 CSE 401 Structural vs. name equivalence Structural vs. name equivalence type count is int; type index is int; sheep, toten : count; blessings : count; a : index; b : index; TYPE t1 = ARRAY [1..10] OF INTEGER, t2 = t1; TYPE t3 = ARRAY [1..10] OF INTEGER; VAR x: t1; y: t2; z: t3; w: ARRAY [1..10] OF INTEGER; Susan Eggers 38 CSE 401 Type conversions and coercions Type conversions and coercions Why needed: types have different representations different machine instructions used for different types Implicit conversion (coercion) type system does the conversion automatically system must insert unary conversion operators as part of type checking done where code does not make sense for type checking or code generation example (C) i = j + 2.1416 j converted to real; real truncated to int + programmer not have to code it - only coerce if no loss in precision: widening e.g., an object of type int to one of type float Susan Eggers 39 CSE 401 Susan Eggers 40 CSE 401 Type conversions and coercions Overloading Explicit conversion conversion stated in the code using unary operators ( casting ) or built-in functions + provides more flexibility in kinds of conversions - programmer has to do it Symbol has different meaning depending on the context e.g., same operation on different types + for integer add + for floating point add Different implementations for each type example of functions (Modula-2) i := TRUNC (FLOAT (j) + 2.1416) j converted to real; real truncated to int example of functions (Fortran) INT(r): real, double, complex to integer REAL(i): integer to real DBLE(i): integer to double CMPLX(i): integer to complex example of casting (C) float pi; int pentium; (int)pi / pentium; Why do it? to avoid proliferation of names Overloading is resolved when a unique type is determined semantic rules (e.g., look at types of operands for +) context determines the type example that uses both (Ada) function * (i,j:integer) return integer; function * (i,j:integer) return complex; function * (x,y:complex) return complex; means that (3*5)*complexValue is complex means that (3*5)*integerValue is integer bottom-up type checking choose a unique operation top-down Susan Eggers 41 CSE 401 Susan Eggers 42 CSE 401

Susan Eggers 43 CSE 401 Polymorphic functions Example (C++) Non-polymorphic function: arguments have fixed type Polymorphic functions: arguments can have different types why do it? support for ADTs one implementation binding between code & type done dynamically example (C): pointer operator & example (ML) fun first(x,y) = x first (2,3) first (2.0,3.0) first ([1,2], [3,4]) typecheck is overloaded (different implementation depending on the context) If you consider the receiver an argument, than typecheck is a polymorphic function (its argument has different types) virtual Type* typecheck(symtabscope* s) { Plzero->fatal("need to implement"); return NULL; Type* VarRef::typecheck(SymTabScope* s) { SymTabEntry* ste = s->lookup(_ident); if (ste == NULL) { Plzero->typeError( undeclared var ); if (! ste->isconstant() &&! ste->isvariable()) { Plzero->typeError( not a var or const ); return ste->type(); Susan Eggers 44 CSE 401 Implementing a language Interpreters Given type-checked AST program representation: can generate target program that is then run separately (compiler) can interpret AST directly, carry out operations given data input (interpreter) Simulate the program: Create data structures to represent run-time program state activation record for each called procedure environment to store local variable bindings pointer to calling activation record (dynamic link) for procedure return pointer to lexically-enclosing activation record/ environment (static link) for variable lookups Interpretation loop that evaluates the AST and executes code to carry out the operations Susan Eggers 45 CSE 401 Susan Eggers 46 CSE 401 Pros and cons of interpretation Compilation + simple conceptually, easy to implement + good programming environment for program development & debugging + some machine independence - slow to execute evaluation overhead vs. direct machine instructions no optimizations across AST nodes program text is reexamined & analyzed variable lookup vs. registers or direct access data structures for values vs. machine registers & stack Divide interpreter run-time into two parts: compile-time run-time Compile-time does preprocessing perform analysis of source code & synthesis of target code at compile-time once produce an equivalent but more efficient program in machine language that gets run many times Only advantage over interpreters: faster running programs Susan Eggers 47 CSE 401 Susan Eggers 48 CSE 401

Susan Eggers 49 CSE 401 Compile-time processing Decide representation of run-time data values Decide where data will be stored format of in-memory data structures (e.g. records, arrays) registers format of stack frames global memory Do optimizations across instructions Generate machine code to do basic operations just like interpreting expression, except generate code that will evaluate it later