SEMANTIC ANALYSIS TYPES AND DECLARATIONS


SEMANTIC ANALYSIS
CS 403: Type Checking
Stefan D. Bruda, Winter 2015

- Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination. We now move on to check whether they form a sensible set of instructions in the programming language: semantic analysis
- Any noun phrase followed by some verb phrase makes a syntactically correct English sentence, but a semantically correct one also
  - has subject-verb agreement
  - has proper use of gender
  - has components that go together to express an idea that makes sense
- For a program to be semantically valid:
  - all variables, functions, classes, etc. must be properly defined
  - expressions and variables must be used in ways that respect the type system
  - access control must be respected
  - etc.
- Note however that a valid program is not necessarily correct:

    int Fibonacci(int n) {
        if (n <= 1) return 0;   // should be return 1;!
        return Fibonacci(n - 1) + Fibonacci(n - 2);
    }

    int main() {
        Print(Fibonacci(40));
    }

  Valid but not correct!

SEMANTIC ANALYSIS (CONT'D)

- Goals of semantic analysis:
  - Reject the largest number of incorrect programs
  - Accept the largest number of correct programs
  - Do so quickly! (http://xkcd.com/303/)
- Some semantic analysis is done while parsing (syntax-directed translation)
  - Some languages are specifically designed for exclusive syntax-directed translation (one-pass compilers)
  - Other languages require repeated traversals of the AST after parsing
- Several components of semantic analysis:
  - Type and scope checking
  - Other semantic rules (language dependent)

TYPES AND DECLARATIONS

- A type is a set of values and a set of operations operating on those values
- Three categories of types in most programming languages:
  - Base types (int, float, double, char, bool, etc.): primitive types provided directly by the underlying hardware
  - Compound types (enums, arrays, structs, classes, etc.): types constructed as aggregations of the base types
  - Complex types (lists, stacks, queues, trees, heaps, tables, etc.): abstract data types, which may or may not exist in a language
- In many languages the programmer must first establish the name, type, and lifetime of a data object (variable, function, etc.) through declarations
- Most type systems rely on declarations
  - Notable exceptions: functional languages that do not require declarations but work hard to infer the data types of variables from the code

TYPE CHECKING

- The bulk of semantic analysis = the process of verifying that each operation respects the type system of the language
  - Generally means that all operands in any expression are of appropriate types and number
  - Sometimes the rules are defined by other parts of the code (e.g., function prototypes), and sometimes such rules are part of the language itself (e.g., "both operands of a binary arithmetic operation must be of the same type")
- Type checking can be done during compilation, during execution, or across both
- A language is considered strongly typed if each and every type error is detected during compilation
- Static type checking is done at compile time
  - The information needed is obtained via declarations and stored in a master symbol table
  - The types involved in each operation are then checked
- Dynamic type checking is implemented by including type information for each data location at run time

THE SYMBOL TABLE INTERFACE

- The symbol table is used to keep track of which declaration is in effect upon encountering a reference to an id
- It is used in both type and scope checking, so it must keep track of scopes as well as declarations
- A suitable interface therefore contains the following functions:
  - ENTERSYMBOL(name, type) adds the id name to the symbol table (current scope) with type type
  - RETRIEVESYMBOL(name) returns the currently valid entry in the symbol table for name, or a null pointer if no such entry exists
  - OPENSCOPE() opens a new scope, so that any new symbols will be processed in the new scope
  - CLOSESCOPE() closes the current scope, so that all references revert to the outer scope
  - DECLAREDLOCALLY(name) tests whether name is declared in the current scope

THE SYMBOL TABLE IMPLEMENTATION

- The symbol table is an association list, capable of storing key-data pairs and retrieving stored data based on key values
- Some additional complications are caused by the existence of scopes (to be addressed later)
- The usual suspects provide adequate implementations; the most efficient include
  - Balanced binary search trees: O(log n) access
    - Note that simple binary search trees will likely be inefficient, since keys (variable names) are seldom random and so the tree is likely to be unbalanced
  - Hash tables: particularly suited for implementing association lists, and the most used data structure in practice

TYPE CHECKER DESIGN

- Design process for defining a type system:
  1 Identify the types that are available in the language
  2 Identify the language constructs that have types associated with them
  3 Identify the semantic rules for the language
- C++-like language example (declarations required = somewhat strongly typed)
  - Base types (int, double, bool, string) + compound types (arrays, classes)
    - Arrays can be made of any type (including other arrays)
    - ADTs can be constructed using classes (no need to handle them separately)
  - Type-related language constructs:
    - Constants: type given by the lexical analysis
    - Variables: all variables must have a declared type (base or compound)
    - Functions: precise type signature (arguments + return)
    - Expressions: each expression has a type based on the type of the composing constant, variable, return type of the function, or type of operands
    - Other constructs (if, while, assignment, etc.) also have associated types (since they have expressions inside)
  - Semantic rules govern what types are allowable in the various language constructs
    - Rules specific to individual constructs: the operand to a unary minus must be either double or int, the expression used in a loop test must be of bool type, etc.
    - General rules: all variables must be declared, all classes are global, etc.
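The five-function symbol table interface listed above can be sketched in C++ as follows. This is only one possible realization, assuming one hash table per scope kept on a stack; the names Symbol and SymbolTable, the use of strings for types, and the boolean result of enterSymbol are illustrative assumptions and do not come from the slides.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical entry; a real compiler would store richer attributes.
struct Symbol {
    std::string name;
    std::string type;
};

class SymbolTable {
public:
    SymbolTable() { openScope(); }  // the global scope is always open

    // OPENSCOPE: new symbols now go into a fresh innermost scope.
    void openScope() { scopes_.emplace_back(); }

    // CLOSESCOPE: drop the innermost scope; lookups revert to the outer one.
    void closeScope() {
        assert(scopes_.size() > 1 && "cannot close the global scope");
        scopes_.pop_back();
    }

    // DECLAREDLOCALLY: is name declared in the current (innermost) scope?
    bool declaredLocally(const std::string& name) const {
        return scopes_.back().count(name) > 0;
    }

    // ENTERSYMBOL: add name to the current scope.
    // Returns false on redeclaration (an assumption of this sketch).
    bool enterSymbol(const std::string& name, const std::string& type) {
        if (declaredLocally(name)) return false;
        scopes_.back()[name] = Symbol{name, type};
        return true;
    }

    // RETRIEVESYMBOL: search from the innermost scope outwards;
    // returns a null pointer if no open scope declares name.
    const Symbol* retrieveSymbol(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, Symbol>> scopes_;
};
```

With this sketch, declaring a as int globally and then as double in a nested scope makes retrieveSymbol("a") report double until closeScope() is called, after which it reports int again.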

TYPE CHECKING IMPLEMENTATION

- First step: record type information with each identifier
  - The lexical analyzer gives the name
  - The parser needs to connect that name with the type (based on the declaration)
  - This information is stored in a symbol table
  - When building the node for a var construct (say, int a;) the parser can associate the type (int) with the variable (a)
  - A suitable entry in the symbol table can then be created
  - Typically the symbol table is stored outside the AST
  - The class or struct entry in a symbol table is a table in itself (recording all fields and their types)

    decl ::= var ; decl
    var  ::= type identifier
    type ::= int | bool | double | string | identifier | type [ ]

- Second step: verify language constructs for type consistency
  - Can be done while parsing (in such a case declarations must precede use)
  - Can also be done in a subsequent parse tree traversal (more flexible on the placement of declarations)
- Second step: verify language constructs for type consistency, continued
  1 Verification based on the rules of the grammar
    - While examining an expr + expr node, the types of the two expr must agree with each other and be suitable for addition
    - While examining an id = expr node, the type of expr (determined recursively) must agree with the type of id (retrieved from the symbol table)
    - Etc.

    expr ::= const | id | expr + expr | expr / expr
    stmt ::= id = expr

  2 Verification based on the general type rules of the language. Examples:
    - The index in an array selection must be of integer type
    - The two operands to logical && must both have bool type; the result is of bool type
    - The type of each actual argument in a function call must be compatible with the type of the respective formal argument
- Essentially the process consists of annotating all AST nodes with type information, making sure that all annotations are consistent
- The AST annotation process is accomplished using synthesis rules
  - They specify how to compute the type of a node from the types of its children
  - Examples include:
    - Various rules as specified in the language definition, e.g.
        if $f$ has type $s \to t$ and $x$ has type $s$ then $f(x)$ has type $t$
    - Rules for type inference (if applicable), e.g.
        if $f(x)$ is an expression then for some $\alpha$ and $\beta$, $f$ has type $\alpha \to \beta$ and $x$ has type $\alpha$
      Type inference is necessary in languages such as ML and HASKELL, which do type checking but do not require declarations
    - Rules for type conversions (if allowed in the language), e.g.
        if $E_1$.type = integer and $E_2$.type = integer then ($E_1 + E_2$).type = integer
        else if $E_1$.type = float and $E_2$.type = integer then ($E_1 + E_2$).type = float
    - Rules for overloaded functions, e.g.
        if $f$ can have the type $s_i \to t_i$ for $1 \le i \le n$ with $s_i \ne s_j$ for $i \ne j$, and $x$ has type $s_k$, then $f(x)$ has type $t_k$
- Better (more general) approach to type conversions:
  - Establish a type hierarchy or partial order, based on the data storable in the types
    - $t_1 \le t_2$ iff $t_2$ can store all the data storable in $t_1$
  - Define the function MAX($t_1$, $t_2$), which returns the least upper bound of $t_1$ and $t_2$ in the partial order
  - Define the function WIDEN($a$, $t_1$, $w$), which converts if necessary expression $a$ from type $t_1$ to type $w$
    - If conversion is necessary then a new AST node will be inserted
    - If no conversion is necessary then the AST is not changed

    if $E_1$.type = $t_1$ and $E_2$.type = $t_2$ then
        $w \leftarrow$ MAX($t_1$, $t_2$)
        if $w$ is undefined then signal type error
        else
            WIDEN($E_1$, $t_1$, $w$)
            WIDEN($E_2$, $t_2$, $w$)
            ($E_1 + E_2$).type $\leftarrow w$
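The MAX/WIDEN approach to type conversions described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the course's implementation: the partial order is assumed to contain only Int <= Double (Bool is unrelated to both), and the Expr node layout and the names storableIn, maxType, widen, leaf and checkAdd are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

enum class Type { Int, Double, Bool };

// Assumed partial order: t1 <= t2 iff t2 can store all the data of t1.
bool storableIn(Type t1, Type t2) {
    return t1 == t2 || (t1 == Type::Int && t2 == Type::Double);
}

// MAX: least upper bound of t1 and t2; returns false if it is undefined.
bool maxType(Type t1, Type t2, Type& w) {
    if (storableIn(t1, t2)) { w = t2; return true; }
    if (storableIn(t2, t1)) { w = t1; return true; }
    return false;
}

// Hypothetical AST node annotated with its type.
struct Expr {
    std::string op;  // "const", "id", "+", or "widen"
    Type type;
    std::unique_ptr<Expr> left, right;
};

std::unique_ptr<Expr> leaf(std::string op, Type t) {
    return std::unique_ptr<Expr>(new Expr{std::move(op), t, nullptr, nullptr});
}

// WIDEN: insert a conversion node only when a conversion is necessary.
std::unique_ptr<Expr> widen(std::unique_ptr<Expr> e, Type t, Type w) {
    if (t == w) return e;  // no conversion needed: the AST is not changed
    return std::unique_ptr<Expr>(new Expr{"widen", w, std::move(e), nullptr});
}

// Type-checks E1 + E2. Returns the annotated node, or nullptr on a type error.
std::unique_ptr<Expr> checkAdd(std::unique_ptr<Expr> e1, std::unique_ptr<Expr> e2) {
    Type t1 = e1->type, t2 = e2->type, w;
    if (!maxType(t1, t2, w)) return nullptr;  // MAX undefined
    if (w == Type::Bool) return nullptr;      // assumption: + needs numeric operands
    std::unique_ptr<Expr> l = widen(std::move(e1), t1, w);
    std::unique_ptr<Expr> r = widen(std::move(e2), t2, w);
    return std::unique_ptr<Expr>(new Expr{"+", w, std::move(l), std::move(r)});
}
```

Under these assumptions, adding an Int constant to a Double variable yields a + node of type Double whose left child is a widen node wrapping the constant, while adding two Int operands leaves both children untouched.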

SCOPE CHECKING

- Scope constrains the visibility of an identifier to some subsection of the program
  - Local variables are only visible in the block in which they are defined
  - Global variables are visible in the whole program
- A scope is a section of the program enclosed by basic program delimiters such as { } in C
- Many languages allow nested scopes
  - The scope defined by the innermost such unit at a given point is called the current scope
  - The scopes defined by the current scope and any enclosing program units are open scopes
  - All other scopes are closed
- Scope checking: given a point in the program and an identifier, determine whether that identifier is accessible at that point
- In essence, the program can only access identifiers that are in the currently open scopes
- In addition, in the event of name clashes the innermost scope wins

    int a;                  // (1)
    void bubble(int a) {    // (2)
        int a;              // (3)
        a = 2;              // (3) wins!
    }

IMPLEMENTATION OF SCOPE CHECKING

- Scope checking is implemented at the symbol table level, with two approaches
  1 One symbol table per scope, organized into a scope stack
    - When a new scope is opened, a new symbol table is created and pushed on the stack
    - When a scope is closed, the top table is popped
    - All declared identifiers are put in the top table
    - To find a name we start at the top table and continue our way down until found; if we do not find it, then the variable is not accessible
  2 Single symbol table
    - Each scope is assigned a number
    - Each entry in the symbol table contains the number of the enclosing scope
    - A name is searched in the table in decreasing scope number (higher number has priority)
      - This needs an efficient data organization for the symbol table (hash table)
    - A name may appear in the table more than once as long as the scope numbers are different
    - When a new scope is created, the scope number is incremented
    - When a scope is closed, all entries with that scope number are deleted from the table and then the current scope number is decremented

IMPLEMENTATION OF SCOPE CHECKING (CONT'D)

  1 Stack of symbol tables
    - Disadvantages
      - Overhead in maintaining the stack structure (and creating symbol tables)
      - Global variables are at the bottom of the stack: heavy penalty for accessing globals
    - Advantages
      - Once the symbol table is populated it remains unchanged throughout the compilation process: more robust code
  2 Single symbol table
    - Disadvantages
      - Closing a scope can be an expensive operation
    - Advantages
      - Efficient access to all scopes (including global variables)

SYMBOL TABLES: ADVANCED FEATURES

- Compound types: types defined using other types, with arbitrary depth
  - Common storage technique: store compound types as a tree structure

        struct {
            char *s;
            int n;
            int nums[5];
        } arr[12];

    Type tree for arr (reconstructed from the slide's figure):

        array (arr)
            12
            struct
                pointer (s)
                    char
                int (n)
                array (nums)
                    5
                    int

  - Alternate technique: each compound type entry is a symbol table by itself (containing the names and types of the members)
- Overloading: multiple ids with different type signatures
  - Possible storage techniques: have the id associated with a list of types (rather than a single type), or encode the type in the table key
- Type hierarchies: inheritance, interfaces, etc.
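The second approach above (a single symbol table with numbered scopes) can be sketched as follows. The sketch assumes a hash table mapping each name to its list of entries ordered by increasing scope number; the names Entry and FlatSymbolTable and the boolean result of enterSymbol are illustrative assumptions, not part of the slides.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Each entry records the number of the scope that declared it.
struct Entry {
    std::string type;
    int scope;
};

class FlatSymbolTable {
public:
    // Opening a scope only increments the current scope number.
    void openScope() { ++current_; }

    // Closing a scope deletes every entry carrying the current scope number
    // (a walk over the whole table, hence the cost noted above),
    // then decrements the scope number.
    void closeScope() {
        assert(current_ > 0 && "cannot close the global scope");
        for (auto it = table_.begin(); it != table_.end();) {
            std::vector<Entry>& entries = it->second;
            if (!entries.empty() && entries.back().scope == current_) {
                entries.pop_back();
            }
            if (entries.empty()) {
                it = table_.erase(it);
            } else {
                ++it;
            }
        }
        --current_;
    }

    // A name may appear more than once, but only with different scope numbers.
    bool enterSymbol(const std::string& name, const std::string& type) {
        std::vector<Entry>& entries = table_[name];
        if (!entries.empty() && entries.back().scope == current_) return false;
        entries.push_back(Entry{type, current_});
        return true;
    }

    // Entries for a name are stored in increasing scope order, so the last
    // one has the highest scope number and therefore has priority.
    const Entry* retrieveSymbol(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end() || it->second.empty()) return nullptr;
        return &it->second.back();
    }

private:
    int current_ = 0;  // scope 0 is the global scope
    std::unordered_map<std::string, std::vector<Entry>> table_;
};
```

Lookups cost one hash probe regardless of nesting depth, including for globals, while closeScope() visits every name in the table; this mirrors the trade-off between the two approaches listed above.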

EQUIVALENCE OF COMPOUND TYPES

- Equivalence of compound types can be checked recursively based on the tree structure

    bool AreEquivalent(struct typenode *tree1, struct typenode *tree2) {
        if (tree1 == tree2)               // if same type pointer, must be equivalent!
            return true;
        if (tree1->type != tree2->type)   // check types first
            return false;
        switch (tree1->type) {
        case T_INT: case T_DOUBLE: ...    // same base type
            return true;
        case T_PTR:
            return AreEquivalent(tree1->child[0], tree2->child[0]);
        case T_ARRAY:
            return AreEquivalent(tree1->child[0], tree2->child[0]) &&
                   AreEquivalent(tree1->child[1], tree2->child[1]);
        ...
        }
    }

- Also needs some way to deal with circular types, such as marking the visited nodes so that we do not compare them ever again

EQUIVALENCE OF COMPOUND TYPES (CONT'D)

- When are two custom types equivalent?
  - Named equivalence: when the two names are identical
    - Equivalence is assessed by name only (just like base types)
  - Structural equivalence: when the types hold the same kind of data (possibly recursively)
    - Equivalence is assessed by equivalence of the type trees (as above)
    - Structural equivalence is not always easy to do, especially on infinite (graph) types
- Named or structural equivalence is a feature of the language
  - Most (but not all) languages only support named equivalence
  - Modula-3 and Algol have structural equivalence. C, Java, C++, and Ada have name equivalence. Pascal leaves it undefined: up to the implementation

TYPE (CLASS) DEFINITIONS

- A class definition generates a new type, just like for structures/records
- However, this type also includes signatures for member functions
  - Using a symbol table for each class declaration is more efficient than the tree implementation
- Need to maintain pointers to the parent classes (if any) and to the interfaces being implemented (if any)
  - The pointers to the interfaces must be used to verify that all the interfaces are properly implemented
  - The pointers to the parents will be used to resolve subsequent references to the members of the class
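The per-class symbol table with parent and interface pointers described above can be sketched as follows. Everything here is an illustrative assumption: the ClassInfo layout, single inheritance, representing an interface with the same structure as a class, encoding signatures as plain strings, and the function names lookupMember and implementsInterface.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical class entry: a symbol table of its own plus a parent pointer.
struct ClassInfo {
    std::string name;
    const ClassInfo* parent;  // nullptr if the class has no parent
    // member name -> type or signature (encoded as a string in this sketch)
    std::unordered_map<std::string, std::string> members;
};

// Resolve a member reference: look in the class itself, then follow the
// parent pointers. Returns nullptr if no class in the chain declares it.
const std::string* lookupMember(const ClassInfo* cls, const std::string& member) {
    for (; cls != nullptr; cls = cls->parent) {
        auto it = cls->members.find(member);
        if (it != cls->members.end()) return &it->second;
    }
    return nullptr;
}

// Verify that cls (or one of its ancestors) provides every member required
// by iface, with an identical signature.
bool implementsInterface(const ClassInfo& cls, const ClassInfo& iface) {
    for (const auto& required : iface.members) {
        const std::string* found = lookupMember(&cls, required.first);
        if (found == nullptr || *found != required.second) return false;
    }
    return true;
}
```

For example, if a hypothetical class Circle has parent Shape, a reference to a member declared only in Shape is resolved by the second iteration of the loop in lookupMember, and an interface is accepted only when each of its members is found somewhere along that chain.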