

CS 1622 Lecture 15: Semantic Analysis

Semantic Analysis
- How to build symbol tables
- How to use them to find multiply-declared and undeclared variables
- How to perform type checking

The Compiler So Far
- Lexical analysis: detects inputs with illegal tokens, e.g., main$ ();
- Parsing: detects inputs with ill-formed parse trees, e.g., missing semicolons
- Semantic analysis: the last front-end phase; catches all remaining errors

Introduction
Typical semantic errors:
- Multiple declarations: a variable should be declared (in the same scope) at most once.
- Undeclared variable: a variable should not be used before being declared.
- Type mismatch: the type of the left-hand side of an assignment should match the type of the right-hand side.
- Wrong arguments: methods should be called with the right number and types of arguments.

A sample semantic analyzer works in two phases, i.e., it traverses the AST created by the parser:
1. For each scope in the program:
   - process the declarations: add new entries to the symbol table and report any variables that are multiply declared
   - process the statements: find uses of undeclared variables, and update the "ID" nodes of the AST to point to the appropriate symbol-table entry
2. Process all of the statements in the program again, using the symbol-table information to determine the type of each expression and to find type errors.

Symbol Table = Map: keys -> values
- Purpose: keep track of names declared in the program (names of variables, classes, fields, methods, etc.)
- Symbol-table entry: associates a name with a set of attributes, e.g.:
  - kind of name (variable, class, field, method, etc.)
  - type (int, float, etc.)
  - nesting level
  - memory location (i.e., where it will be found at runtime)
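The attribute list above can be pictured as a small data structure. The following is only a sketch: the names Kind, SymbolEntry, SymbolEntryDemo and the integer offset used as a stand-in for the memory location are assumptions made for illustration and do not come from the lecture.

```java
import java.util.HashMap;
import java.util.Map;

final class SymbolEntryDemo {
    // Builds a tiny name -> entry map, i.e. the "Map: keys -> values" view of a symbol table.
    static Map<String, SymbolEntry> build() {
        Map<String, SymbolEntry> table = new HashMap<>();
        table.put("count", new SymbolEntry("count", Kind.VARIABLE, "int", 2, 0));
        table.put("total", new SymbolEntry("total", Kind.VARIABLE, "float", 2, 4));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(build().get("count"));
    }
}

// Hypothetical set of name kinds, taken from the attribute list above.
enum Kind { VARIABLE, CLASS, FIELD, METHOD }

// One symbol-table entry: a name plus its attributes.
final class SymbolEntry {
    final String name;       // the declared name (also the map key)
    final Kind kind;         // kind of name
    final String type;       // e.g. "int", "float"
    final int nestingLevel;  // scope depth of the declaration
    final int offset;        // assumed stand-in for the run-time memory location

    SymbolEntry(String name, Kind kind, String type, int nestingLevel, int offset) {
        this.name = name;
        this.kind = kind;
        this.type = type;
        this.nestingLevel = nestingLevel;
        this.offset = offset;
    }

    @Override
    public String toString() {
        return name + ": " + kind + " " + type + ", level " + nestingLevel + ", offset " + offset;
    }
}
```

A real compiler would probably store a structured type rather than a string; the string keeps the sketch short.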

Scoping
- Symbol table design is influenced by the kind of scoping used by the compiled language.
- In most languages, the same name can be declared multiple times if its declarations occur in different scopes and/or involve different kinds of names.

Scoping: example
Java: the same name can be used for a class, an instance variable, a method of the class, and a local variable of the method. A legal Java program:

    class Test {
        int Test;
        void Test( ) {
            double Test;
        }
    }

Scoping: overloading
Java and C++ (but not Pascal or C): the same name can be used for more than one method as long as the number and/or types of parameters are unique.

    int add(int a, int b);
    float add(float a, float b);

Scoping: general rules
- The scope rules of a language determine which declaration of a named object corresponds to each use of the object; i.e., scoping rules map uses of objects to their declarations.
- C++ and Java use static scoping: the mapping from uses to declarations is made at compile time.
- C++ uses the "most closely nested" rule: a use of variable x matches the declaration in the most closely enclosing scope such that the declaration precedes the use.

Scope levels
Each function has two or more scopes: one for the parameters, one for the function body, and possibly additional scopes in the function for each for loop and each nested block (delimited by curly braces).

Example

    void f( int k ) {      // k is a parameter
        int k = 0;         // also a local variable
        while (k) {
            int k = 1;     // another local variable, in a loop
        }
    }

The outermost scope includes just the name "f", and function f itself has three (nested) scopes:
1. The outer scope for f just includes parameter k.
2. The next scope is for the body of f, and includes the variable k that is initialized to 0.
3. The innermost scope is for the body of the while loop, and includes the variable k that is initialized to 1.

Dynamic scoping
- Not all languages use static scoping. Lisp, APL, and Snobol use dynamic scoping.
- Dynamic scoping: a use of a variable that has no corresponding declaration in the same function corresponds to the declaration in the most recently called, still active function.

Attributes about symbols
- Variables: textual name, type, nesting level, offset in storage
- Methods or procedures: number and type of arguments, type of return value, nesting level
- Structures: number of dimensions, type of index, type of element
- Call-by-reference parameters and pointers: possible aliases

Symbol table structure affected by scoping
- We will focus on static scoping; dynamic scoping is seldom used except in rare conditions, e.g., exception handling in Java.
- Can a name be used before it is defined? Different rules for static scoping:
  - Everything must be declared before it is used
  - Java: a method or instance variable name can be used before the definition appears; this is not true for a local variable (defined within a method)
- Most closely nested rule
- What is a scope? Loop, block, parameters
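The dynamic scoping rule stated above can be simulated in a few lines. This is a sketch under stated assumptions, not how any particular Lisp or APL system works: the call stack is modeled explicitly as a stack of per-function maps, and the functions f, g, and run are hypothetical. Function g uses x without declaring it, so the value it sees depends on who called it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class DynamicScopeDemo {
    // Stack of still-active functions; each frame holds that function's own declarations.
    private static final Deque<Map<String, Integer>> callStack = new ArrayDeque<>();

    // Dynamic lookup: search frames from the most recently called function outward.
    static Integer dynamicLookup(String name) {
        for (Map<String, Integer> frame : callStack) {
            if (frame.containsKey(name)) {
                return frame.get(name);
            }
        }
        return null; // undeclared in every active function
    }

    // g declares nothing and uses x.
    static Integer g() {
        callStack.push(new HashMap<String, Integer>());
        Integer seen = dynamicLookup("x");
        callStack.pop();
        return seen;
    }

    // f declares its own x = 2 and then calls g.
    static Integer f() {
        Map<String, Integer> frame = new HashMap<>();
        frame.put("x", 2);
        callStack.push(frame);
        Integer seen = g();
        callStack.pop();
        return seen;
    }

    // The outermost "program" declares x = 1, then calls g through f and directly.
    static int[] run() {
        Map<String, Integer> frame = new HashMap<>();
        frame.put("x", 1);
        callStack.push(frame);
        int viaF = f();   // g finds f's x
        int direct = g(); // g finds the outermost x
        callStack.pop();
        return new int[] { viaF, direct };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "2 1"
    }
}
```

Under static scoping, both uses of x in g would refer to the same declaration regardless of the caller, which is why the mapping can be computed at compile time.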

Example

    class Test {
        void f() {
            val = 0;  // instance variable val has not yet been declared -- OK
            g();      // method g has not yet been declared -- OK
            x = 1;    // var x has not yet been declared -- ERROR!
            int x;
        }
        void g() { }
        int val;
    }

Simplification
From now on, assume that our language (rules for Simple):
- uses static scoping
- requires that all names be declared before they are used
- does not allow multiple declarations of a name in the same scope (even for different kinds of names)
- does allow the same name to be declared in multiple nested scopes
- uses the same scope for a method's parameters and for the local variables declared at the beginning of the method
This will make your life in P4 much easier!

Symbol Table Implementations
In addition to the above simplification, assume that the symbol table will be used to answer two questions:
1. Given a declaration of a name, is there already a declaration of the same name in the current scope, i.e., is it multiply declared?
2. Given a use of a name, to which declaration does it correspond (using the "most closely nested" rule), or is it undeclared?

Note
The symbol table is only needed to answer those two questions. That is, after all declarations have been processed to build the symbol table, and all uses have been processed to link each ID node in the abstract-syntax tree with the corresponding symbol-table entry, the symbol table functions are no longer needed, because no more lookups based on name will be performed.

What operations do we need?
Given the above assumptions, we will need to:
- Look up a name in the current scope only, to check whether it is multiply declared.
- Look up a name in the current and enclosing scopes, to check for a use of an undeclared name and to link a use with the corresponding symbol-table entry.
- Insert a new name into the symbol table with its attributes.
- Change the symbol table when a new scope is entered.
- Change the symbol table when a scope is exited.

Two possible symbol table implementations
- a list of tables
- a table of lists
For each approach, we will consider what must be done when entering and exiting a scope, when processing a declaration, and when processing a use.
Simplification: assume each symbol-table entry includes only:
- the symbol name
- its type
- the nesting level of its declaration

Method 1: List of hash tables
The idea: symbol table = a list of hash tables, one hash table for each currently visible scope.
When processing a scope S:

    front of list: declarations made in S
    end of list:   declarations made in scopes that enclose S

Example:

    void f(int a, int b) {
        double x;
        while (...) {
            int x, y;
            ...
        }
    }
    void g() {
        f();
    }

After processing declarations inside the while loop (format: var: type, nesting level):

    front of list:  x: int, 3    y: int, 3
                    a: int, 2    b: int, 2    x: double, 2
    end of list:    f: (int, int) -> void, 1

List of hash tables: the operations
- On scope entry: increment the current level number and add a new empty hash table to the front of the list.
- To process a declaration of x: look up x in the first table in the list (hash on name: h(x)). If it is there, issue a "multiply declared variable" error; otherwise, add x to the first table in the list.

List of hash tables: the operations (continued)
- To process a use of x: look up x starting in the first table in the list; if it is not there, look up x in each successive table in the list. If it is not in any table, issue an "undeclared variable" error.
- On scope exit: remove the first table from the list and decrement the current level number.

Remember for our language Simple
Method names belong in the hash table for the outermost scope, not in the same table as the method's variables. For example, in the example above:
- method name f is in the symbol table for the outermost scope
- name f is not in the same scope as parameters a and b, and variable x.
This is so that when the use of name f in method g is processed, the name is found in an enclosing scope's table.

The running times for each operation:
- Scope entry: time to initialize a new, empty hash table; probably proportional to the size of the hash table.
- Process a declaration: using hashing, constant expected time (O(1)).
- Process a use: using hashing to do the lookup in each table in the list, the worst-case time is O(depth of nesting).
- Scope exit: time to remove a table from the list; O(1) if storage reclamation is ignored.
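The four list-of-hash-tables operations described above can be sketched as follows. The class and method names (ListOfTables, Decl, enterScope, declare, lookup, exitScope) are hypothetical, types are kept as plain strings, and errors are signalled by return values instead of being reported; a real compiler would differ in all of these details.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class ListOfTablesDemo {
    public static void main(String[] args) {
        ListOfTables st = new ListOfTables();
        st.enterScope();                      // level 1: outermost scope
        st.declare("f", "(int, int) -> void");
        st.enterScope();                      // level 2: parameters and body of f
        st.declare("a", "int");
        st.declare("b", "int");
        st.declare("x", "double");
        st.enterScope();                      // level 3: body of the while loop
        st.declare("x", "int");
        st.declare("y", "int");
        System.out.println(st.lookup("x"));   // prints "x: int, 3"
        System.out.println(st.lookup("f"));   // prints "f: (int, int) -> void, 1"
        st.exitScope();
        System.out.println(st.lookup("x"));   // prints "x: double, 2"
    }
}

// Simplified entry: name, type, and nesting level only.
final class Decl {
    final String name;
    final String type;
    final int level;

    Decl(String name, String type, int level) {
        this.name = name;
        this.type = type;
        this.level = level;
    }

    @Override
    public String toString() {
        return name + ": " + type + ", " + level;
    }
}

final class ListOfTables {
    // Front of the deque = table for the current (innermost) scope.
    private final Deque<Map<String, Decl>> scopes = new ArrayDeque<>();
    private int level = 0;

    // Scope entry: bump the level and push a new empty table.
    void enterScope() {
        level++;
        scopes.addFirst(new HashMap<String, Decl>());
    }

    // Declaration: check the first table only. Returns false if multiply declared.
    // Assumes enterScope() has been called at least once.
    boolean declare(String name, String type) {
        Map<String, Decl> current = scopes.peekFirst();
        if (current.containsKey(name)) {
            return false;
        }
        current.put(name, new Decl(name, type, level));
        return true;
    }

    // Use: search from the innermost table outward. Returns null if undeclared.
    Decl lookup(String name) {
        for (Map<String, Decl> table : scopes) {
            Decl d = table.get(name);
            if (d != null) {
                return d;
            }
        }
        return null;
    }

    // Scope exit: drop the first table and decrement the level.
    void exitScope() {
        scopes.removeFirst();
        level--;
    }
}
```

The lookup loop makes the O(depth of nesting) worst case visible: it does one hash lookup per enclosing scope.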

Consider another language
C++ does not use exactly the scoping rules that we have been assuming. In particular, C++ does allow a function to have both a parameter and a local variable with the same name; any uses of the name refer to the local variable.

Question 1: Consider the following code. What would the symbol table look like after processing the declarations in the body of f under:
- the scoping rules we have been assuming
- C++ scoping rules

    void g(int x, int a) { }
    void f(int x, int y, int z) {
        int a, b, x;
        ...
    }

Question 2: Which of the four operations described above (scope entry, process a declaration, process a use, scope exit) would change (and how) if the following rules for name reuse were used instead of C++ rules:
- the same name can be used within one scope as long as the uses are for different kinds of names, and
- the same name cannot be used for more than one variable declaration in a nested scope

Method 2: Hash table of lists
The idea: when processing a scope S, the structure of the symbol table is:

    [Figure: a single hash table with entries x, y, and z]

Definition
- There is just one big hash table, containing an entry for each variable for which there is some declaration in scope S or in a scope that encloses S.
- Associated with each variable is a list of symbol-table entries.
- The first list item corresponds to the most closely enclosing declaration; the other list items correspond to declarations in enclosing scopes.

Example

    void f(int a) {
        double x;
        while (...) {
            int x, y;
            ...
        }
    }
    void g() {
        f();
    }

After processing the declarations inside the while loop (format: type, nesting level):

    f:  int -> void, 1
    a:  int, 2
    x:  int, 3  ->  double, 2
    y:  int, 3

Nesting level information is crucial
The level-number attribute stored in each list item enables us to determine whether the most closely enclosing declaration was made in the current scope or in an enclosing scope.

Hash table of lists: the operations
- On scope entry: increment the current level number.
- To process a declaration of x: look up x in the symbol table. If x is there, fetch the level number from the first list item. If that level number = the current level, issue a "multiply declared variable" error; otherwise (including when x is not in the table at all), add a new item to the front of the list with the appropriate type and the current level number.
- To process a use of x: look up x in the symbol table. If it is not there, issue an "undeclared variable" error.
- On scope exit: must restore the table to the previous nesting level (sometimes must keep entries around). Three ways:
  1. Scan all entries in the symbol table, looking at the first item on each list. If that item's level number = the current level number, remove it from its list (and if the list becomes empty, remove the entire symbol-table entry). Finally, decrement the current level number.
  2. Use an extra pointer in each element to link all items of the same scope.
  3. Use a stack that has an entry for each scope level (level 1, level 2, level 3). Pop the entry and delete from the hash table; keep stack entries around.
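A minimal sketch of the hash-table-of-lists operations follows, using the first of the three scope-exit strategies (scan every entry). The names TableOfLists, Item, and the method names are hypothetical, and errors are again signalled by return values rather than reported.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class TableOfListsDemo {
    public static void main(String[] args) {
        TableOfLists st = new TableOfLists();
        st.enterScope();                     // level 1
        st.declare("f", "int -> void");
        st.enterScope();                     // level 2
        st.declare("a", "int");
        st.declare("x", "double");
        st.enterScope();                     // level 3
        st.declare("x", "int");
        st.declare("y", "int");
        System.out.println(st.lookup("x"));  // prints "int, 3"
        st.exitScope();
        System.out.println(st.lookup("x"));  // prints "double, 2"
        System.out.println(st.lookup("y"));  // prints "null"
    }
}

// One list item: the type and the nesting level of a single declaration.
final class Item {
    final String type;
    final int level;

    Item(String type, int level) {
        this.type = type;
        this.level = level;
    }

    @Override
    public String toString() {
        return type + ", " + level;
    }
}

final class TableOfLists {
    // One table for all scopes; each name maps to a non-empty list, innermost declaration first.
    private final Map<String, Deque<Item>> table = new HashMap<>();
    private int level = 0;

    // Scope entry: only the level number changes.
    void enterScope() {
        level++;
    }

    // Declaration: compare the first item's level with the current level.
    // Returns false if the name is multiply declared in the current scope.
    boolean declare(String name, String type) {
        Deque<Item> list = table.get(name);
        if (list != null && list.peekFirst().level == level) {
            return false;
        }
        if (list == null) {
            list = new ArrayDeque<>();
            table.put(name, list);
        }
        list.addFirst(new Item(type, level));
        return true;
    }

    // Use: a single hash lookup; the first item is the most closely nested declaration.
    Item lookup(String name) {
        Deque<Item> list = table.get(name);
        return list == null ? null : list.peekFirst();
    }

    // Scope exit (strategy 1): scan all entries and drop items declared at the current level.
    void exitScope() {
        Iterator<Map.Entry<String, Deque<Item>>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            Deque<Item> list = it.next().getValue();
            if (list.peekFirst().level == level) {
                list.removeFirst();
                if (list.isEmpty()) {
                    it.remove(); // remove the whole symbol-table entry
                }
            }
        }
        level--;
    }
}
```

Compared with the list of hash tables, lookup here is a single hash operation, while scope exit pays for a scan of the whole table.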

What if collisions?

    void f(int a) {
        double x;
        while (...) {
            int x, y;
            ...
        }
    }
    void g() {
        f();
    }

After processing the declarations inside the while loop, assuming a and f collide (format: name, type, nesting level):

    f, a:  a, int, 2  ->  f, int -> void, 1
    x:     x, int, 3  ->  x, double, 2
    y:     y, int, 3

Running times
- Scope entry: time to increment the level number, O(1).
- Process a declaration: using hashing, constant expected time (O(1)).
- Process a use: using hashing, constant expected time (O(1)).
- Scope exit: time proportional to the number of names in the symbol table (or perhaps even the size of the hash table if no auxiliary information is maintained to allow iteration through the non-empty hash table buckets).

Another example
Assume that the symbol table is implemented using a hash table of lists. How does the symbol table change as each declaration in the following code is processed?

    void g(int x, int a) {
        double d;
        while (...) {
            int d, w;
            double x, b;
            if (...) {
                int a, b, c;
            }
        }
        while (...) {
            int x, y, z;
        }
    }

Type Checking
The job of the type-checking phase is to:
- Determine the type of each expression in the program (each node in the AST that corresponds to an expression)
- Find type errors
The type rules of a language define how to determine expression types and what is considered to be an error. The type rules specify, for every operator (including assignment), what types the operands can have and what the type of the result is.

Types
- Basic types (primitives): integers, characters, booleans
- Input/output streams: pointer to buffer
- Aggregate: arrays
- User-defined: classes, enumerated types

Example
Both C++ and Java allow the addition of an int and a double, and the result is of type double. However, C++ also allows a value of type double to be assigned to a variable of type int, while Java considers that an error.

Determining the types of expressions

    Type of a    Type of b    Type of a+b
    Integer      Integer      Integer
    Integer      Double       Double
    Real         Real         Real
    Real         Double       Double
    Double       Double       Double

Rules are language dependent
- List as many of the operators that can be used in a Java program as you can think of (don't forget the logical and relational operators as well as the arithmetic ones).
- For each operator, say what types the operands may have and what the type of the result is.

Other type errors
The type checker must also find:
- type errors having to do with expression context, e.g., the context of some operators must be boolean
- type errors having to do with method calls
Examples of contexts that require a boolean expression:
- the condition of an if statement
- the condition of a while loop
- the termination condition part of a for loop
Examples of method errors:
- calling something that is not a method
- calling a method with the wrong number of arguments
- calling a method with arguments of the wrong types
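The five rows of the a+b table above are consistent with a "take the wider operand type" rule. The sketch below encodes that reading; the ordering Integer < Real < Double is an assumption inferred from those rows, and the table does not list the Integer/Real combination, so the result the code gives for it is an extrapolation. NumType and typeOfSum are hypothetical names.

```java
final class PromotionDemo {
    // Result type of a + b: the wider of the two operand types (assumed rule).
    static NumType typeOfSum(NumType a, NumType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(typeOfSum(NumType.INTEGER, NumType.DOUBLE)); // prints "DOUBLE"
        System.out.println(typeOfSum(NumType.REAL, NumType.REAL));      // prints "REAL"
    }
}

// Declared from narrowest to widest so that ordinal() reflects the assumed ordering.
enum NumType { INTEGER, REAL, DOUBLE }
```

An explicit lookup table indexed by both operand types would work just as well and makes it easier to mark some combinations as errors.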

How to infer types
- Can use syntax-directed translation schemes (we saw an example last time)
- Can use type rules and the AST
- Type rules for Simple are fairly easy.

Example 2: Compute the type of an expression
(exp1 is the exp on the left-hand side of the production; exp2 and exp3 are the exps on the right-hand side, from left to right.)

    exp -> exp + exp     if ((exp2.trans == INT) and (exp3.trans == INT))
                           then exp1.trans = INT
                           else exp1.trans = ERROR
    exp -> exp and exp   if ((exp2.trans == BOOL) and (exp3.trans == BOOL))
                           then exp1.trans = BOOL
                           else exp1.trans = ERROR
    exp -> exp == exp    if ((exp2.trans == exp3.trans) and (exp2.trans != ERROR))
                           then exp1.trans = BOOL
                           else exp1.trans = ERROR
    exp -> true          exp.trans = BOOL
    exp -> false         exp.trans = BOOL
    exp -> int           exp.trans = INT
    exp -> ( exp )       exp1.trans = exp2.trans

Status
- We have covered the front-end phases: lexical analysis, parsing, semantic analysis
- Next are the back-end phases: optimization, code generation
- We'll do code generation first...
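The translation rules for exp above can also be applied by walking an AST. The following sketch assumes a tiny hand-built AST; the class names (Exp, IntLit, BoolLit, Binary, Ty) are hypothetical. Parenthesized expressions need no node of their own here, since the rule for ( exp ) simply passes the inner type through.

```java
final class TypeCheckDemo {
    public static void main(String[] args) {
        // (1 + 2) == 3
        Exp ok = new Binary("==", new Binary("+", new IntLit(1), new IntLit(2)), new IntLit(3));
        // true and 0
        Exp bad = new Binary("and", new BoolLit(true), new IntLit(0));
        System.out.println(ok.type());  // prints "BOOL"
        System.out.println(bad.type()); // prints "ERROR"
    }
}

enum Ty { INT, BOOL, ERROR }

abstract class Exp {
    // Plays the role of the "trans" attribute in the rules above.
    abstract Ty type();
}

final class IntLit extends Exp {
    final int value;
    IntLit(int value) { this.value = value; }
    @Override Ty type() { return Ty.INT; }
}

final class BoolLit extends Exp {
    final boolean value;
    BoolLit(boolean value) { this.value = value; }
    @Override Ty type() { return Ty.BOOL; }
}

final class Binary extends Exp {
    final String op;
    final Exp left;
    final Exp right;

    Binary(String op, Exp left, Exp right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    @Override
    Ty type() {
        Ty l = left.type();
        Ty r = right.type();
        switch (op) {
            case "+":   // both operands must be INT
                return (l == Ty.INT && r == Ty.INT) ? Ty.INT : Ty.ERROR;
            case "and": // both operands must be BOOL
                return (l == Ty.BOOL && r == Ty.BOOL) ? Ty.BOOL : Ty.ERROR;
            case "==":  // operands must have the same non-error type
                return (l == r && l != Ty.ERROR) ? Ty.BOOL : Ty.ERROR;
            default:    // operator not covered by the rules above
                return Ty.ERROR;
        }
    }
}
```

Because ERROR propagates upward, one bad subexpression makes every enclosing expression an ERROR as well; a production checker would typically report the error once at the point where it is first detected.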

Run-time environments
- Before discussing code generation, we need to understand what we are trying to generate.
- There are a number of standard techniques for structuring executable code that are widely used.
- Run-time environment: what environment exists when the code is executing?

Issues in run-time environments
- Management of run-time resources
- Correspondence between static (compile-time) and dynamic (run-time) structures
- Storage organization

Run-time resources
- Execution of a program is initially under the control of the operating system.
- When a program is invoked:
  - The OS allocates space for the program
  - The code is loaded into part of the space
  - The OS jumps to the entry point (i.e., "main")