The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Similar documents
Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

The role of semantic analysis in a compiler

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1

Syntax Errors; Static Semantics

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Semantic Analysis. Lecture 9. February 7, 2018

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

Lecture 16: Static Semantics Overview 1

The Compiler So Far. Lexical analysis Detects inputs with illegal tokens. Overview of Semantic Analysis

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Static Semantics. Winter /3/ Hal Perkins & UW CSE I-1

Intermediate Code Generation

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Semantic Analysis. How to Ensure Type-Safety. What Are Types? Static vs. Dynamic Typing. Type Checking. Last time: CS412/CS413

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Programming Languages Third Edition. Chapter 7 Basic Semantics

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University

Semantic Analysis. Compiler Architecture

Semantic Analysis and Type Checking

LECTURE 3. Compiler Phases

Types. Type checking. Why Do We Need Type Systems? Types and Operations. What is a type? Consensus

Introduction to Programming Using Java (98-388)

CSCE 314 Programming Languages. Type System

Lexical Considerations

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

1 Lexical Considerations

The Structure of a Syntax-Directed Compiler

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

Overview of Semantic Analysis. Lecture 9

CS558 Programming Languages

More On Syntax Directed Translation

Compiler Passes. Semantic Analysis. Semantic Analysis/Checking. Symbol Tables. An Example

COS 320. Compiling Techniques

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

Informatica 3 Syntax and Semantics

Scoping and Type Checking

Time : 1 Hour Max Marks : 30

A Short Summary of Javali

CMSC 330: Organization of Programming Languages

Lexical Considerations

(Not Quite) Minijava

CS558 Programming Languages

SE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms

Informal Semantics of Data. semantic specification names (identifiers) attributes binding declarations scope rules visibility

Topics Covered Thus Far. CMSC 330: Organization of Programming Languages. Language Features Covered Thus Far. Programming Languages Revisited

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 17: Types and Type-Checking 25 Feb 08

Semantic Processing. Semantic Errors. Semantics - Part 1. Semantics - Part 1

Expressions and Assignment

5. Semantic Analysis!

Type Bindings. Static Type Binding

Topics Covered Thus Far CMSC 330: Organization of Programming Languages

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Programmiersprachen (Programming Languages)

The SPL Programming Language Reference Manual

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

Answer: Early binding generally leads to greater efficiency (compilation approach) Late binding general leads to greater flexibility

G Programming Languages - Fall 2012

CSC 533: Organization of Programming Languages. Spring 2005

Undergraduate Compilers in a Day

CS558 Programming Languages

When do We Run a Compiler?

Ruby: Introduction, Basics

CS 4201 Compilers 2014/2015 Handout: Lab 1

Decaf Language Reference

Programming Languages, Summary CSC419; Odelia Schwartz

The compilation process is driven by the syntactic structure of the program as discovered by the parser

CS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

CSE P 501 Compilers. Static Semantics Hal Perkins Winter /22/ Hal Perkins & UW CSE I-1

Weiss Chapter 1 terminology (parenthesized numbers are page numbers)

Binding and Variables

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Compilers Project 3: Semantic Analyzer

We do not teach programming

Data Types. Every program uses data, either explicitly or implicitly to arrive at a result.

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Functional Programming. Pure Functional Programming

Typescript on LLVM Language Reference Manual

Final CSE 131B Spring 2004

Fundamentals of Programming Languages

M/s. Managing distributed workloads. Language Reference Manual. Miranda Li (mjl2206) Benjamin Hanser (bwh2124) Mengdi Lin (ml3567)

Semantic actions for declarations and expressions

The PCAT Programming Language Reference Manual

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Midterm 2 Solutions Many acceptable answers; one was the following: (defparameter g1

Announcements. Working on requirements this week Work on design, implementation. Types. Lecture 17 CS 169. Outline. Java Types

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Acknowledgement. CS Compiler Design. Semantic Processing. Alternatives for semantic processing. Intro to Semantic Analysis. V.

Attributes, Bindings, and Semantic Functions Declarations, Blocks, Scope, and the Symbol Table Name Resolution and Overloading Allocation, Lifetimes,

Compilers CS S-05 Semantic Analysis

Formal Languages and Compilers Lecture IX Semantic Analysis: Type Chec. Type Checking & Symbol Table

5. Semantic Analysis. Mircea Lungu Oscar Nierstrasz

Transcription:

The Compiler So Far CSC 4181 Compiler Construction Scanner - Lexical analysis Detects inputs with illegal tokens e.g.: main 5 (); Parser - Syntactic analysis Detects inputs with ill-formed parse trees e.g.: missing semicolons Semantic analysis Last front end analysis phase Catches all remaining errors 1 Source code tokens AST AST Lexical Analysis Syntactic Analysis Intermediate Code Gen lexical errors syntax errors semantic errors What s wrong with this code? (Note: it parses perfectly) Beyond Syntax foo(int a, char * s){ int bar() { int f[3]; int i, j, k; char *p; float k; foo(f[6], 10, j); break; i->val = 5; j = i + k; printf( %s,%s.\n,p,q); goto label23; 2 3 Goals of a Semantic Analyzer Kinds of Checks Compiler must do more than recognize whether a sentence belongs to the language Find remaining errors that would make program invalid undefined variables, types type errors that can be caught statically Figure out useful information for later phases types of all expressions data layout Terminology Static checks done by the compiler Dynamic checks done at run time Uniqueness checks Certain names must be unique Many languages require variable declarations Flow-of-control checks Match control-flow operators with structures Example: break applies to innermost loop/switch Type checks Check compatibility of operators and operands Logical checks Program is syntactically and semantically correct, but does not do the correct thing 4 5 1

Examples of Reported Errors Program Checking Undeclared identifier Multiply declared identifier Index out of bounds Wrong number or types of args to call Incompatible types for operation Break statement outside switch/loop Goto with no label Why do we care? Obvious: Report mistakes to programmer Avoid bugs: f[6] will cause a run-time failure Help programmer verify intent How do these checks help compilers? Allocate right amount of space for variables Select right machine operations Proper implementation of control structures 6 7 Can We Catch Everything? Inlined TypeChecker and CodeGen Try compiling this code: void { int i=21, j=42; printf( Hello World\n ); printf( Hello World, N=%d\n ); printf( Hello World\n, i, j); printf( Hello World, N=%d\n ); printf( Hello World, N=%d\n ); You could type check and generate code as part of semantic actions: expr : expr PLUS expr { if ($1.type == $3.type && ($1.type == IntType $1.type == RealType)) $$.type = $1.type else error( + applied on wrong type! ); GenerateAdd($1, $3, $$); 8 9 Problems Compiler main program Difficult to read Difficult to maintain Compiler must analyze program in order parsed Instead we split up tasks void Compile() { AST tree = Parser(program); if (TypeCheck(tree)) IR ir = GenIntermedCode(tree); EmitCode(ir); Compile AST Parser gettoken readstream 10 11 2

Typical Semantic Errors Multiple declarations: a variable should be declared (in the same scope) at most once Undeclared variable: a variable should not be used before being declared Type mismatch: type of the LHS of an assignment should match the type of the RHS Wrong arguments: methods should be called with the right number and types of arguments A Sample Semantic Analyzer Works in two phases traverses the AST created by the parser 1. For each scope in the program process the declarations add new entries to the symbol table and report any variables that are multiply declared process the statements find uses of undeclared variables, and update the "ID" nodes of the AST to point to the appropriate symbol-table entry. 2. Process all of the statements in the program again use the symbol-table information to determine the type of each expression, and to find type errors. 12 13 Scoping Scoping: Overloading In most languages, the same name can be declared multiple times if its declarations occur in different scopes, and/or involve different kinds of names Java: can use same name for a class field of the class a method of the class a local variable of the method class Test { int Test; void Test( ) { double Test; Java and C++ (but not in Pascal or C): can use the same name for more than one method as long as the number and/or types of parameters are unique int add(int a, int b); float add(float a, float b); 14 15 Scoping: General Rules Scope levels The scope rules of a language: Determine which declaration of a named object corresponds to each use of the object Scoping rules map uses of objects to their declarations C++ and Java use static scoping: Mapping from uses to declarations at compile time C++ uses the "most closely nested" rule a use of variable x matches the declaration in the most closely enclosing scope such that the declaration precedes the use Each function has two or more scopes: One for the function body Sometimes parameters are separate scope! (Not true in C) void f( int k ) { // k is a parameter int k = 0; // also a local variable while (k) { int k = 1; // another local var, in a loop Additional scopes in the function each for loop and each nested block (delimited by curly braces) 16 17 3

Checkpoint #1 Match each use to its declaration, or say why it is a use of an undeclared variable. int k=10, x=20; void foo(int k) { int a = x; int x = k; int b = x; while (...) { if (x == k) { int k, y; k = y = x; if (x == k) { int x = y; Dynamic Scoping Not all languages use static scoping Lisp, APL, and Snobol use dynamic scoping Dynamic scoping: A use of a variable that has no corresponding declaration in the same function corresponds to the declaration in the most-recently-called still active function 18 19 Example For example, consider the following code: int i = 1; void func() { cout << i << endl; int main () { int i = 2; func(); return 0; If C++ used dynamic scoping, this would print out 2, not 1 Checkpoint #2 Assuming that dynamic scoping is used, what is output by the following program? void { int x = 0; f1(); g(); f2(); void f1() { int x = 10; g(); void f2() { int x = 20; f1(); g(); void g() { print(x); 20 21 Keeping Track Need a way to keep track of all identifier types in scope { int i, n = ; for (i=0; i < n; i++) boolean b= i int n int i int n int b boolean? Symbol Tables Purpose: keep track of names declared in the program Symbol table entry: associates a name with a set of attributes, e.g.: kind of name (variable, class, field, method, ) type (int, float, ) nesting level mem location (where will it be found at runtime) Functions: Type Lookup(String id) Void Add(String id, Type binding) Bindings: name type pairs {a string, b int 22 23 4

Environments 0 Represents a set of mappings in the symbol table function f(a:int, b:int, c:int) = ( print_int(a+c); let var j := a+b var a := hello in print(a); print_int(j) end; print_int(b) ) Lookup in 1 1 = 0 + a int 2 = 1 + j int 1 0 How Symbol Tables Work (1) 24 25 How Symbol Tables Work (2) How Symbol Tables Work (3) 26 27 How Symbol Tables Work (4) How Symbol Tables Work (5) 28 29 5

How Symbol Tables Work (6) A Symbol Table Implementation Two structures: Hash table, Scope Stack Symbol = foo Hash(foo) = i Symbol table i x foo bar 30 31 Enter/Exit Scope We also need a stack to keep track of the nesting level as we traverse the tree Variables vs. Types Often, compilers maintain separate symbol tables for Types vs. Variables/Functions X a \ y \ x \ y a x y Scope change Lecture Checkpoint: Scopes Types x x x 32 33 Types What is a type? The notion varies from language to language Consensus A set of values A set of operations allowed on those values Certain operations are legal for each type It doesn t make sense to add a function pointer and an integer in C It does make sense to add two integers But both have the same assembly language implementation! Type Systems A language s type system specifies which operations are valid for which types The goal of type checking is to ensure that operations are used with the correct types Enforces intended interpretation of values Type systems provide a concise formalization of the semantic checking rules 34 35 6

Why Do We Need Type Systems? Consider the assembly language fragment addi $r1, $r2, $r3 What are the types of $r1, $r2, $r3? Type Checking Overview Four kinds of languages: Statically typed: All or almost all checking of types is done as part of compilation Dynamically typed: Almost all checking of types is done as part of program execution (no compiler) as in Perl, Ruby Mixed Model : Java Untyped: No type checking (machine code) 36 37 Type Checking and Type Inference Type Checking is the process of verifying fully typed programs Given an operation and an operand of some type, determine whether the operation is allowed Type Inference is the process of filling in missing type information Given the type of operands, determine the meaning of the operation the type of the operation OR, without variable declarations, infer type from the way the variable is used The two are different, but are often used interchangeably Issues in Typing Does the language have a type system? Untyped languages (e.g. assembly) have no type system at all When is typing performed? Static typing: At compile time Dynamic typing: At runtime How strictly are the rules enforced? Strongly typed: No exceptions Weakly typed: With well-defined exceptions Type equivalence & subtyping When are two types equivalent? What does "equivalent" mean anyway? When can one type replace another? 38 39 Components of a Type System Component: Built-in Types Built-in types Rules for constructing new types Where do we store type information? Rules for determining if two types are equivalent Rules for inferring the types of expressions Integer usual operations: standard arithmetic Floating point usual operations: standard arithmetic Character character set generally ordered lexicographically usual operations: (lexicographic) comparisons Boolean usual operations: not, and, or, xor 40 41 7

Component: Type Constructors Arrays array(i,t) denotes the type of an array with elements of type T, and index set I multidimensional arrays are just arrays where T is also an array operations: element access, array assignment, products Strings bitstrings, character strings operations: concatenation, lexicographic comparison Records (structs) Groups of multiple objects of different types where the elements are given specific names. Component: Type Constructors Pointers addresses operations: arithmetic, dereferencing, referencing issue: equivalency Function types A function such as "int add(real, int)" has type real int int 42 43 Component: Type Equivalence Name equivalence Types are equiv only when they have the same name Structural equivalence Types are equiv when they have the same structure Example C uses structural equivalence for structs and name equivalence for arrays/pointers Component: Type Equivalence Type Coercion If x is float, is x=3 acceptable? Disallow Allow and implicitly convert 3 to float "Allow" but require programmer to explicitly convert 3 to float What should be allowed? float to int? int to float? What if multiple coercions are possible? Consider 3 + "4" 44 45 Formalizing Types: Rules of Inference We have seen two examples of formal notation specifying parts of a compiler Regular expressions (for the lexer) Context-free grammars (for the parser) Summary Compiler must do more than recognize whether a sentence belongs to the language The appropriate formalism for type checking is logical rules of inference e 1 : int e 2 : int Checks of all kinds undefined variables, types type errors that can be caught statically Store useful information for later phases types of all expressions e 1 < e 2 : boolean 47 8