Semantic Analysis Lecture 9 February 7, 2018
Midterm 1 Compiler Stages 12 / 14 COOL Programming 10 / 12 Regular Languages 26 / 30 Context-free Languages 17 / 21 Parsing 20 / 23 Extra Credit 4 / 6 Average 92 Compiler Construction 2/48
Formal Languages Wrap-up You made it! Takeaways from lexical analysis and parsing Regular languages define tokens Context-free languages define parse trees Classes of languages require different levels of expressive power Compiler Construction 3/48
Compilers Reminder 1. Lexical Analysis Source code Tokens 2. Syntactic Analysis Tokens Abstract Syntax Tree 3. Semantic Analysis (later today) 4. Code Generation 5. Optimization Compiler Construction 4/48
Compilers Reminder Remember, we want to support developers effort to write software They give our compiler source code Our job is to turn it into a program Automatic conversion requires various levels of analysis On the front end, we want to extract structure from the string of source code Compiler Construction 5/48
Parsing to Extract Structure Consider two input strings: x = 1; y = 1; And what about: z = 42; y = 100; = Is there a structural difference between these strings? x 1 ; Compiler Construction 6/48
Parsing to Extract Structure Consider two input strings: x = 1; y = 1; And what about: z = 42; y = 100; = Is there a structural difference between these strings? y 1 ; Compiler Construction 6/48
Parsing to Extract Structure Consider two input strings: x = 1; y = 1; And what about: z = 42; y = 100; = Is there a structural difference between these strings? z 42 ; Compiler Construction 6/48
Parsing to Extract Structure Consider two input strings: x = 1; y = 1; And what about: z = 42; y = 100; = Is there a structural difference between these strings? y 100 ; Assign x = 1; y = 1; z = 42; y = 100;... Compiler Construction 6/48
Parsing to Extract Structure Consider two input strings: x = 1; y = 1; And what about: z = 42; y = 100; = Is there a structural difference between these strings? y 100 ; Assign x = 1; y = 1; z = 42; y = 100;... Compiler Construction 6/48
Parsing to Extract Structure Consider two input strings: x = 1; y = 1; And what about: z = 42; y = 100; = Assign ID = VALUE ; Is there a structural difference between these strings? ID VALUE ; Compiler Construction 6/48
Parsing to Extract Structure We want a concise grammar Don t want to depend on specific values! S ID = VAL ;, not S x = 42 ; Well, we need an earlier lexical analysis step Bin lexemes into tokens x, y, z all the same thing: identifiers 1, 42, 100 all the same thing: integer constants Compiler Construction 7/48
Wrapping Up Formal Languages Lexical Analysis Regular Language = DFA = NFA = Regular Expression Syntactic Analysis / Parsing Context-free Language = Context-free Grammar = Nondeterministic Pushdown Automaton = Parsing Automaton Compiler Construction 8/48
Formal Languages Highlights Regular Languages are easier to recognize than CFLs! Need a stack with CFLs As compiler designers, we restrict our choices of CFL LL and LR are less expressive, but easier to parse Parsing DFA is kind of a lie! Technically, CFLs are recognized with NPDAs This is not a formal languages course Compiler Construction 9/48
Parse Tree vs. Abstract Syntax Tree Almost the same Input: 6 + ( ( 5 ) ) Parse Tree: EXP EXP EXP EXP int EXP 6 + int ( ( 5 ) ) Compiler Construction 10/48
Parse Tree vs. Abstract Syntax Tree Almost the same Input: 6 + ( ( 5 ) ) Abstract Syntax Tree: ADD_EXP int:6 int:5 Compiler Construction 10/48
Parsing Highlights You should understand that LL and LR are different restricted subsets of CFLs Trade language expressiveness for parsing ease You should know top-down and bottom-up parsing Real parser generators build bottom-up parsers We talked about LL(1), recursive descent, and LR(1) parsing There are many other parsing algorithms: LALR (lookahead LR) SLR (simple) GLR (generalized) IELR (inadequacy elimination) Earley Compiler Construction 11/48
Compiler Construction 12/48
The Front End Lexical Analysis Detects inputs with illegal tokens Syntactic Analysis Detects inputs with ill-formed parse trees Semantic Analysis Last front end phase Detects more invalid inputs Compiler Construction 12/48
Intro to Type Checking We have just parsed our program It followed the rules of a context-free grammar We know it is a structurally valid program Compiler Construction 13/48
Intro to Type Checking We have just parsed our program It followed the rules of a context-free grammar We know it is a structurally valid program Ultimately, programming languages are not context-free Need a rigorous analysis to tell us about semantic validity of the program Basically, a laundry list of tasks Compiler Construction 13/48
Type Checking Summary Scoping rules match identifier uses with identifier definitions. A type is a set of values coupled with a set of operations on those values. A type system specifies which operations are valid for which types. Type checking can be done statically (at compile time) or dynamically (at run time). Compiler Construction 14/48
What s Wrong? Example 1: 1 l e t y : I n t in 2 x + 3 Example 2: 1 l e t y : S t r i n g < " abc " in 2 y + 3 Compiler Construction 15/48
Semantic Analysis: Why? Parsing cannot catch some errors Some language constructs are not context-free. All used variables must have been declared (scoping) All methods must be invoked with arguments of the proper type (i.e. typing) Compiler Construction 16/48
What does semantic analysis do? Lots of things! In Cool: All identifiers are declared Static types Inheritance relationships Classes defined only once Methods in a class defined only once Reserved identifiers are not misused several others... The requirements are language-dependent. Which of these are checked by Python? C++? Compiler Construction 17/48
Scope Scoping rules match identifier uses with identifier declarations Critical analysis in most languages (including Cool) Compiler Construction 18/48
Scope The scope of an identifier is the portion of a program in which that identifier is accessible The same identifier may refer to different things in different parts of the program Different scopes for the same name do not overlap An identifier may have restricted scope Compiler Construction 19/48
Static vs. Dynamic Scope Most languages have static scope Scope depends only on the program text, not the run-time behavior Cool is statically scoped A few languages are dynamically scoped L A TEX, SNOBOL Compiler Construction 20/48
Static vs. Dynamic Scope 1 int x; 2 3 int main () { 4 x = 2; 5 f(); 6 g(); 7 } 8 void f () { 9 int x = 3; 10 h(); 11 } 12 void g () { 13 int x = 4; 14 h(); 15 } 16 void h() { 17 printf ("%d\n", x); 18 } Compiler Construction 21/48
Static vs. Dynamic Scope 1 int x; 2 3 int main () { 4 x = 2; 5 f(); 6 g(); 7 } 8 void f () { 9 int x = 3; 10 h(); 11 } 12 void g () { 13 int x = 4; 14 h(); 15 } 16 void h() { 17 printf ("%d\n", x); 18 } Static scoping output: 2 2 Dynamic scoping output: 3 4 Compiler Construction 21/48
Static Scoping Example 1 let x : Int <- 0 in 2 { 3 x; 4 { let x : Int <- 1 in 5 x ; 6 }; 7 x; 8 } Compiler Construction 22/48
Static Scoping Example 9 let x : Int <- 0 in 10 { 11 x; 12 { let x : Int <- 1 in 13 x ; 14 }; 15 x; 16 } Uses of x refer to closest enclosing definition. Compiler Construction 22/48
Scope in Cool Cool identifier bindings are introduced by Class declarations (class names) Method definitions (method names) Let expressions (object id s) Formal parameters (object id s) Attribute definitions in a class (object id s) Case expressions (object id s) Compiler Construction 23/48
Implementing the Most-Closely Nested Rule Much of semantic analysis can be expressed as a recursive descent of an AST! Process an AST node n Process the children of n Finish processing the node n Compiler Construction 24/48
Implementing... (2) Example: the scope of let bindings is one subtree 1 l e t x : I n t < 0 in expr x can be used in subtree expr Compiler Construction 24/48
Symbol Tables Recall: let x : Int <- 0 in expr Idea: Before processing expr, add definition of x to current definitions (override any previous definition) After processing expr, remove definition of x and restore old definition of x (if any) A symbol table is a data structure that tracks the current bindings of identifiers Part of PA4 How might we implement this? Compiler Construction 25/48
Exceptions to Most-closely Nested Not all kinds of identifiers follow the most-closely nested rule Class definitions in Cool are globally visible throughout all of the program Implication: class names can be used before it is defined Compiler Construction 26/48
Example: Use before define 1 class Foo { 2... let y : Test in... 3 }; 4 class Test { 5... 6 }; Compiler Construction 27/48
More Scope in Cool Attribute names are global within the class in which they are defined 1 class Foo { 2 f() : Int { tm }; 3 tm : Int <- 0; 4 }; Compiler Construction 28/48
More Scope in Cool Method and attribute names have complex rules A method need not be defined in the class in which it is used, but in some parent class that s inheritance Methods may also be redefined (overridden) Which languages allow this? Cool does Compiler Construction 29/48
Class Definitions Class names can be used before being defined We can t check this property in one pass :( Solution Pass 1: Collect all the class names Pass 2: Do the checking? Pass 4: Profit! Semantic analysis requires multiple passes Compiler Construction 30/48
Types What is a Type? The notion varies from language to language Consensus A set of values A set of operations on those values Classes are one instantiation of the modern notion of type Compiler Construction 31/48
Why do we need types? Consider this assembly: addi, $r1, $r2, $r3 What are the types of r1, r2, and r3? Compiler Construction 32/48
Types and Operations Cretain operations are legal or valid for values of each type It doesn t make sense to add a function pointer and an integer in C It does make sense to add two integers Both are the same in assembly! Compiler Construction 33/48
Type Systems A language s type system specifies which operations are valid for which types The goal of type checking is to ensure that operations are used with the correct types Enforces intended interpretation of values, because nothing else will! Our last, best hope... for victorious execution Type systems provide a concise formalization of the semantic checking rules Compiler Construction 34/48
What Can Types Do? Can detect certain kinds of errors Memory errors: Reading from an invalid pointer, etc. Violation of abstraction boundaries class FileSystem { open ( x : String ) : File {... }; }; The f method can t see data inside fdesc! class Client { f( fs : FileSystem ) { File fdesc <- fs. open (" foo ")... } }; Compiler Construction 35/48
Type Checking Overview Three kinds of languages: Statically typed: All or almost all checking of types is done as part of compilation (C, Java, Cool) Dynamically typed: Almost all checking of types is done during execution (Python, PHP, Ruby) Untyped: No type checking (machine code) Compiler Construction 36/48
The Type Wars Competing views on static vs. dynamic typing Static typing proponents say: Static checking catches many programming errors at compile time Avoids overhead of runtime type checks Dynamic typing proponents say: Static type systems are restrictive Rapid prototyping is easier in a dynamic type system Compiler Construction 37/48
The Type Wars (2) In practice, most code is written in statically-typed languages with an escape mechanism. Unsafe casts in C, native methods in Java,... Dynamic typing (a.k.a. duck typing) is big in the scripting world Compiler Construction 38/48
Cool Types In Cool, types are Class names SELF_TYPE Technically, there are no native types Everything is boxed as an object Java Integer int The use declares types for all identifiers The compiler infers types for expressions every expression Compiler Construction 39/48
Type checking and Type Inference Type Checking is the process of verifying fully-typed programs Type Inference is the process of filling in missing type information Sometimes used interchangably Compiler Construction 40/48
Rules of Inference You have survived formal languages in specifying parts ot he compiler Regular expressions (lexer) Context-free grammars (parser) We use logical rules of inference as a formalism for type inference Compiler Construction 41/48
Rules of Inference? Inference rules have the form If Hypothesis is true, then Conclusion is true Type checking computes via reasoning: If E 1 and E 2 have certain types, then E 3 has a certain type Rules of inference are a compact notation for If-then statements Compiler Construction 42/48
From English to Inference Rules Start with a simplified system and gradually add features... Building blocks Symbol is and Symbol is if-then x:t is x has type T Compiler Construction 43/48
English to Inference Rules (2) If e 1 has type Int and e 2 has type Int, then e 1 + e 2 has type Int (e 1 has type Int e 2 has type Int) e 1 + e 2 has type Int (e 1 : Int e 2 : Int) e 1 + e 2 : Int This is an inference rule! e 1 : Int e 2 : Int are two hypotheses, e 1 + e 2 : Int is a conclusion! Compiler Construction 44/48
Notation for Inference Rules By tradition, inference rules are written Hypothesis 1... Hypothesis n Conclusion Cool type rules have hypotheses and conclusions of the form: e : T means we can prove that... Compiler Construction 45/48
Examples (i is an Integer) i : Int Int e 1 : Int e 2 : Int e 1 + e 2 : Int Add (the result of adding to Ints is an Int) Compiler Construction 46/48
Examples (2) e 1 : Int e 2 : Int e 1 + e 2 : Int Add These rules give templates describing how to type integers and + expressions... We can fill in the templates, producing a complete typing for any expression! false : Int true : Int true + false : Int Compiler Construction 47/48
Example: 1+2 1 : Int 2 : Int 1 + 2 : Int Compiler Construction 48/48