CSE450 Translation of Programming Languages Lecture 11: Semantic Analysis: Types & Type Checking
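Structure of a Compiler (figure): Source Language -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer (today's topic) -> Intermediate Code Generator -> Intermediate Code -> Code Optimizer -> Target Code Generator -> Target Language. The analyzers and the intermediate code generator form the Front End; the code optimizer and target code generator form the Back End. The figure also marks where Projects 1, 2, and 3 fit.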
Importance of Semantic Analysis
Parsing cannot catch all possible errors: a parser works from a context-free grammar, so it cannot check properties that depend on context. Example language constructs that require context:
- Have variables been declared?
- Is a variable available in the current scope?
- Are the operands of an expression valid types?
- Is an assignment using legal types?
- Are the arguments to a function of the correct type?
Types
Why do we need to worry about type checking? Consider the TubeIC fragment:
add s12 s20 s34
What types are s12, s20, and s34? They could be anything! Likewise, processors treat registers generically. This makes their operations flexible and reusable, but not type safe.
Types and Operations
Legal operations can vary depending on the type of a value.
- In C++, it typically does not make sense to add a function pointer to an integer.
- It does make sense to add two integers.
Both operations could have the same implementation in assembly: as far as the processor is concerned, an integer and a pointer look the same.
Type Systems
A language's type system specifies which types are available and which operations can be used on those types. The goal of type checking is to ensure that only "sensible" operations are performed. Type checking can also select different operations depending on the types involved (for example, print chooses its output instruction based on the argument's type).
Three Basic Kinds of Type Checking
Statically typed
- Almost all type checking happens at compile time.
- Each variable is limited to a single type.
- Language examples include C/C++, Java, and Tubular.
Dynamically typed
- Almost all type checking occurs at runtime.
- Variables can typically hold any type of value.
- Most scripting languages work this way (JavaScript, Python, Ruby, Scheme, etc.).
Untyped
- No checking is done, as in assembly.
Static vs. Dynamic Typing
In practice, most languages combine statically typed and dynamically typed elements. Statically typed languages provide escape mechanisms (such as casting) to relax the static rules when needed.
Static typing
- Many errors can be caught at compile time.
- Optimizations can be easier to perform.
- The runtime environment can be faster, since type decisions have already been made.
Dynamic typing
- Less restrictive, easier to express operations, faster development.
- Programs can be more modular, extensible, and adaptive.
- More runtime machinery is required, so execution can be slower.
Types in Tubular
We will have two basic types in Project 4:
- val: a floating-point quantity (already implemented)
- char: a single ASCII character
And one meta type will be added for Project 5:
- array: a consecutive grouping of a basic type; array(char) can also be referred to as string
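One way to represent these types inside a compiler is sketched below in Python, chosen only for illustration. The class names BasicType and ArrayType and the helper type_name are hypothetical; the sketch just shows that array is built from a basic type and that array(char) can be displayed as string.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BasicType:
    """A basic Tubular type such as val or char (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class ArrayType:
    """A meta type: a consecutive grouping of a single basic type."""
    element: BasicType


Type = Union[BasicType, ArrayType]

VAL = BasicType("val")
CHAR = BasicType("char")
STRING = ArrayType(CHAR)  # array(char) is also referred to as string


def type_name(t: Type) -> str:
    """Return the name used for a type in error messages."""
    if isinstance(t, ArrayType):
        return "string" if t.element == CHAR else f"array({t.element.name})"
    return t.name
```

Frozen dataclasses compare by value and are hashable, so types can be compared with == and used as dictionary keys in the symbol table.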
Tubular Type Checking
For Tubular, we will be using static typing, which keeps the runtime environment simpler to implement. Four basic scenarios require type checks, with a fifth arriving later:
- Variable assignments: the type of the right-hand side must match the variable.
- Mathematical operations: operands must be val for +, -, *, /, %, &&, ||, and !.
- Comparison operators: operands must both be val or both be char.
- Generic commands, such as print: any type is accepted.
- Function calls (coming in Project 7): arguments must match the parameter types.
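These scenarios can be written as a few small checking functions. This is a minimal Python sketch, assuming types are represented as the strings "val" and "char"; the function names and TubularTypeError are hypothetical, and treating the result of a comparison as a val is an assumption.

```python
MATH_OPS = {"+", "-", "*", "/", "%", "&&", "||"}
COMPARE_OPS = {"==", "!=", "<", "<=", ">", ">="}


class TubularTypeError(Exception):
    """Raised when a construct uses illegal types (hypothetical name)."""


def check_binary(op: str, left: str, right: str) -> str:
    """Return the result type of `left op right`, or raise TubularTypeError."""
    if op in MATH_OPS:
        # Math and boolean operators accept only val operands.
        if left != "val" or right != "val":
            raise TubularTypeError(f"'{op}' needs val operands, got {left} and {right}")
        return "val"
    if op in COMPARE_OPS:
        # Comparisons need two operands of the same type.
        if left != right:
            raise TubularTypeError(f"cannot compare {left} with {right}")
        return "val"  # assumption: comparison results are vals (true/false)
    raise TubularTypeError(f"unknown operator '{op}'")


def check_not(operand: str) -> str:
    """Unary ! also requires a val."""
    if operand != "val":
        raise TubularTypeError(f"'!' needs a val operand, got {operand}")
    return "val"


def check_assign(var_type: str, rhs_type: str) -> None:
    """The right-hand side must have exactly the variable's type."""
    if var_type != rhs_type:
        raise TubularTypeError(f"cannot assign {rhs_type} to a {var_type} variable")
```

Returning the result type lets these checks compose: the type of an inner expression becomes an operand type for the expression around it.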
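Variable Assignments
Grammar rule: assignment: var_any '=' expression
val x;
char y;
x = 1;     # legal: tree (= x 1)
y = 'b';   # legal: tree (= y 'b')
x = 'a';   # ERROR: tree (= x 'a'), char assigned to a val
y = 2;     # ERROR: tree (= y 2), val assigned to a char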
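Mathematical Operations
Grammar rule: expr: expr '+' expr
val x = 1;
char y;
x = x + 2;      # legal: tree (+ x 2)
y = 'c';        # legal
x = y + 3;      # ERROR: tree (+ y 3), char operand to +
y = x;          # ERROR: val assigned to a char
y = 'a' + 'b';  # ERROR: tree (+ 'a' 'b'), char operands to +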
Functions and Commands
The print command can take anything; the argument's type determines which operation to perform:
- If the argument's type is val, use out_val.
- If the argument's type is char, use out_char.
- Starting with Project 5, if the argument is an array, print each element of that array using the instruction for its element type.
Other commands and functions may have particular type requirements, depending on argument position.
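A print translator can make this choice from the argument's type alone. The Python sketch below assumes each argument has already been evaluated into a TubeIC scalar and that types are the strings "val" and "char"; output_instruction and emit_print are hypothetical names, and array printing is left out.

```python
def output_instruction(arg_type: str) -> str:
    """Choose the TubeIC output instruction for a basic type."""
    if arg_type == "val":
        return "out_val"
    if arg_type == "char":
        return "out_char"
    # Arrays (Project 5) would be printed element by element; omitted here.
    raise ValueError(f"print of type {arg_type} is not handled in this sketch")


def emit_print(args: list[tuple[str, str]]) -> list[str]:
    """Emit output lines for print; args are (scalar, type) pairs already computed."""
    lines = [f"{output_instruction(arg_type)} {scalar}" for scalar, arg_type in args]
    lines.append("out_char '\\n'")  # print finishes with a newline
    return lines
```

For example, emit_print([("s1", "val"), ("s2", "char")]) returns the lines out_val s1, out_char s2, and out_char '\n', the same output instructions used for print(1, 'a') later in this lecture.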
Type Checking - char versus val
The char Type
Like val, variables can be declared with type char:
val x = 0;
char y = 'a';
Char literals are single characters between single quotes. The symbol table must keep track of each variable's type to ensure that no illegal operations take place.
Escape Characters
char a = '\n';
char b = '\t';
char c = '\'';
char d = '\\';
These four escape characters are each preceded by a backslash. No other escape characters should be implemented.
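Recognizing char literals in the lexer comes down to one pattern: a single ordinary character, or a backslash followed by exactly one of n, t, ', or \. The Python sketch below uses the standard re module to show that pattern and the escape decoding; lex_char_literal and ESCAPES are hypothetical names, and a lexer generator would express the same regular expression in its own rule syntax.

```python
import re

# A char literal: one ordinary character, or a backslash plus n, t, ' or \.
CHAR_LITERAL = re.compile(r"'([^\\'\n]|\\[nt'\\])'")

# Map the two-character escape spellings to the characters they denote.
ESCAPES = {r"\n": "\n", r"\t": "\t", r"\'": "'", r"\\": "\\"}


def lex_char_literal(src: str, pos: int = 0) -> tuple[str, int]:
    """Lex a char literal at src[pos]; return (character, position after it)."""
    match = CHAR_LITERAL.match(src, pos)
    if match is None:
        raise ValueError(f"invalid char literal at position {pos}")
    body = match.group(1)
    return ESCAPES.get(body, body), match.end()
```

Any other backslash sequence, such as '\x', fails to match, which enforces the rule that only these four escapes exist.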
Special Note - # is a normal character
char a = '#';
The comment character # is allowed between single quotes, where it doesn't start a comment.
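Because a lexer scans left to right, it sees the opening quote of '#' before it sees the #, so the character is consumed as part of the literal rather than starting a comment. The Python sketch below shows the same idea with one regular expression that matches either a char literal or a comment; strip_comments is a hypothetical helper, and it assumes a comment runs from # to the end of the line.

```python
import re

# Alternatives are tried at each position: a char literal first, then a comment.
LITERAL_OR_COMMENT = re.compile(r"'(?:[^\\'\n]|\\[nt'\\])'|#[^\n]*")


def strip_comments(line: str) -> str:
    """Remove a # comment from one line, keeping any '#' inside char literals."""
    def keep_literal(match: re.Match) -> str:
        text = match.group(0)
        return text if text.startswith("'") else ""  # drop comments only
    return LITERAL_OR_COMMENT.sub(keep_literal, line).rstrip()
```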
Type Checking - Assignment
char a = 'x';
val b;
b = a;  # ERROR
You cannot assign a value of one type to a variable of another type. With static type checking, we know the type of every variable at compile time and can ensure correctness.
Type Checking - Relational Operators
char a = 'x';
char b = 'y';
b > a;
val c = 0;
a != c;  # ERROR
You can compare (==, !=, >, >=, <, <=) two values of the same type, but you cannot compare two different types.
Type Checking - Mathematical Operators
char a = 'x';
char b = 'y';
a + b;   # ERROR
a && b;  # ERROR
The char type cannot be used with math operators (+, +=, -, -=, *, *=, /, /=) or boolean operators (&&, ||, !).
Type Checking - Boolean Evaluation
char a = 'x';
if (a) {  # ERROR
  a = 'b';
}
The char type cannot be used where a boolean result is needed (conditions for if and while statements).
Type Checking - Type-Specific Commands
char a = 'x';
random(a);  # ERROR
print(a);
The random command only accepts type val; giving it anything else is an error. The print command happily accepts type char as an argument.
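The boolean-evaluation and command rules reduce to two checks: conditions must be val, and each command has a per-position list of accepted types. In this Python sketch the table COMMAND_SIGNATURES, the function names, TubularTypeError, and the assumption that random takes exactly one argument are all hypothetical; only the rules themselves (random needs a val, print accepts anything, conditions cannot be char) come from the slides.

```python
class TubularTypeError(Exception):
    """Raised when a construct uses illegal types (hypothetical name)."""


# Expected argument types by position; None means "any number of any type".
COMMAND_SIGNATURES = {
    "random": ["val"],
    "print": None,
}


def check_condition(cond_type: str) -> None:
    """Conditions of if and while statements must be val."""
    if cond_type != "val":
        raise TubularTypeError(f"condition must be val, got {cond_type}")


def check_command(name: str, arg_types: list[str]) -> None:
    """Check a command call against its positional signature."""
    expected = COMMAND_SIGNATURES[name]
    if expected is None:
        return  # generic command such as print
    if len(arg_types) != len(expected):
        raise TubularTypeError(f"{name} expects {len(expected)} argument(s)")
    for position, (got, want) in enumerate(zip(arg_types, expected), start=1):
        if got != want:
            raise TubularTypeError(f"argument {position} of {name} must be {want}, got {got}")
```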
Hold up the colors for the snippets that are legal.
#1  val x = 1; x = 'a';
#2  char x = 'a'; char y = 'b'; char z = x + y;
#3  char x = 'a'; char y = 'b'; x != y;
#4  char x = 'a'; if (x == 'b') { x = 'b'; }
Char Implementation - val_copy
char a = '\n';
becomes
val_copy '\n' s1
val_copy s1 s2
Tube Intermediate Code (TubeIC) handles chars just like vals. Escape characters are written exactly as they appear in the original Tubular source.
Char Implementation - other ops
'a' > 'b';
becomes
val_copy 'a' s1
val_copy 'b' s2
test_gtr s1 s2 s3
The other TubeIC operators treat chars the same way as vals.
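Since chars reuse the val instructions, code generation needs no char-specific cases for the val_copy and test_gtr translations. The Python sketch below reproduces both; the Emitter class and its methods are hypothetical, and it assumes scalars are numbered s1, s2, ... in the order they are requested.

```python
import itertools


class Emitter:
    """Collects TubeIC lines and hands out fresh scalars s1, s2, ... (hypothetical helper)."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self._counter = itertools.count(1)

    def new_scalar(self) -> str:
        return f"s{next(self._counter)}"

    def copy(self, src: str, dest: str) -> None:
        self.lines.append(f"val_copy {src} {dest}")

    def literal(self, text: str) -> str:
        """Copy a literal into a new scalar; char literals keep their quotes and escapes."""
        scalar = self.new_scalar()
        self.copy(text, scalar)
        return scalar

    def test_gtr(self, left: str, right: str) -> str:
        """Emit a greater-than test; the instruction is the same for val and char."""
        result = self.new_scalar()
        self.lines.append(f"test_gtr {left} {right} {result}")
        return result
```

Calling e.test_gtr(e.literal("'a'"), e.literal("'b'")) on a fresh Emitter yields val_copy 'a' s1, val_copy 'b' s2, test_gtr s1 s2 s3. For char a = '\n';, copying the literal to s1 and then s1 into a newly allocated s2 gives the two val_copy lines for that declaration.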
Char Implementation - out_char
print(1, 'a');
becomes
val_copy 1 s1
val_copy 'a' s2
out_val s1
out_char s2
out_char '\n'
You've already been using the one char-specific TubeIC instruction.
How to Keep Track of Type
Every variable (temporary or named) needs to know its type, and the symbol table is a natural place to store this information. For this class, there will only be a finite number of types (val, char, and a few others introduced in future projects).
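A minimal Python sketch of a symbol table that records a type for both named variables and temporaries follows. The Symbol and SymbolTable classes, their fields, and the s1, s2, ... numbering are assumptions for illustration; scoping is left out here.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str    # source-level name, or "" for a temporary
    type: str    # "val" or "char" for now
    scalar: str  # TubeIC scalar that holds the value, e.g. "s3"


class SymbolTable:
    """Records the type of every named variable and temporary (scopes omitted)."""

    def __init__(self) -> None:
        self._by_name: dict[str, Symbol] = {}
        self._by_scalar: dict[str, Symbol] = {}
        self._next_id = 1

    def _new_scalar(self) -> str:
        scalar = f"s{self._next_id}"
        self._next_id += 1
        return scalar

    def add_variable(self, name: str, type_: str) -> Symbol:
        symbol = Symbol(name, type_, self._new_scalar())
        self._by_name[name] = symbol
        self._by_scalar[symbol.scalar] = symbol
        return symbol

    def add_temp(self, type_: str) -> Symbol:
        # Temporaries have no source name but still carry a type.
        symbol = Symbol("", type_, self._new_scalar())
        self._by_scalar[symbol.scalar] = symbol
        return symbol

    def lookup(self, name: str) -> Symbol:
        return self._by_name[name]

    def type_of_scalar(self, scalar: str) -> str:
        return self._by_scalar[scalar].type
```

Indexing by scalar as well as by name lets code generation ask for the type of any intermediate result, for instance when print has to choose between out_val and out_char.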
Implementing the char Type
1. Make the lexer recognize char literals, including escape characters.
2. Make the parser allow type char in variable declarations.
3. Make the symbol table store the type of every variable used.
4. Make the abstract syntax tree include a node for literal char values.
5. For each node in the AST, make sure that the types of its children are legal, or raise an error if not. This can be done when the node is created.
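Step 5 can be done by putting the check in each node's constructor, so an ill-typed tree can never be built. The Python sketch below covers literals, variables, math operators, and assignment; the class names and TubularTypeError are hypothetical, and with a parser generator the same check would live in the action that creates the node.

```python
class TubularTypeError(Exception):
    """Raised when a node's children have illegal types (hypothetical name)."""


class Node:
    """Base AST node; each node knows its result type once constructed."""
    type: str


class Literal(Node):
    def __init__(self, value: object, type_: str) -> None:
        self.value = value
        self.type = type_


class Variable(Node):
    def __init__(self, name: str, type_: str) -> None:
        self.name = name
        self.type = type_  # in a real compiler this comes from the symbol table


class MathOp(Node):
    def __init__(self, op: str, left: Node, right: Node) -> None:
        # Checked at creation: math operators only accept val children.
        if left.type != "val" or right.type != "val":
            raise TubularTypeError(f"'{op}' needs val operands, got {left.type} and {right.type}")
        self.op, self.left, self.right = op, left, right
        self.type = "val"


class Assign(Node):
    def __init__(self, target: Variable, value: Node) -> None:
        # Checked at creation: the value must have exactly the variable's type.
        if target.type != value.type:
            raise TubularTypeError(f"cannot assign {value.type} to a {target.type} variable")
        self.target, self.value = target, value
        self.type = target.type
```

Building the Mathematical Operations examples with these classes accepts x = x + 2 and y = 'c', and raises TubularTypeError for x = y + 3, y = x, and y = 'a' + 'b'.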
Scope Refresher - Symbol Tables and Decrementing Scope
Implementing Scoping
Scoping can be implemented directly within your symbol table(s).
When a variable is declared:
- Check that it has not already been defined within the current scope (a declaration with the same name in an outer, lower-numbered scope is allowed).
- Add it to the table, recording its name, type, etc., along with the scope in which it was created.
When leaving a scope, simply deactivate the symbols that are no longer accessible. They can't be used again in the source program, but you will still need to reference them when outputting your intermediate code.
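Here is a minimal Python sketch of the single-table approach: each symbol records the scope level where it was declared, and leaving a scope flips an active flag instead of deleting entries, so code generation can still reach them. ScopedSymbolTable and its method names are hypothetical, and a linear search is used for clarity rather than speed.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str
    scope: int
    active: bool = True


class ScopedSymbolTable:
    """One table for all scopes; exiting a scope deactivates its symbols."""

    def __init__(self) -> None:
        self.symbols: list[Symbol] = []  # never shrinks, for code generation
        self.scope = 0

    def enter_scope(self) -> None:
        self.scope += 1

    def exit_scope(self) -> None:
        for symbol in self.symbols:
            if symbol.active and symbol.scope == self.scope:
                symbol.active = False
        self.scope -= 1

    def declare(self, name: str, type_: str) -> Symbol:
        for symbol in self.symbols:
            if symbol.active and symbol.name == name and symbol.scope == self.scope:
                raise NameError(f"'{name}' is already declared in this scope")
        symbol = Symbol(name, type_, self.scope)
        self.symbols.append(symbol)
        return symbol

    def lookup(self, name: str) -> Symbol:
        # Active symbols appear in order of nondecreasing scope, so the last
        # active match is the innermost declaration.
        for symbol in reversed(self.symbols):
            if symbol.active and symbol.name == name:
                return symbol
        raise NameError(f"'{name}' is not declared")
```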
Stack of SymbolTables
Given:
val a = 123;
val b = 44;
if (a == 123) {
  char a = 'x';
  print(a);
}
print(a);
SymbolTable[0]: val a, val b
SymbolTable[1]: char a
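A stack of symbol tables can be sketched in Python as a list of dictionaries mapping names to types: entering a block pushes a table, leaving it pops one, and lookups search from the top down. SymbolTableStack and its methods are hypothetical names; keeping popped tables in a separate list is one assumed way to preserve them for intermediate code output.

```python
class SymbolTableStack:
    """A stack of symbol tables; index 0 is the outermost scope."""

    def __init__(self) -> None:
        self.tables: list[dict[str, str]] = [{}]  # name -> type
        self.retired: list[dict[str, str]] = []   # popped tables, kept for code generation

    def push(self) -> None:
        self.tables.append({})

    def pop(self) -> None:
        self.retired.append(self.tables.pop())

    def declare(self, name: str, type_: str) -> None:
        if name in self.tables[-1]:
            raise NameError(f"'{name}' is already declared in this scope")
        self.tables[-1][name] = type_

    def lookup(self, name: str) -> str:
        # Search from the innermost table outward.
        for table in reversed(self.tables):
            if name in table:
                return table[name]
        raise NameError(f"'{name}' is not declared")
```

Replaying the example: after push() and declare("a", "char"), lookup("a") returns "char" for the print inside the if; after pop(), it returns "val" again for the final print.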