CS415 Compilers Context-Sensitive Analysis Type checking Symbol tables These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Lecture 18 1
Announcements Homework 6 due on Monday, April 9 Homework 7 will be posted on Wednesday, April 11 Second project has been posted; due date: Monday, April 23 Midterm will be returned in recitation Midterm sample solutions (A and B versions) have been posted; Please review the sample solutions, and if you have any remaining question, please see the TA/me grader for the problem: 1: Rajanya 2: Ulrich Kremer 3: Chen 4: Jeff Project 1 grades will be posted soon. cs415, spring 18 Lecture 18 2
Attribute Grammars Add rules to compute the decimal value of a signed binary number Productions Attribution Rules Number Sign List List.pos 0 If Sign.neg then Number.val List.val else Number.val List.val Sign + Sign.neg false Sign.neg true List 0 List 1 Bit List 1.pos List 0.pos + 1 Bit.pos List 0.pos List 0.val List 1.val + Bit.val Bit Bit.pos List.pos List.val Bit.val Bit 0 Bit.val 0 1 Bit.val 2 Bit.pos cs415, spring 18 Lecture 18 3
Attribute Grammars Productions Attribution Rules List 0 List 1 Bit List 1.pos List 0.pos + 1 Bit.pos List 0.pos List 0.val List 1.val + Bit.val LIST 0 pos LIST 1 pos val val pos val BIT semantic rules define partial dependency graph value flow top down or across: inherited attributes value flow bottom-up: synthesized attributes cs415, spring 18 Lecture 18 4
Attribute Grammars LIST 0 pos LIST 1 pos val val pos val BIT Note: semantic rules associated with production A a have to specify the values for all - synthesized attributes for A (root) - inherited attributes for grammar symbols in a (children) Þ rules must specify local value flow! terminals can be associated with values returned by the scanner. These input values are associated with a synthesized attribute. Starting symbol cannot have inherited attributes. cs415, spring 18 Lecture 18 5
Using Attribute Grammars Attribute grammars can specify context-sensitive actions Take values from syntax Perform computations with values Insert tests, logic, Synthesized Attributes Inherited Attributes Use values from children & from constants S-attributed grammars: synthesized attributes only Evaluate in a single bottom-up pass Good match to LR parsing S-attributed Ì L-attributed Use values from parent, constants, & siblings L-attributed grammars: A X 1 X 2 X n and each inherited attribute of X i depends on - attributes of X 1 X 2 X i-1, and - inherited attributes of A Evaluate in a single top-down pass (left to right) Good match for LL parsing cs415, spring 18 Lecture 18 6
The Moral of the Story Non-local computation needed lots of supporting rules Complex local computation is relatively easy The Problems Copy rules increase cognitive overhead Copy rules increase space requirements Need copies of attributes Result is an attributed tree Must build the parse tree Either search tree for answers or copy them to the root cs415, spring 18 Lecture 18 7
Addressing the Problem What would a good programmer do? Introduce a central repository for facts Table of names Field in table for loaded/not loaded state Avoids all the copy rules, allocation & storage headaches All inter-assignment attribute flow is through table Clean, efficient implementation Good techniques for implementing the table (hashing, B.4) When its done, information is in the table! Cures most of the problems Unfortunately, this design violates the functional, AG paradigm Do we care? cs415, spring 18 Lecture 18 8
The Realist s Alternative Ad-hoc syntax-directed translation Associate pieces of code with each production At each reduction, the corresponding code is executed Allowing arbitrary code provides complete flexibility Includes ability to do tasteless & bad things To make this work Need names for attributes of each symbol on lhs & rhs Typically, one attribute passed through parser + arbitrary code (structures, globals, ) Yacc introduced $$, $1, $2, $n, left to right Need an evaluation scheme Fits nicely into LR(1) parsing algorithm cs415, spring 18 Lecture 18 9
YACC : parse.y parse.y : %{ #include <stdio.h> #include "attr.h" int yylex(); void yyerror(char * s); #include "symtab.h" %} Will be included verbatim in parse.tab.c List and assign attributes %union {tokentype token; } %token PROG PERIOD PROC VAR ARRAY RANGE OF %token INT REAL DOUBLE WRITELN THEN ELSE IF %token BEG END ASG NOT %token EQ NEQ LT LEQ GEQ GT OR EXOR AND DIV NOT %token <token> ID CCONST ICONST RCONST %start program %% program : PROG ID ';' block PERIOD { } ; block : BEG ID ASG ICONST END { } ; %% void yyerror(char* s) { fprintf(stderr,"%s\n",s); } int main() { printf("1\t"); yyparse(); return 1; } Rules with semantic actions Main program and helper functions; may contain initialization code of global structures. Will be included verbatim in parse.tab.c cs415, spring 18 Lecture 18 10
Project #2 (see lex & yacc, Levine et al., O Reilly) You do not have to change the scanner (scan.l) How to specify and use attributes in YACC? Define attributes as types in attr.h typedef struct info_node {int a; int b} infonode; Include type attribute name in %union in parse.y %union {tokentype token; infonode myinfo; } Assign attributes in parse.y to Terminals: %token <token> ID ICONST Non-terminals: %type <myinfo> block variables procdecls cmpdstmt Accessing attribute values in parse.y use $$, $1, $2 etc. notation: block : variables procdecls {$2.b = $1.b + 1;} cmpdstmt { $$.a = $1.a + $2.a + $4.b;} cs415, spring 18 Lecture 18 11
Summary: Is This Really Ad-hoc? Example on ilab: ~uli/cs415/spring18/examples/lexyacc Relationship between practice and attribute grammars Similarities Both rules & actions associated with productions Application order determined by tools (Somewhat) abstract names for symbols Differences Actions applied as a unit; not true for AG rules Anything goes in ad-hoc actions; AG rules are (purely) functional AG rules are higher level than ad-hoc actions cs415, spring 18 Lecture 18 12
Applications of SDT (Semantic Analysis) Building a symbol table Enter declaration information as processed At end of declaration syntax, do some post processing Use table to check errors as parsing progresses Simple error checking/type checking Define before use lookup on reference Dimension, type,... check as encountered Type conformability of expression bottom-up walk Procedure interfaces are harder Build a representation for parameter list & types Check actual vs. formal parameter list Positional or keyword associations assumes table is global cs415, spring 18 Lecture 18 13
Project 2 A simple compiler Perform type checking (boolean and integer operations) Error messages Generate ILOC code Lex & Yacc (C version) cs415, spring 18 Lecture 18 14
Types and Type Systems Type: A set of values and meaningful operations on them Types provide semantic sanity checks (consistency checks) and determine efficient implementations for data objects Types help identify errors, if an operator is applied to an incompatible operand dereferencing of a non-pointer adding a function to something incorrect number of parameters to a procedure which operation to use for overloaded names and operators, or what type coercion to use (e.g.: 3.0 + 1) identification of polymorphic functions cs415, spring 18 Lecture 18 15
Types and Type Systems Type system: Each language construct (operator, expression, statement, ) is associated with a type expression. The type system is a collection of rules for assigning type expressions to these constructs. Type expressions for basic types: integer, char, real, boolean, typeerror constructed types, e.g., one-dimensional arrays: array(lb, ub, elem_type), where elem_type is a type expression A type checker implements a type system. It computes or constructs type expressions for each language construct cs415, spring 18 Lecture 18 16
Types and Type Systems Example type inference rule: E e 1 : integer E e 2 : integer E (e 1 + e 2 ): integer where E is a type environment that maps constants and variables to their type expressions. Questions: How to specify rules that allow type coercion (type widening) from integers to reals in arithmetic expressions? 3.0 + 1 or 1 + 3.0 cs415, spring 18 Lecture 18 17
Types and Type Systems Example type inference rule pointer dereferencing: E e :??? E *e :??? where E is a type environment that maps constants and variables to their type expressions. cs415, spring 18 Lecture 18 18
Types and Type Systems Example type inference rule pointer dereferencing: E E e : pointer(integer) *e : integer where E is a type environment that maps constants and variables to their type expressions. pointer( ) is now part of the type expression language such as array( ). cs415, spring 18 Lecture 18 19
Types and Type Systems Example type inference rule pointer dereferencing: E e : pointer( β ) E *e : β where E is a type environment that maps constants and variables to their type expressions. Type expressions may also contain type variables such as β. Type variables can denote any type expression. Type variables are needed to express polymorphic types. cs415, spring 18 Lecture 18 20
Types and Type Systems Example type inference rule address computation: E e : integer E &e :??? where E is a type environment that maps constants and variables to their type expressions. What about a polymorphic version of this rule? cs415, spring 18 Lecture 18 21
Types and Type Systems Formal proof that a program can be typed correctly. int a;...... *(&a) + 3... cs415, spring 18 Lecture 18 22
Type Names and Type Systems Programmers may define their own types and give them names: type my_int is int; int a; my_int b; a + b Type names can also be part of the type expression language. Note: type names and type variables are different! cs415, spring 18 Lecture 18 23
Type Equivalence Structural -- type equivalence: type names are expanded Name -- type equivalence: type names are not expanded Example: type A is array(1..10) of integer; type B is array(1..10) of integer; a : A; b : B; c, d: array(1..10) of integer; e: array(1..10) of integer; Answer: structural equivalence: name equivalence: cs415, spring 18 Lecture 18 24
Type Equivalence Structural -- type equivalence: type names are expanded Name -- type equivalence: type names are not expanded Example: type A is array(1..10) of integer; type B is array(1..10) of integer; a : A; b : B; c, d: array(1..10) of integer; e: array(1..10) of integer; Answer: structural equivalence: (a, b, c, d, e) name equivalence: (a); (b); (c, d, e); cs415, spring 18 Lecture 18 25
Syntax Directed Translation Scheme (SDT) Revisit our type inference rule for +. exp : exp + exp { if ($1 == TYPE_INT && $3 == TYPE_INT) $$ = TYPE_INT; else { } } $$ = TYPE_INT; printf( \n***error: illegal operand types\n ); PROJECT HINT: The definition of type expression as C types (structs) should be done in attr.h. attr.c may contain helper functions. The assignment of type expression C types to terminals and nonterminals of the grammar is done in parse.y. cs415, spring 18 Lecture 18 26
Lexically-scoped Symbol Tables 5.5 in EaC The problem The compiler needs a distinct record for each declaration Nested lexical scopes admit duplicate declarations The interface insert(name, level ) creates record for name at level lookup(name, level ) returns pointer or index delete(level ) removes all names declared at level Many implementation schemes have been proposed We ll stay at the conceptual level Hash table implementation is tricky, detailed, & fun (see B.4) Symbol tables are compile-time structures the compiler use to resolve references to names. We ll see the corresponding run-time structures that are used to establish addressability later. cs415, spring 18 Lecture 18 27
Example procedure p { int a, b, c procedure q { int v, b, x, w procedure r { int x, y, z. } procedure s { int x, a, v } r s } q } B0: { int a, b, c B1: { int v, b, x, w B2: { int x, y, z. } B3: { int x, a, v } } } cs415, spring 18 Lecture 18 28
Example procedure p { int a, b, c procedure q { int v, b, x, w procedure r { int x, y, z. } b, w } c x, a, v } q procedure s { int x, a, v } r s B0: { int a 0, b 1, c 2 B1: { int v 3, b 4, x 5, w 6 B2: { int x 7, y 8, z 9. } B3: { int x 10, a 11, v 12 } } } a 11, b 4, c 2, v 12, w 6,, x 10, no y or z Picturing it as a series of Algol-like procedures cs415, spring 18 Lecture 18 29
Lexically-scoped Symbol Tables High-level idea Create a new table for each scope Chain them together for lookup s x a v q v b x w... p a b c... Chain of tables implementation insert() may need to create table it always inserts at current level lookup() walks chain of tables & returns first occurrence of name delete() throws away table for level p, if it is top table in the chain Individual tables can be hash tables. cs415, spring 18 Lecture 18 30
Lexically-scoped Symbol Tables High-level idea Create a new table for each scope Chain them together for lookup s x a v q v b x w... p a b c... Remember a 11, b 4, c 2, v 12, w 6,, x 10, no y or z the names visible in s If we add the subscripts, the relationship between the code and the table becomes clear cs415, spring 18 Lecture 18 31
Implementing Lexically Scoped Symbol Tables Stack organization nextfree s (level 2) q (level 1) p (level 0) v a x w x b v c b a growth Implementation insert () creates new level pointer if needed and inserts at nextfree lookup () searches linearly from nextfree 1 forward delete () sets nextfree to the equal the start location of the level deleted. Advantage Uses much less space Disadvantage Lookups can be expensive cs415, spring 18 Lecture 18 32
Implementing Lexically Scoped Symbol Tables Stack organization a 11, b 4, c 2, v 12, w 6,, x 10, no y or z nextfree s (level 2) q (level 1) p (level 0) v a x w x b v c b a growth 12 11 10 6 5 4 3 2 1 0 Implementation insert () creates new level pointer if needed and inserts at nextfree lookup () searches linearly from nextfree 1 down stack delete () sets nextfree to the equal the start location of the level deleted. Advantage Uses much less space Disadvantage Lookups can be expensive cs415, spring 18 Lecture 18 33
Implementing Lexically Scoped Symbol Tables Threaded stack organization growth Implementation insert () puts new entry at the head of the list for the name h(x) v a x w x b v c b a s q p lookup () goes direct to location delete () processes each element in level being deleted to remove from head of list Advantage lookup is fast Disadvantage delete takes time proportional to number of declared variables in level cs415, spring 18 Lecture 18 34
Implementing Lexically Scoped Symbol Tables Threaded stack organization a 11, b 4, c 2, v 12, w 6,, x 10, no y or z h(x) v a x w x b v c b a growth 12 11 10 6 5 4 3 2 1 0 s q p Implementation insert () puts new entry at the head of the list for the name lookup () goes direct to location delete () processes each element in level being deleted to remove from head of list Advantage lookup is fast Disadvantage delete takes time proportional to number of declared variables in level cs415, spring 18 Lecture 18 35
Things to do and next class Start working on the project! Intermediate representations Read EaC: Chapter 5.1 5.3 Code generation Read EaC: Chapter 7.1 7.5 cs415, spring 18 Lecture 18 36