PART 4 - SYNTAX DIRECTED TRANSLATION F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 109 / 309
Setting
- Translation of context-free languages
- Information is attached to grammar symbols as attributes
- Values of attributes are defined by semantic rules
- 2 possibilities:
    - Syntax directed definitions (high-level specification)
    - Translation schemes (implementation details)
- Evaluation: (1) parse the input, (2) generate the parse tree, (3) evaluate the parse tree

Syntax directed definitions
- Generalization of context-free grammars
- Each grammar symbol has a set of attributes
- Attribute: string, number, type, memory location, ...
- Synthesized vs. inherited attributes
    - Synthesized: value is computed from the attributes of the child nodes in the parse tree
    - Inherited: value is computed from the attributes of the parent (and sibling) nodes in the parse tree
- Semantic rules define dependencies between attributes
- The dependency graph determines the evaluation order of the semantic rules
- Semantic rules can have side effects

Form of a syntax directed definition
- Grammar production: A → α
- Associated semantic rule: b := f(c1, ..., ck), where f is a function
- Synthesized: b is a synthesized attribute of A, and c1, ..., ck are attributes of grammar symbols of the production
- Inherited: b is an inherited attribute of a grammar symbol on the right side of the production, and c1, ..., ck are attributes of grammar symbols of the production
- In both cases b depends on c1, ..., ck
Example
- "Calculator" program: val is a synthesized attribute of the nonterminals E, T and F (n denotes the end-of-line token)

Production        Semantic rule
L → E n           print(E.val)
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → ( E )         F.val := E.val
F → digit         F.val := digit.lexval

S-attributed grammar
- Attribute grammar that uses synthesized attributes exclusively
- Example evaluation: 3*5+4n (annotated parse tree, children indented below their parent)

L
  E.val=19
    E.val=15
      T.val=15
        T.val=3
          F.val=3
            digit.lexval=3
        *
        F.val=5
          digit.lexval=5
    +
    T.val=4
      F.val=4
        digit.lexval=4
  n
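The annotated parse tree can be reproduced with a short post-order walk. The following is only a sketch: the Node class, its field names and the helper digit are assumptions made for illustration and are not part of the slides. Each node's val is computed after the val of all its children, which is exactly what "synthesized" means.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical parse-tree node: grammar symbol, children, attributes.
    symbol: str
    children: list = field(default_factory=list)
    lexval: Optional[int] = None   # set by the lexer for digit leaves only
    val: Optional[int] = None      # synthesized attribute


def evaluate(n: Node) -> Optional[int]:
    """Post-order walk: children first, then the rule of the production at n."""
    for child in n.children:
        evaluate(child)
    kids = [c.symbol for c in n.children]
    if kids == ["digit"]:                       # F -> digit
        n.val = n.children[0].lexval
    elif kids == ["(", "E", ")"]:               # F -> ( E )
        n.val = n.children[1].val
    elif kids in (["T"], ["F"]):                # E -> T, T -> F
        n.val = n.children[0].val
    elif kids == ["E", "+", "T"]:               # E -> E1 + T
        n.val = n.children[0].val + n.children[2].val
    elif kids == ["T", "*", "F"]:               # T -> T1 * F
        n.val = n.children[0].val * n.children[2].val
    return n.val


def digit(v: int) -> Node:
    """Builds the subtree F -> digit with digit.lexval = v."""
    return Node("F", [Node("digit", lexval=v)])


# Parse tree of 3*5+4 (without the enclosing L -> E n)
t15 = Node("T", [Node("T", [digit(3)]), Node("*"), digit(5)])
e19 = Node("E", [Node("E", [t15]), Node("+"), Node("T", [digit(4)])])
result = evaluate(e19)
```

Here result corresponds to E.val at the root of the expression and t15.val to the T.val of the subtree for 3*5.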
Inherited attributes
- Describe how programming language constructs depend on their context
- Example (type checking):

Production        Semantic rule
D → T L           L.in := T.type
T → int           T.type := integer
T → real          T.type := real
L → L1 , id       L1.in := L.in
                  addtype(id.entry, L.in)
L → id            addtype(id.entry, L.in)

Inherited attributes
- Annotated parse tree for real id1, id2, id3 (children indented below their parent)

D
  T.type=real
    real
  L.in=real
    L.in=real
      L.in=real
        id1
      ,
      id2
    ,
    id3
Dependency graphs
- Show dependencies between attributes
- Each rule is represented in the form b := f(c1, ..., ck)
- Nodes correspond to attributes; edges to dependencies
- Definition:

    for each node n in the parse tree do
        for each attribute a of the grammar symbol at n do
            construct a node in the dependency graph for a
    for each node n in the parse tree do
        for each semantic rule b := f(c1, ..., ck) associated with the production used at n do
            for i := 1 to k do
                construct an edge from the node for ci to the node for b

Dependency graph
- Example
[Figure: dependency graph laid over the parse tree of real id1, id2, id3. Nodes 1, 2, 3 are the entry attributes of id1, id2, id3; node 4 is T.type; nodes 5, 7, 9 are the in attributes of the three L nodes from top to bottom; nodes 6, 8, 10 stand for the addtype calls at these L nodes.]

Topological sort
- Arrangement of the nodes m1, ..., mk of a directed, acyclic graph such that every edge points from an earlier node to a later node
- If mi → mj is an edge, then mi comes before mj in the order
- Determines the order in which the attributes are calculated
- Example (cont.), where an is the attribute at node n:
    1. a4 := real
    2. a5 := a4
    3. addtype(id3.entry, a5)
    4. a7 := a5
    5. addtype(id2.entry, a7)
    6. a9 := a7
    7. addtype(id1.entry, a9)
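As an illustration, the evaluation order can be computed mechanically from the dependency graph. The sketch below uses Python's standard graphlib module. The edge list is an assumption read off the semantic rules of the declaration example (node numbers as in the dependency graph example), and attr, symtab and addtype are hypothetical stand-ins for the attribute storage and the symbol table.

```python
from graphlib import TopologicalSorter

# node -> set of nodes it depends on (edges point from dependency to node)
deps = {
    5: {4},        # L.in := T.type
    6: {5, 3},     # addtype(id3.entry, L.in)
    7: {5},        # L1.in := L.in
    8: {7, 2},     # addtype(id2.entry, L.in)
    9: {7},        # L1.in := L.in
    10: {9, 1},    # addtype(id1.entry, L.in)
}

# static_order() lists every node after all of its dependencies
order = list(TopologicalSorter(deps).static_order())

attr = {1: "id1", 2: "id2", 3: "id3"}   # entry attributes supplied by the lexer
symtab = {}                             # hypothetical symbol table


def addtype(entry, typ):
    symtab[entry] = typ


# one semantic rule per remaining node
rules = {
    4: lambda: "real",
    5: lambda: attr[4],
    6: lambda: addtype(attr[3], attr[5]),
    7: lambda: attr[5],
    8: lambda: addtype(attr[2], attr[7]),
    9: lambda: attr[7],
    10: lambda: addtype(attr[1], attr[9]),
}

for n in order:
    if n in rules:
        attr[n] = rules[n]()
```

Any order returned by the sorter is valid; the sequence listed on the slide is one of several admissible topological orders.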
Example - syntax trees
- Abstract syntax tree = simplified form of a parse tree
- Operators and keywords do not appear as leaves; they label the interior nodes
- Chains of single productions can be collapsed
- Examples (children indented below their parent):

if-then-else
  B
  S1
  S2

+
  *
    3
    5
  4

Syntax trees
- Expressions
- Functions (return value: pointer to the new node):
    - mknode(op, left, right): node with label op and the 2 child nodes left and right
    - mkleaf(id, entry): leaf with label id and a pointer entry into the symbol table
    - mkleaf(num, val): leaf with label num and value val
- Syntax directed definition:

Production        Semantic rule
E → E1 + T        E.nptr := mknode('+', E1.nptr, T.nptr)
E → E1 - T        E.nptr := mknode('-', E1.nptr, T.nptr)
E → T             E.nptr := T.nptr
T → ( E )         T.nptr := E.nptr
T → id            T.nptr := mkleaf(id, id.entry)
T → num           T.nptr := mkleaf(num, num.val)

Syntax trees
- Expressions (example)
- Syntax tree for a-4+c
[Figure: parse tree of a-4+c in which the nptr attribute of every E and T node points to the corresponding node of the syntax tree shown below.]

+
  -
    id (to entry for a)
    num 4
  id (to entry for c)
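The construction of the syntax tree for a-4+c can be sketched as follows. This is an illustrative version only: the Node class and the show helper are assumptions, and a plain string stands in for the pointer into the symbol table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                      # operator, "id" or "num"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: object = None            # symbol-table entry or numeric value (leaves)


def mknode(op, left, right):
    """Interior node labelled with the operator op."""
    return Node(op, left, right)


def mkleaf(kind, value):
    """Leaf labelled id (value = symbol-table entry) or num (value = number)."""
    return Node(kind, value=value)


def show(n):
    """Fully parenthesized rendering, used here only to inspect the tree."""
    if n.left is None and n.right is None:
        return str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"


# Calls in the order in which a bottom-up evaluation would issue them
p1 = mkleaf("id", "a")          # hypothetical symbol-table entry for a
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")          # hypothetical symbol-table entry for c
p5 = mknode("+", p3, p4)
rendered = show(p5)
```

The left-recursive productions make the tree left-deep, so a-4 becomes the left operand of +.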
Evaluation of S-attributed definitions
- Attributed definition that uses synthesized attributes exclusively
- Evaluation using a bottom-up parser (LR parser)
- Idea: store attribute information on the stack
- Production: A → X Y Z
- Semantic rule: A.a := f(X.x, Y.y, Z.z)
- Before X Y Z is reduced to A, the value of Z.z is stored in val[top], Y.y in val[top-1] and X.x in val[top-2]

State   Val
...     ...
X       X.x
Y       Y.y
Z       Z.z     <- top
...     ...

Example - S-attributed evaluation
- "Calculator" example:

Production        Code fragment
L → E n           print(val[top-1])
E → E1 + T        val[ntop] := val[top-2] + val[top]
E → T
T → T1 * F        val[ntop] := val[top-2] * val[top]
T → F
F → ( E )         val[ntop] := val[top-1]
F → digit

- The code is executed before the reduction, with ntop = top - r + 1 (r = number of symbols on the right side); after the reduction: top := ntop

Result for 3*5+4n

Input     State      Val       Production used
3*5+4n
*5+4n     3          3
*5+4n     F          3         F → digit
*5+4n     T          3         T → F
5+4n      T *        3
+4n       T * 5      3 5
+4n       T * F      3 5       F → digit
+4n       T          15        T → T * F
+4n       E          15        E → T
4n        E +        15
n         E + 4      15 4
n         E + F      15 4      F → digit
n         E + T      15 4      T → F
n         E          19        E → E + T
          E n        19
          L          19        L → E n
L-attributed definitions
- Definition: A syntax directed definition is L-attributed if each inherited attribute of X_j, 1 ≤ j ≤ n, on the right side of A → X_1 ... X_n depends only on:
    1. the attributes of the symbols X_1, ..., X_{j-1} to the left of X_j, and
    2. the inherited attributes of A
- Each S-attributed grammar is an L-attributed grammar
- Evaluation in depth-first order:

    procedure dfvisit(n : node)
    begin
        for each child m of n, from left to right do
        begin
            evaluate inherited attributes of m
            dfvisit(m)
        end
        evaluate synthesized attributes of n
    end
Translation schemes
- Translation scheme = context-free grammar with attributes for the grammar symbols and with semantic actions, which are enclosed in {} and placed between the grammar symbols on the right side of a production
- Example: T → T1 * F {T.val := T1.val * F.val}
- If only synthesized attributes are used, the action is always placed at the end of the right side of a production
- Note: actions may not access attributes that have not been calculated yet (this limits the possible positions of semantic actions)

Translation schemes (cont.)
- If both inherited and synthesized attributes are used, the following must be observed:
    1. An inherited attribute of a symbol on the right side of a production has to be calculated in an action that is positioned to the left of the symbol
    2. An action may not reference a synthesized attribute of a symbol that is positioned to the right of the action
    3. A synthesized attribute of the nonterminal on the left side can only be calculated once all referenced attributes have been calculated; such actions are usually placed at the end of the right side

Example translation scheme

    S → A1 A2 {A1.in := 1; A2.in := 2}
    A → a {print(A.in)}

- The grammar above does not fulfill the three conditions for translation schemes
- The inherited attribute A.in is not yet defined at the point in time when it should be printed
- But: for each L-attributed grammar a translation scheme can be found that fulfills the three conditions, e.g.:

    S → {A1.in := 1} A1 {A2.in := 2} A2
    A → a {print(A.in)}
Top-down translation
- Left recursion has to be removed from the translation scheme
- Example:

    E → E1 + T   {E.val := E1.val + T.val}
    E → E1 - T   {E.val := E1.val - T.val}
    E → T        {E.val := T.val}
    T → ( E )    {T.val := E.val}
    T → num      {T.val := num.val}

Example top-down translation

    E → T {R.i := T.val} R {E.val := R.s}
    R → + T {R1.i := R.i + T.val} R1 {R.s := R1.s}
    R → - T {R1.i := R.i - T.val} R1 {R.s := R1.s}
    R → ε {R.s := R.i}
    T → ( E ) {T.val := E.val}
    T → num {T.val := num.val}

Evaluation of 9-5+2
(annotated parse tree, children indented below their parent)

E
  T.val = 9
    num.val = 9
  R.i = 9
    -
    T.val = 5
      num.val = 5
    R.i = 4
      +
      T.val = 2
        num.val = 2
      R.i = 6
        ε
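The transformed scheme maps directly onto a recursive-descent evaluator in which the inherited attribute R.i becomes a function parameter and the synthesized attribute R.s becomes the return value. The sketch below is a minimal illustration; the tokenizer, the Parser class and the end marker "$" are assumptions and not part of the slides.

```python
import re


def tokenize(text):
    """Splits the input into numbers, operators and parentheses."""
    return re.findall(r"\d+|[-+()]", text)


class Parser:
    def __init__(self, text):
        self.toks = tokenize(text) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.toks[self.pos]

    def match(self, t):
        if self.lookahead != t:
            raise SyntaxError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):
        t_val = self.T()
        return self.R(t_val)          # R.i := T.val; E.val := R.s

    def R(self, i):
        if self.lookahead in ("+", "-"):
            op = self.lookahead
            self.match(op)
            t_val = self.T()
            i1 = i + t_val if op == "+" else i - t_val   # R1.i
            return self.R(i1)         # R.s := R1.s
        return i                      # R -> ε: R.s := R.i

    def T(self):
        if self.lookahead == "(":
            self.match("(")
            v = self.E()
            self.match(")")
            return v                  # T.val := E.val
        tok = self.lookahead
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        self.match(tok)
        return int(tok)               # T.val := num.val


def evaluate(text):
    p = Parser(text)
    v = p.E()
    p.match("$")
    return v


value = evaluate("9-5+2")
```

Because the partial result is passed downwards in i, the operators remain left-associative even though the grammar is no longer left-recursive.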
Summary transformation
- Given translation scheme:

    A → A1 Y {A.a := g(A1.a, Y.y)}
    A → X {A.a := f(X.x)}

- After removal of left recursion:

    A → X R
    R → Y R | ε

- Transformed scheme:

    A → X {R.i := f(X.x)} R {A.a := R.s}
    R → Y {R1.i := g(R.i, Y.y)} R1 {R.s := R1.s}
    R → ε {R.s := R.i}
Predictive parsing with schemes
- Input: syntax-directed translation scheme; Output: syntax-directed translator
    1. For each nonterminal A, construct a function that has a formal parameter for each inherited attribute of A and that returns the values of the synthesized attributes of A. This function has a local variable for each attribute of each grammar symbol that appears in a production for A.
    2. As previously described (see predictive parsing), the code for nonterminal A decides what production to use based on the current input symbol.
    3. The code for each production does the following (evaluation from left to right):
        3.1 Token X with synthesized attribute x: save the value of x in a variable X.x and generate a call to match token X.
        3.2 Nonterminal B: generate c := B(b1, ..., bk), where b1, ..., bk are the variables for the inherited attributes of B and c is the variable for the synthesized attribute of B.
        3.3 Action: copy the code into the parser, replacing each reference to an attribute by the variable for that attribute.
Example - predictive parsing
- Grammar:

    E → T {R.i := T.nptr} R {E.nptr := R.s}
    R → op T {R1.i := mknode(op.lexeme, R.i, T.nptr)} R1 {R.s := R1.s}
    R → ε {R.s := R.i}
    T → ( E ) {T.nptr := E.nptr}
    T → num {T.nptr := mkleaf(num, num.val)}

- Functions:

    function E : node
    function R(i : node) : node
    function T : node
Parsing procedure R
- Procedure without translation scheme

    procedure R()
    begin
        if lookahead = op then begin
            match(op); T(); R()
        end
        else begin
            return
        end
    end

Parsing function R

    function R(i : node) : node
    var nptr, i1, s1, s : node;
        oplexeme : char;
    begin
        if lookahead = op then begin
            oplexeme := lexval;
            match(op);
            nptr := T();
            i1 := mknode(oplexeme, i, nptr);
            s1 := R(i1);
            s := s1
        end
        else s := i;
        return s
    end
Bottom-up with inherited attribute
- Implementation of L-attributed grammars in bottom-up parsers
- Works for LL(1) grammars and many LR(1) grammars
- Removal of embedded actions from translation schemes:
    - Actions have to be placed at the end of the right side of a production
    - This is ensured by new marker nonterminals
- Example:

    E → T R
    R → + T {print('+')} R | - T {print('-')} R | ε
    T → num {print(num.val)}

  becomes

    E → T R
    R → + T M R | - T N R | ε
    T → num {print(num.val)}
    M → ε {print('+')}
    N → ε {print('-')}
Inherited attributes on the stack
- Idea: production A → X Y with synthesized attribute X.x and inherited attribute Y.y
- Before X Y is reduced to A, X.x is already on the stack
- In the case of Y.y = X.x (copy action), the value of X.x can be used whenever the value of Y.y is required
- Example: parser for the variable declaration real p,q,r

Variable declaration - example

    D → T {L.in := T.type} L
    T → int {T.type := integer}
    T → real {T.type := real}
    L → {L1.in := L.in} L1 , id {addtype(id.entry, L.in)}
    L → id {addtype(id.entry, L.in)}

[Figure: parse tree of real p,q,r; T.type is copied into the in attribute of each L node.]

Calculation using the stack

Input         State        Production used
real p,q,r
p,q,r         real
p,q,r         T            T → real
,q,r          T p
,q,r          T L          L → id
q,r           T L ,
,r            T L , q
,r            T L          L → L , id
r             T L ,
              T L , r
              T L          L → L , id
              D            D → T L

- Implementation:

Production        Code fragment
D → T L
T → int           val[top] := integer
T → real          val[top] := real
L → L , id        addtype(val[top], val[top-3])
L → id            addtype(val[top], val[top-1])
Problems
- The positions of attributes on the stack need to be known

Production        Semantic rule
S → a A C         C.i := A.s
S → a A B C       C.i := A.s
C → c             C.s := g(C.i)

- When the reduction C → c is performed, it is unknown whether the value of C.i is located in val[top-1] or in val[top-2]. It depends on whether a B is located on the stack.
- Solution: introduction of a marker M:

Production        Semantic rule
S → a A C         C.i := A.s
S → a A B M C     M.i := A.s; C.i := M.s
C → c             C.s := g(C.i)
M → ε             M.s := M.i

Problems (cont.)
- Simulation of semantic rules that are not copy actions
- Again solved with a marker:

Production        Semantic rule
S → a A C         C.i := f(A.s)

  becomes

Production        Semantic rule
S → a A N C       N.i := A.s; C.i := N.s
N → ε             N.s := f(N.i)
Bottom-up parsing with calculation of inherited attributes
- Input: L-attributed definition (and LL(1) grammar)
- Output: parser that calculates attribute values on the stack
    1. Assumptions: each nonterminal A has an inherited attribute A.i, and each grammar symbol X has a synthesized attribute X.s. If X is a terminal, then X.s is the lexical value of X (supplied by the lexical analyser). The values are stored on the stack in the form of an array val.
    2. For each production A → X_1 ... X_n create n new markers (nonterminals) M_1, ..., M_n and replace the production with A → M_1 X_1 ... M_n X_n. Note: the synthesized value of X_i is stored in the val array entry that belongs to X_i. The inherited value X_i.i is stored in the entry that belongs to M_i.
    3. Invariant: the inherited attribute A.i (if it exists) is always located directly beneath the position of M_1 within the val array.

Simplifications
- Reduction of markers:
    1. If X_j has no inherited attribute, then no marker M_j is required (note: the positions of the attributes on the stack shift accordingly)
    2. If X_1.i exists and is calculated by X_1.i = A.i, then M_1 is not required
Removal of inherited attributes
- Replacement of inherited attributes by synthesized ones
- Not always possible
- Requires a modification of the grammar
- Example: declarations in Pascal

    D → L : T
    T → integer | char
    L → L , id | id

  convert to:

    D → id L
    L → , id L | : T
    T → integer | char

Difficult syntax directed definition
- The following definition cannot be processed by bottom-up parsers using the approaches presented so far:

Production        Semantic rule
S → L             L.count := 0
L → L1 1          L1.count := L.count + 1
L → ε             print(L.count)

- Reason: L → ε receives the number of 1s by means of inheritance
- However, as L → ε is the first production used in a reduction, no value is available yet
Recursive evaluators
- Evaluation of attributes
    - Based on the parse tree
    - Not possible in conjunction with parsing
    - The order in which nodes are visited during evaluation is arbitrary
- For each nonterminal a translation function exists
- Extensions may visit nodes more than once
- The order of node visits has to respect the following:
    1. Each inherited attribute of a node has to be calculated before the node is visited
    2. Synthesized attributes are calculated before the node is left (for the last time)
- The order is determined by the dependencies

Example - recursive evaluators

Production        Semantic rules
A → L M           L.i := l(A.i)
                  M.i := m(L.s)
                  A.s := f(M.s)
A → Q R           R.i := r(A.i)
                  Q.i := q(R.s)
                  A.s := f(Q.s)

[Figure: dependency graphs for both productions; each nonterminal carries an inherited attribute i and a synthesized attribute s.]

    function A(n, ai)
        if production(n) = 'A → L M' then
            li := l(ai)
            ls := L(child(n, 1), li)
            mi := m(ls)
            ms := M(child(n, 2), mi)
            return f(ms)
        if production(n) = 'A → Q R' then
            ri := r(ai)
            rs := R(child(n, 2), ri)
            qi := q(rs)
            qs := Q(child(n, 1), qi)
            return f(qs)
PART 5 - TYPE CHECKING
Static Program Checking
- Type checks: check of the types used. Error if the operands are incompatible with the operator. Example: 1.2 + 2 (real + int).
- Flow-of-control checks: check whether the transfer of control is possible. Examples: break needs an enclosing loop; goto label needs a defined label.
- Uniqueness checks: check whether an object has been defined exactly once. Example: in Pascal each identifier must be unique.
- Name-related checks: in some languages, names (e.g. of procedures) have to occur again at another location (e.g. at the end of the procedure).

Tasks
- Check whether the type system of the language is satisfied
- A separate type checker is not always necessary

    token stream → parser → parse tree → type checker → syntax tree → intermediate code generator → intermediate representation

- Type systems (examples):
    - "If both operands of the arithmetic operators of addition, subtraction and multiplication are of type integer, then the result is of type integer"
    - "The result of the unary & operator is a pointer to the object referred to by the operand. If the type of the operand is '...', the type of the result is 'pointer to ...'."
Type Expressions
- A type expression is:
    1. a basic type: integer, boolean, char and real, as well as the special basic types type_error and void
    2. a type name
    3. a composite type in the form of:
        1. Arrays: array(I, T) with index set I and element type T
        2. Products: T1 × T2
        3. Records: record((N1 × T1) × ... × (Nk × Tk)) with field names Ni and types Ti
        4. Pointers: pointer(T)
        5. Functions: T1 → T2
    4. a type variable

Types - Examples

    type row = record
        address: integer;
        lexeme: array[1..15] of char
    end;
    var table: array[1..101] of row;

- row can be represented as record((address × integer) × (lexeme × array(1..15, char)))
- function f(a, b: char): ↑integer; is represented as char × char → pointer(integer)

Graphical Representation of Types
- As a DAG (Directed Acyclic Graph), in which the node for char is shared by both arguments, or as a tree (children indented below their parent):

→
  ×
    char
    char
  pointer
    integer
Type systems
- Set of rules, specified using attributed grammars (or verbally)
- Static vs. dynamic checking of types
- Sound type system = static type checking is sufficient; no type errors can occur at run time
- Language is strongly typed = the compiler guarantees that an accepted program runs without type errors
- But some checks can only be performed dynamically:

    table: array[0..255] of char;
    i: integer;

  Whether the access table[i] stays within the index range cannot be checked by the compiler.
- Error recovery is important (even for type errors)
Type Checker Spec
- Language definition:

    P → D ; E
    D → D ; D | id : T
    T → char | integer | array [ num ] of T | ↑T
    E → literal | num | id | E mod E | E [ E ] | E↑

- Example: key: integer ; key mod 1999
- The declared type array [256] of char corresponds to the type expression array(1..256, char)

1. Storing Type Information

Production                    Semantic rule
P → D ; E
D → D ; D
D → id : T                    {addtype(id.entry, T.type)}
T → char                      {T.type := char}
T → integer                   {T.type := integer}
T → array [ num ] of T1       {T.type := array(1..num.val, T1.type)}
T → ↑T1                       {T.type := pointer(T1.type)}

2. Type Checking Expressions

Production          Semantic rule
E → literal         E.type := char
E → num             E.type := integer
E → id              E.type := lookup(id.entry)
E → E1 mod E2       E.type := if E1.type = integer and E2.type = integer
                              then integer else type_error
E → E1 [ E2 ]       E.type := if E2.type = integer and E1.type = array(s, t)
                              then t else type_error
E → E1↑             E.type := if E1.type = pointer(t)
                              then t else type_error
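The rules for expressions can be tried out with a small recursive checker. This is only a sketch under assumptions: expressions are encoded as nested tuples, composite types as tuples such as ("array", (1, 256), "char"), and an undeclared identifier is mapped to type_error; none of these encodings come from the slides.

```python
TYPE_ERROR = "type_error"


def check(e, symtab):
    """Returns E.type for the expression e, following the semantic rules."""
    kind = e[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        # lookup(id.entry); undeclared names yield type_error (assumption)
        return symtab.get(e[1], TYPE_ERROR)
    if kind == "mod":
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        return "integer" if t1 == "integer" and t2 == "integer" else TYPE_ERROR
    if kind == "index":                 # E1 [ E2 ]
        t1, t2 = check(e[1], symtab), check(e[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                # element type t of array(s, t)
        return TYPE_ERROR
    if kind == "deref":                 # E1↑
        t1 = check(e[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return TYPE_ERROR
    raise ValueError(f"unknown expression kind: {kind!r}")


# Declarations: key: integer; buf: array [256] of char; p: ↑integer
symtab = {
    "key": "integer",
    "buf": ("array", (1, 256), "char"),
    "p": ("pointer", "integer"),
}

t_mod = check(("mod", ("id", "key"), ("num",)), symtab)          # key mod 1999
t_idx = check(("index", ("id", "buf"), ("id", "key")), symtab)   # buf[key]
t_bad = check(("mod", ("literal",), ("num",)), symtab)           # 'c' mod 1999
```

Errors propagate upwards automatically, because type_error never equals integer and is neither an array nor a pointer type.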
3. Type Checking Statements

Production              Semantic rule
S → id := E             S.type := if id.type = E.type
                                  then void else type_error
S → if E then S1        S.type := if E.type = boolean
                                  then S1.type else type_error
S → while E do S1       S.type := if E.type = boolean
                                  then S1.type else type_error
S → S1 ; S2             S.type := if S1.type = void and S2.type = void
                                  then void else type_error

4. Type Checking Functions
- Syntax extension:

    T → T '→' T          (definition)
    E → E ( E )          (function call)

- Type extraction and type checking:

Production          Semantic rule
T → T1 '→' T2       T.type := T1.type → T2.type
E → E1 ( E2 )       E.type := if E2.type = s and E1.type = s → t
                              then t else type_error

- Example: root : (real → real) × real → real
Type Equivalence
- When are two types equivalent?
    - structural equivalence
    - name equivalence

Structural Equivalence

    function sequiv(s, t) : boolean;
    begin
        if s and t are the same basic type then
            return true
        else if s = array(s1, s2) and t = array(t1, t2) then
            return s1 = t1 and sequiv(s2, t2)
        else if s = s1 × s2 and t = t1 × t2 then
            return sequiv(s1, t1) and sequiv(s2, t2)
        else if s = pointer(s1) and t = pointer(t1) then
            return sequiv(s1, t1)
        else if s = s1 → s2 and t = t1 → t2 then
            return sequiv(s1, t1) and sequiv(s2, t2)
        else return false
    end
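A runnable counterpart of sequiv might look as follows. The tuple encoding of type expressions ("array", "product", "pointer", "function" as first component) is an assumption chosen for this sketch. Like the pseudocode, it does not terminate on cyclic type graphs.

```python
def sequiv(s, t):
    """Structural equivalence of two type expressions.

    Basic types are strings; composite types are tuples:
    ("array", index_set, elem), ("product", t1, t2),
    ("pointer", t1), ("function", t1, t2).
    """
    if isinstance(s, str) or isinstance(t, str):
        return s == t                          # same basic type
    if s[0] != t[0]:
        return False                           # different constructors
    if s[0] == "array":
        return s[1] == t[1] and sequiv(s[2], t[2])
    if s[0] in ("product", "function"):
        return sequiv(s[1], t[1]) and sequiv(s[2], t[2])
    if s[0] == "pointer":
        return sequiv(s[1], t[1])
    return False


f1 = ("function", ("product", "char", "char"), ("pointer", "integer"))
f2 = ("function", ("product", "char", "char"), ("pointer", "integer"))
f3 = ("function", ("product", "char", "char"), ("pointer", "real"))
same = sequiv(f1, f2)
different = sequiv(f1, f3)
```

Two separately built expressions compare equal as long as they have the same shape, which is the point of structural (as opposed to name) equivalence.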
Encoding of Type Expressions
- Expression as bit vector (efficient storage and comparison)
- Example:

Type constructor    Encoding        Basic type    Encoding
pointer             01              boolean       0000
array               10              char          0001
freturns            11              integer       0010
                                    real          0011

Type expression                       Encoding
char                                  000000 0001
freturns(char)                        000011 0001
pointer(freturns(char))               000111 0001
array(pointer(freturns(char)))        100111 0001
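The encoding table can be reproduced with a few shifts. In this sketch the function name encode, the limit of three constructors and the "outermost first" argument order are assumptions derived from the four example rows.

```python
BASIC = {"boolean": 0b0000, "char": 0b0001, "integer": 0b0010, "real": 0b0011}
CONSTRUCTOR = {"pointer": 0b01, "array": 0b10, "freturns": 0b11}


def encode(constructors, basic):
    """Encodes up to three unary constructors (outermost first) plus a basic type.

    Layout: three 2-bit constructor fields followed by a 4-bit basic type;
    the innermost constructor occupies the rightmost constructor field.
    """
    if len(constructors) > 3:
        raise ValueError("only three constructor fields are available")
    bits = 0
    for c in constructors:
        bits = (bits << 2) | CONSTRUCTOR[c]
    return (bits << 4) | BASIC[basic]


codes = {
    "char": encode([], "char"),
    "freturns(char)": encode(["freturns"], "char"),
    "pointer(freturns(char))": encode(["pointer", "freturns"], "char"),
    "array(pointer(freturns(char)))": encode(["array", "pointer", "freturns"], "char"),
}
as_bits = {name: format(code, "010b") for name, code in codes.items()}
```

Comparing two such integers is a single machine operation; note that the encoding drops array index sets and function argument types, so equal codes do not by themselves prove full structural equivalence.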
Name vs. structural Equivalence
- Example (Pascal program):

    type link = ↑cell;
    var next : link;
        last : link;
        p    : ↑cell;
        q, r : ↑cell;

- Do all variables have the same type?
- Depends on the type system (and, in Pascal, on the compiler!)
- An implementation of the above example creates implicit types (e.g. type np = ↑cell for variable p)

Cyclic Type Definition (Example)

    type link = ↑cell;
         cell = record
             info: integer;
             next: link;
         end;

[Figure: two graph representations of cell = record with the fields info: integer and next: pointer. In one, the pointer node refers to the type name cell; in the other, it points back to the record node itself and thereby forms a cycle.]
Type conversion / Coercions
- Statement of the problem: x + i with x a real variable and i an integer variable
- Operators exist only for (real + real) or (int + int)
- Type conversion is necessary: x + int2real(i)
- Implicit (by the compiler) or explicit (by the programmer)
- Implicit conversion = coercion
- Loss of information should be prevented (int → real, but not real → int)
- Performance! (Pascal; X is an array of reals)
    - for I := 1 to N do X[I] := int2real(1) needs 48.4 µs
    - for I := 1 to N do X[I] := 1.0 needs only 5.4 µs

Type conversion - Semantic Rules (1)

Production          Semantic rule
E → id              E.type := lookup(id.entry)
                    E.txt := id.entry
E → E1 op E2        E.type := if E1.type = integer and E2.type = integer then integer
                              else if E1.type = integer and E2.type = real then real
                              else if E1.type = real and E2.type = integer then real
                              else if E1.type = real and E2.type = real then real
                              else type_error

Type conversion - Semantic Rules (2)

Production          Semantic rule
E → E1 op E2        E.txt := if E1.type = integer and E2.type = integer then E1.txt op E2.txt
                             else if E1.type = integer and E2.type = real then int2real(E1.txt) op E2.txt
                             else if E1.type = real and E2.type = integer then E1.txt op int2real(E2.txt)
                             else if E1.type = real and E2.type = real then E1.txt op E2.txt
                             else type_error
E → num             E.type := integer
                    E.txt := val
E → num.num         E.type := real
                    E.txt := val
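Both attribute rules for E → E1 op E2 can be combined into one function that returns the pair (E.type, E.txt). The following sketch assumes that attributes are passed around as (type, txt) tuples and that txt is plain source text; both choices are illustrative only.

```python
def binary(op, e1, e2):
    """Computes (E.type, E.txt) for E -> E1 op E2 and inserts int2real calls."""
    (t1, x1), (t2, x2) = e1, e2
    if t1 == "integer" and t2 == "integer":
        return ("integer", f"{x1} {op} {x2}")
    if t1 == "integer" and t2 == "real":
        return ("real", f"int2real({x1}) {op} {x2}")
    if t1 == "real" and t2 == "integer":
        return ("real", f"{x1} {op} int2real({x2})")
    if t1 == "real" and t2 == "real":
        return ("real", f"{x1} {op} {x2}")
    return ("type_error", "")


x = ("real", "x")        # E -> id with lookup(id.entry) = real
i = ("integer", "i")     # E -> id with lookup(id.entry) = integer
one = ("integer", "1")   # E -> num

mixed = binary("+", x, i)
nested = binary("*", binary("+", i, one), x)
```

In the nested case the conversion is applied to the text of the whole integer subexpression, because only its result has to become a real.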
Overloading
- Symbols with different meanings (depending on the context of use)
    - Mathematics: + operator (integers, reals, complex numbers)
    - Ada: ()-expression for array access AND function calls
- Overloading is resolved when the meaning is unambiguous (operator identification)
- Overloading can often be resolved by the types of the operands

Overloading - Possible Types
- Example (Ada):

    function "*"(i, j: integer) return complex;
    function "*"(i, j: complex) return complex;

- Possible types for * are:

    integer × integer → integer
    integer × integer → complex
    complex × complex → complex

- Assumption: 2, 3, 5 are integers
- 3*5 is either integer or complex
- In 2 * (3 * 5) the subexpression 3*5 must be of type integer, because * accepts either two integers or two complex numbers
- (3*5)*z is of type complex if z is of type complex

Handling Overloading
- Instead of a single type, the set of all possible types must be stored in an attribute: attribute types

Production          Semantic rule
E' → E              E'.types := E.types
E → id              E.types := {lookup(id.entry)}
E → E1 ( E2 )       E.types := { t | there is an s in E2.types such that s → t is in E1.types }

- Example: 3*5 (i = integer, c = complex)

E: {i, c}
  E: {i}
    3: {i}
  *: {i × i → i, i × i → c, c × c → c}
  E: {i}
    5: {i}
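The set-valued attribute can be computed with ordinary set comprehensions. In the sketch below a function type s → t is encoded as a pair (s, t) and a product type as a tuple; the helper names apply_types and pairs are hypothetical.

```python
I, C = "integer", "complex"

# Possible types of "*": (argument type, result type) pairs
STAR = {((I, I), I), ((I, I), C), ((C, C), C)}


def apply_types(fun_types, arg_types):
    """E.types for E -> E1 ( E2 ): all t with s in E2.types and s -> t in E1.types."""
    return {t for (s, t) in fun_types if s in arg_types}


def pairs(types1, types2):
    """Possible product types of an argument pair."""
    return {(a, b) for a in types1 for b in types2}


types_3x5 = apply_types(STAR, pairs({I}, {I}))            # 3*5
types_prod_z = apply_types(STAR, pairs(types_3x5, {C}))   # (3*5)*z, z complex
types_2x = apply_types(STAR, pairs({I}, types_3x5))       # 2*(3*5)
```

The bottom-up pass only narrows the candidates; choosing one type per subexpression needs the additional top-down pass with the unique attribute.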
Uniqueness of Types
- A complete expression may only have one type (otherwise type error)

Production          Semantic rule
E' → E              E'.types := E.types
                    E.unique := if E'.types = {t} then t else type_error
E → id              E.types := {lookup(id.entry)}
E → E1 ( E2 )       E.types := { s' | there is an s in E2.types such that s → s' is in E1.types }
                    t := E.unique
                    S := { s | s is in E2.types and s → t is in E1.types }
                    E2.unique := if S = {s} then s else type_error
                    E1.unique := if S = {s} then s → t else type_error
Polymorphic Functions
- Polymorphic function = function whose arguments may have an arbitrary type
- "Polymorphic" applies to functions and operators
- Examples: built-in operators for array access and pointer manipulation
- Reason for polymorphism: the same code can be used for various data structures
- Example: finding the length of a list (e.g. in ML)

    fun length(lptr) =
        if null(lptr) then 0
        else length(tl(lptr)) + 1

- length(["sun", "mon", "tue"]) and length([1, 2, 3, 4]) are both valid; this is not possible in Pascal

Type Variables
- Variables that allow us to talk about unknown types
- Notation: type variables are written as Greek letters α, β, ...
- Type inference = problem of determining the type of an expression from the way it is used
- Example:

    type link = ↑cell;
    procedure mlist(lptr : link; procedure p);
    begin
        while lptr <> nil do begin
            p(lptr);
            lptr := lptr↑.next
        end
    end;

- mlist: link × procedure → void
- From its use it follows that p: link → void

Example - Type Inference
- Program:

    function deref(p);
    begin
        return p↑
    end;

- Derivation:
    1. The type of p is β (assumption)
    2. From p↑ it follows that p must be a pointer. Therefore β = pointer(α).
    3. Furthermore, we know that the type of p↑ must be α.
    4. Therefore, for every type α, pointer(α) → α is the type of the function deref.
Language for Polymorphism
- A type expression of the form ∀α.E(α) denotes a polymorphic type
- Language definition:

    P → D ; E
    D → D ; D | id : Q
    Q → ∀ type_variable . Q | T
    T → T '→' T | T × T | unary_constructor ( T ) | basic_type | type_variable | ( T )
    E → E ( E ) | E , E | id

Example

    deref : ∀α. pointer(α) → α ;
    q : pointer(pointer(integer)) ;
    deref(deref(q))

(labelled syntax tree, children indented below their parent)

apply : α_0
  deref_0 : pointer(α_0) → α_0
  apply : α_i
    deref_i : pointer(α_i) → α_i
    q : pointer(pointer(integer))
Differences in Type Handling
- In contrast to the type handling presented so far (without polymorphism):
    1. Different occurrences of a polymorphic function in an expression may have arguments of different types.
    2. The concept of type equivalence is different: is pointer(α) = pointer(pointer(integer))?
    3. Calculated types must be used from then on. The effect of unifying two expressions must be preserved: once α is assigned the type t, t must be used wherever α is referenced.
- Terms: substitution, instances, unification

Substitution, Instances
- Substitution = function that maps type variables to type expressions, S : type variables → type expressions
- Example: α ↦ pointer(integer)
- Application of a substitution:

    function subst(t : type_expression) : type_expression
    begin
        if t is a basic type then return t
        else if t is a variable then return S(t)
        else if t is t1 → t2 then return subst(t1) → subst(t2)
    end

- The result of applying S to t is called an instance of t. We write s < t if s is an instance of t.
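A direct transcription of subst into Python could look like this. Several conventions are assumptions of the sketch: type variables are strings with a leading apostrophe, composite types are tuples whose first component names the constructor, variables not mentioned in S are left unchanged, and all constructors (not only →) are handled uniformly.

```python
def subst(t, S):
    """Applies the substitution S (dict: variable -> type expression) to t."""
    if isinstance(t, str):
        if t.startswith("'"):          # type variable, e.g. "'a"
            return S.get(t, t)         # unmapped variables stay as they are
        return t                       # basic type
    constructor, *args = t
    return (constructor, *[subst(a, S) for a in args])


S = {"'a": ("pointer", "integer")}

inst1 = subst(("pointer", "'a"), S)
inst2 = subst(("->", "'a", "'a"), {"'a": "integer"})
inst3 = subst(("->", "'a", "'b"), {"'a": "integer"})
```

Every occurrence of a variable is replaced by the same expression, which is why integer → real can never be produced from α → α.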
Examples
- Instances:

    pointer(integer) < pointer(α)
    pointer(real) < pointer(α)
    integer → integer < α → α
    pointer(α) < β
    α < β

- No instances:

    integer             real        substitution on basic types is not possible
    integer → real      α → α       inconsistent replacement of α
    integer → α         α → α       all occurrences must be replaced

Unification
- Two types t1 and t2 are unifiable if there exists a substitution S such that S(t1) = S(t2) holds
- In practice, we are interested in the Most General Unifier (MGU), i.e. a substitution S with:
    1. S(t1) = S(t2)
    2. For every substitution S' with S'(t1) = S'(t2), S' is an instance of S (for every t, S'(t) is an instance of S(t))
Checking Polymorphic Functions
- 2 functions:
    1. fresh(t) replaces all variables in the type expression t with new variables. A pointer to the node representing the new expression is returned.
    2. unify(m, n) unifies the two expressions m and n. As a side effect the substitution is performed.
- Translation scheme:

Production          Semantic rule
E → E1 ( E2 )       p := mkleaf(newtypevar);
                    unify(E1.type, mknode('→', E2.type, p));
                    E.type := p
E → E1 , E2         E.type := mknode('×', E1.type, E2.type)
E → id              E.type := fresh(id.type)

Example Type Checking
[Figure: syntax tree of deref_0(deref_i(q)) with the nodes apply : α_0, deref_0 : pointer(α_0) → α_0, apply : α_i, deref_i : pointer(α_i) → α_i and q : pointer(pointer(integer)).]
- Summary (bottom-up type determination):

Expression : Type                               Substitution
q : pointer(pointer(integer))
deref_i : pointer(α_i) → α_i
deref_i(q) : pointer(integer)                   α_i = pointer(integer)
deref_0 : pointer(α_0) → α_0
deref_0(deref_i(q)) : integer                   α_0 = integer
Unification Algorithm
- Input: a graph and a pair of nodes m and n that should be unified
- Output: true if the nodes can be unified, false otherwise
- Method: a node is represented by the record [constructor, left, right, set], where set is the set of equivalent nodes. One node of each set is chosen as the representative of this set. In the beginning, each set contains only the node itself.
    - find(n) returns the representative node of the set containing n
    - union(m, n) merges the equivalence sets. The new representative is a node that does not correspond to a variable. If no such node exists, one of the former representatives is chosen as the new one.

Algorithm - Pseudocode

    function unify(m, n : node) : boolean
    begin
        s := find(m); t := find(n);
        if s = t then return true
        else if s and t are nodes that represent the same basic type then return true
        else if s is an op-node with children s1, s2 and
                t is an op-node for the same operator with children t1, t2 then begin
            union(s, t);
            return unify(s1, t1) and unify(s2, t2)
        end
        else if s or t represents a variable then begin
            union(s, t);
            return true
        end
        else return false
    end

Example Unification
- Type expressions:

    ((α1 → α2) × list(α3)) → list(α2)
    ((α3 → α4) × list(α3)) → α5

[Figure: DAG of both expressions with numbered nodes. →: 1, ×: 2, →: 3, α1: 4, α2: 5, list: 6, α3: 7, list: 8 for the first expression; →: 9, ×: 10, →: 11, α4: 12, list: 13, α5: 14 for the second. The variable nodes α2 (5) and α3 (7) are shared.]
- Question: unify(1, 9) = ?
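The unification example can be checked with a compact union-find sketch. The class TNode, the parent-pointer representation of the equivalence sets and the helper functions var and op are assumptions for illustration; the sketch also requires equal operator labels before descending into the children and, like the pseudocode, performs no occurs check.

```python
class TNode:
    def __init__(self, kind, label, children=()):
        self.kind = kind                 # "var", "basic" or "op"
        self.label = label
        self.children = list(children)
        self.parent = self               # root of a set = its representative


def find(n):
    while n.parent is not n:
        n = n.parent
    return n


def union(m, n):
    m, n = find(m), find(n)
    if m is n:
        return
    if m.kind == "var":                  # prefer a non-variable representative
        m.parent = n
    else:
        n.parent = m


def unify(m, n):
    s, t = find(m), find(n)
    if s is t:
        return True
    if s.kind == "basic" and t.kind == "basic" and s.label == t.label:
        return True
    if (s.kind == "op" and t.kind == "op" and s.label == t.label
            and len(s.children) == len(t.children)):
        union(s, t)
        return all(unify(a, b) for a, b in zip(s.children, t.children))
    if s.kind == "var" or t.kind == "var":
        union(s, t)
        return True
    return False


def var(name):
    return TNode("var", name)


def op(label, *children):
    return TNode("op", label, children)


a1, a2, a3, a4, a5 = (var(f"a{k}") for k in range(1, 6))
# Node 1: ((a1 -> a2) x list(a3)) -> list(a2)
n1 = op("->", op("x", op("->", a1, a2), op("list", a3)), op("list", a2))
# Node 9: ((a3 -> a4) x list(a3)) -> a5
n9 = op("->", op("x", op("->", a3, a4), op("list", a3)), a5)
unifiable = unify(n1, n9)
```

Under these assumptions the two expressions unify: α1 and α3 end up in one equivalence set, as do α2 and α4, and α5 is represented by the node for list(α2).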