PART 4 - SYNTAX DIRECTED TRANSLATION F. Wotawa (IST @ TU Graz) Compiler Construction Summer term 2016 109 / 309

Setting
- Translation of context-free languages
- Information is attached to grammar symbols as attributes
- Values of attributes are defined by semantic rules
- Two possibilities:
    - Syntax directed definitions (high-level specification)
    - Translation schemes (implementation details)
- Evaluation: (1) parse the input, (2) generate the parse tree, (3) evaluate the parse tree

Syntax directed definitions
- Generalization of context-free grammars
- Each grammar symbol has a set of attributes
- Synthesized vs. inherited attributes
- Attribute: string, number, type, memory location, ...
- The value of an attribute is defined by semantic rules
    - Synthesized: computed from the values at the child nodes in the parse tree
    - Inherited: computed from the values at the parent node (and sibling nodes) in the parse tree
- Semantic rules define dependencies between attributes
- The dependency graph defines the calculation order of the semantic rules
- Semantic rules can have side effects

Form of a syntax directed definition
- Grammar production: A → α
- Associated semantic rule: b := f(c_1, ..., c_k), where f is a function
- Synthesized: b is a synthesized attribute of A, and c_1, ..., c_k are attributes of grammar symbols of the production
- Inherited: b is an inherited attribute of a grammar symbol on the right side of the production, and c_1, ..., c_k are attributes of grammar symbols of the production
- b depends on c_1, ..., c_k

Example: "calculator" program
val is a synthesized attribute of the nonterminals E, T and F; n is the end-of-line token.

Production      Semantic Rule
L → E n         print(E.val)
E → E1 + T      E.val := E1.val + T.val
E → T           E.val := T.val
T → T1 * F      T.val := T1.val * F.val
T → F           T.val := F.val
F → ( E )       F.val := E.val
F → digit       F.val := digit.lexval

S-attributed grammar
- Attributed grammar exclusively using synthesized attributes
- Example evaluation: 3*5+4n (annotated parse tree, children indented below their parent)

    L
      E.val=19
        E.val=15
          T.val=15
            T.val=3
              F.val=3
                digit.lexval=3
            *
            F.val=5
              digit.lexval=5
        +
        T.val=4
          F.val=4
            digit.lexval=4
      n
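The annotated tree above can be reproduced with a post-order walk: every val is computed from the val attributes of the children. The following is a minimal Python sketch under that reading; the `Node` class, the `val` function and the hand-built tree are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Parse tree node (hypothetical helper type for this sketch)."""
    symbol: str                                   # "E", "T", "F", "digit", "+", "*", "(", ")"
    children: List["Node"] = field(default_factory=list)
    lexval: Optional[int] = None                  # only set on digit leaves


def val(n: Node) -> int:
    """Return the synthesized attribute val of n, computed from its children."""
    kids = n.children
    if n.symbol == "digit":
        return n.lexval                           # digit.lexval is supplied by the lexer
    if len(kids) == 1:                            # E -> T, T -> F, F -> digit
        return val(kids[0])
    if n.symbol == "F":                           # F -> ( E )
        return val(kids[1])
    left, op, right = kids                        # E -> E1 + T  or  T -> T1 * F
    if op.symbol == "+":
        return val(left) + val(right)
    return val(left) * val(right)


def digit(v: int) -> Node:
    """Build the subtree F -> digit for one digit value."""
    return Node("F", [Node("digit", lexval=v)])


# Parse tree of 3*5+4 (the end marker n is left out).
term = Node("T", [Node("T", [digit(3)]), Node("*"), digit(5)])
expr = Node("E", [Node("E", [term]), Node("+"), Node("T", [digit(4)])])
print(val(expr))  # prints 19
```

Because only synthesized attributes occur, a single bottom-up pass is enough; no attribute ever has to be handed down to a child.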

Inherited attributes
- Definition of dependencies between programming language constructs and their context
- Example (type checking):

Production      Semantic Rule
D → T L         L.in := T.type
T → int         T.type := integer
T → real        T.type := real
L → L1 , id     L1.in := L.in
                addtype(id.entry, L.in)
L → id          addtype(id.entry, L.in)

Inherited attributes
Annotated parse tree for real id1, id2, id3 (children indented below their parent):

    D
      T.type=real
        real
      L.in=real
        L.in=real
          L.in=real
            id1
          ,
          id2
        ,
        id3

Dependency graphs
- Show dependencies between attributes
- Each rule is represented in the form b := f(c_1, ..., c_k)
- Nodes correspond to attributes; edges to dependencies
- Definition:

    for each node n in the parse tree do
        for each attribute a of the grammar symbol at n do
            construct a node in the dependency graph for a
    for each node n in the parse tree do
        for each semantic rule b := f(c_1, ..., c_k) associated with the production used at n do
            for i := 1 to k do
                construct an edge from the node for c_i to the node for b

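Dependency graph: example
Figure: dependency graph for the parse tree of real id1, id2, id3. Nodes are numbered as in the figure:

    1  id1.entry        2  id2.entry        3  id3.entry
    4  T.type
    5  L.in (L in D → T L)        6  addtype call at this L
    7  L.in (next L below)        8  addtype call at this L
    9  L.in (lowest L)            10 addtype call at this L

Edges: 4 → 5, 5 → 6, 3 → 6, 5 → 7, 7 → 8, 2 → 8, 7 → 9, 9 → 10, 1 → 10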

Topological sort
- An ordering m_1, ..., m_k of the nodes of a directed acyclic graph in which every edge points from an earlier node to a later node
- If m_i → m_j is an edge, then m_i comes before m_j in the ordering
- Important for the order in which the attributes are calculated
- Example (cont.), where a_n is the attribute at node n:
    1. a4 := real
    2. a5 := a4
    3. addtype(id3.entry, a5)
    4. a7 := a5
    5. addtype(id2.entry, a7)
    6. a9 := a7
    7. addtype(id1.entry, a9)
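As a small illustration, the dependency graph of the declaration example can be ordered with Python's standard `graphlib` module (Python 3.9 or later). This is only a sketch: the node numbers follow the example, while the dictionary layout and the helper `respects_dependencies` are assumptions made here.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its value set (edge: dependency -> key).
depends_on = {
    5: {4},        # L.in := T.type
    6: {5, 3},     # addtype(id3.entry, L.in)
    7: {5},        # L1.in := L.in
    8: {7, 2},     # addtype(id2.entry, L.in)
    9: {7},        # L1.in := L.in
    10: {9, 1},    # addtype(id1.entry, L.in)
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())


def respects_dependencies(order, depends_on):
    """Check that each node appears after every node it depends on."""
    position = {node: k for k, node in enumerate(order)}
    return all(position[d] < position[n]
               for n, deps in depends_on.items() for d in deps)


print(order)
print(respects_dependencies(order, depends_on))
```

A graph usually has several valid topological orders, so the printed order may differ from the one in the example while still being a correct evaluation order.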

Example - syntax trees
- Abstract syntax tree = simplified form of a parse tree
- Operators and keywords do not appear as leaves; they label the interior nodes
- Chains of productions with only one element can be collapsed
- Examples:

       if-then-else              +
       /     |     \            / \
      B      S1     S2         *   4
                              / \
                             3   5

Syntax trees: expressions
Functions (return value: pointer to the new node):
- mknode(op, left, right): node with label op and two child nodes left, right
- mkleaf(id, entry): leaf id with symbol table entry entry
- mkleaf(num, val): leaf num with value val

Syntax directed definition:

Production      Semantic Rule
E → E1 + T      E.nptr := mknode('+', E1.nptr, T.nptr)
E → E1 - T      E.nptr := mknode('-', E1.nptr, T.nptr)
E → T           E.nptr := T.nptr
T → ( E )       T.nptr := E.nptr
T → id          T.nptr := mkleaf(id, id.entry)
T → num         T.nptr := mkleaf(num, num.val)

Syntax trees: expressions (example)
Syntax tree for a-4+c. In the figure, the nptr attribute of each E and T node of the parse tree points to the corresponding node of the syntax tree:

              +
             / \
            -   id (to entry for c)
           / \
         id   num 4
    (to entry for a)
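The construction of this tree can be sketched in Python as follows. `mknode` and `mkleaf` mirror the functions named in the lecture; the classes `Leaf` and `Interior`, the helper `show`, and the use of plain strings instead of symbol table entries are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    kind: str                 # "id" or "num"
    value: object             # symbol table entry (here: the name) or number


@dataclass
class Interior:
    op: str                   # operator label, e.g. "+" or "-"
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Interior]


def mkleaf(kind: str, value: object) -> Leaf:
    """Create a leaf for an identifier or a number."""
    return Leaf(kind, value)


def mknode(op: str, left: Tree, right: Tree) -> Interior:
    """Create an interior node labelled with an operator."""
    return Interior(op, left, right)


def show(t: Tree) -> str:
    """Render a tree as a fully parenthesized expression."""
    if isinstance(t, Leaf):
        return str(t.value)
    return f"({show(t.left)} {t.op} {show(t.right)})"


# The calls occur in the order of the reductions for a-4+c.
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)
print(show(p5))  # prints ((a - 4) + c)
```

The left-recursive productions make the tree grow to the left, which matches the left-associativity of + and -.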

Evaluation of S-attributed definitions
- Attributed definitions exclusively using synthesized attributes
- Evaluation using a bottom-up parser (LR parser)
- Idea: store the attribute information on the stack

             State   Val
             ...     ...
             X       X.x
             Y       Y.y
    top →    Z       Z.z
             ...     ...

- Production: A → X Y Z
- Semantic rule: A.a := f(X.x, Y.y, Z.z)
- Before X Y Z is reduced to A, the value of Z.z is stored in val[top], Y.y in val[top-1] and X.x in val[top-2]

Example - S-attributed evaluation
"Calculator" example:

Production      Code Fragment
L → E n         print(val[top-1])
E → E1 + T      val[ntop] := val[top-2] + val[top]
E → T
T → T1 * F      val[ntop] := val[top-2] * val[top]
T → F
F → ( E )       val[ntop] := val[top-1]
F → digit

The code is executed before the reduction, with ntop = top - r + 1, where r is the number of symbols on the right side of the production. After the reduction: top := ntop.

Result for 3*5+4n
(_ marks a val entry without a value, e.g. for an operator)

Input     state      val        Production used
3*5+4n    -          -
*5+4n     3          3
*5+4n     F          3          F → digit
*5+4n     T          3          T → F
5+4n      T *        3 _
+4n       T * 5      3 _ 5
+4n       T * F      3 _ 5      F → digit
+4n       T          15         T → T * F
+4n       E          15         E → T
4n        E +        15 _
n         E + 4      15 _ 4
n         E + F      15 _ 4     F → digit
n         E + T      15 _ 4     T → F
n         E          19         E → E + T
          E n        19 _
          L          19         L → E n
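The trace can be replayed with a small Python sketch of the val array. It does not contain an LR parser: the shift and reduce steps are supplied by hand in the order of the table, and only the code fragments and the rule ntop = top - r + 1 are modelled. The function name `run` and the action encoding are assumptions of this sketch.

```python
def run(actions):
    """Replay shift/reduce actions on a value stack; return the printed values."""
    val = []          # val[-1] corresponds to val[top]
    printed = []
    for action in actions:
        if action[0] == "shift":
            val.append(action[1])          # lexval for digits, None otherwise
            continue
        _, production, r = action          # r = length of the right side
        top = len(val) - 1
        ntop = top - r + 1
        if production == "E -> E + T":
            result = val[top - 2] + val[top]
        elif production == "T -> T * F":
            result = val[top - 2] * val[top]
        elif production == "F -> ( E )":
            result = val[top - 1]
        elif production == "L -> E n":
            printed.append(val[top - 1])   # print(val[top-1])
            result = val[top - 1]
        else:                              # E -> T, T -> F, F -> digit: value stays in place
            result = val[top]
        del val[ntop:]                     # pop the r entries of the right side
        val.append(result)                 # val[ntop] := result; top := ntop
    return printed


actions = [
    ("shift", 3), ("reduce", "F -> digit", 1), ("reduce", "T -> F", 1),
    ("shift", None), ("shift", 5), ("reduce", "F -> digit", 1),
    ("reduce", "T -> T * F", 3), ("reduce", "E -> T", 1),
    ("shift", None), ("shift", 4), ("reduce", "F -> digit", 1),
    ("reduce", "T -> F", 1), ("reduce", "E -> E + T", 3),
    ("shift", None), ("reduce", "L -> E n", 2),
]
print(run(actions))  # prints [19]
```

Productions with a single symbol on the right side need no code because ntop equals top, so the value is already in the right slot.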

L-attributed definitions
- Definition: a syntax directed definition is L-attributed if each inherited attribute of X_j, 1 ≤ j ≤ n, on the right side of A → X_1 ... X_n depends only on:
    1. the attributes of the symbols X_1, ..., X_{j-1} to the left of X_j, and
    2. the inherited attributes of A
- Each S-attributed grammar is an L-attributed grammar
- Evaluation in depth-first order:

    procedure dfvisit(n : node)
    begin
        for each child m of n, from left to right do
        begin
            evaluate inherited attributes of m;
            dfvisit(m)
        end;
        evaluate synthesized attributes of n
    end

Translation schemes
- Translation scheme = context-free grammar with attributes for the grammar symbols and with semantic actions, which are placed on the right side of a production between grammar symbols and are enclosed in { }
- Example: T → T1 * F {T.val := T1.val * F.val}
- If only synthesized attributes are used, the action is always placed at the end of the right side of a production
- Note: actions may not access attributes which have not been calculated yet (this limits the positions of semantic actions)

Translation schemes (cont.)
If both inherited and synthesized attributes are used, the following needs to be taken into consideration:
1. An inherited attribute of a symbol on the right side of a production has to be calculated in an action which is positioned to the left of the symbol
2. An action may not reference a synthesized attribute belonging to a symbol which is positioned to the right of the action
3. A synthesized attribute of the nonterminal on the left side can only be calculated once all referenced attributes have been calculated; actions like these are usually placed at the end of the right side

Example translation scheme

    S → A1 A2 {A1.in := 1; A2.in := 2}
    A → a {print(A.in)}

- The above grammar does not fulfill the three conditions for translation schemes
- The inherited attribute A.in is not yet defined at the point in time when it should be printed
- But: for each L-attributed grammar a translation scheme can be found which fulfills the three conditions, e.g.:

    S → {A1.in := 1} A1 {A2.in := 2} A2
    A → a {print(A.in)}

Top-down translation
- Removal of left recursion in the translation scheme is necessary
- Example:

    E → E1 + T {E.val := E1.val + T.val}
    E → E1 - T {E.val := E1.val - T.val}
    E → T {E.val := T.val}
    T → ( E ) {T.val := E.val}
    T → num {T.val := num.val}

Example top-down translation

    E → T {R.i := T.val} R {E.val := R.s}
    R → + T {R1.i := R.i + T.val} R1 {R.s := R1.s}
    R → - T {R1.i := R.i - T.val} R1 {R.s := R1.s}
    R → ε {R.s := R.i}
    T → ( E ) {T.val := E.val}
    T → num {T.val := num.val}

Evaluation of 9-5+2
Annotated parse tree (children indented below their parent):

    E
      T.val = 9
        num.val = 9
      R.i = 9
        -
        T.val = 5
          num.val = 5
        R.i = 4
          +
          T.val = 2
            num.val = 2
          R.i = 6
            ε
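A recursive-descent evaluator makes the flow of R.i visible: the inherited attribute becomes a parameter and the synthesized attribute R.s becomes the return value. The following Python sketch follows the right-recursive scheme for + and -; the tokenizer, the class name `Parser`, the end marker "$" and the function `evaluate` are assumptions of this sketch.

```python
import re


def tokenize(text):
    """Split the input into numbers, operators and parentheses."""
    return re.findall(r"\d+|[-+()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text) + ["$"]      # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, token):
        if self.lookahead != token:
            raise SyntaxError(f"expected {token!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):
        t_val = self.T()
        return self.R(t_val)                      # R.i := T.val; E.val := R.s

    def R(self, i):
        if self.lookahead == "+":
            self.match("+")
            t_val = self.T()
            return self.R(i + t_val)              # R1.i := R.i + T.val; R.s := R1.s
        if self.lookahead == "-":
            self.match("-")
            t_val = self.T()
            return self.R(i - t_val)              # R1.i := R.i - T.val; R.s := R1.s
        return i                                  # R -> ε: R.s := R.i

    def T(self):
        if self.lookahead == "(":
            self.match("(")
            e_val = self.E()
            self.match(")")
            return e_val                          # T.val := E.val
        token = self.lookahead
        if not token.isdigit():
            raise SyntaxError(f"expected a number, found {token!r}")
        self.match(token)
        return int(token)                         # T.val := num.val


def evaluate(text):
    parser = Parser(text)
    result = parser.E()
    parser.match("$")
    return result


print(evaluate("9-5+2"))  # prints 6
```

Passing the running result down as `i` is what keeps the evaluation left-associative even though the grammar is right-recursive: 9-5 is computed before 2 is added.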

Summary transformation
Given translation scheme:

    A → A1 Y {A.a := g(A1.a, Y.y)}
    A → X {A.a := f(X.x)}

After removal of left recursion:

    A → X R
    R → Y R | ε

Transformed scheme:

    A → X {R.i := f(X.x)} R {A.a := R.s}
    R → Y {R1.i := g(R.i, Y.y)} R1 {R.s := R1.s}
    R → ε {R.s := R.i}

Predictive parsing with schemes
Input: syntax-directed translation scheme. Output: syntax-directed translator.
1. For each nonterminal A, construct a function that has a formal parameter for each inherited attribute of A and that returns the values of the synthesized attributes of A. This function has a local variable for each attribute of each grammar symbol that appears in a production for A.
2. As previously described (see predictive parsing), the code for nonterminal A decides what production to use based on the current input symbol.
3. The code for each production does the following (evaluation from left to right):
    3.1 Token X with synthesized attribute x: save the value of x in a variable X.x and generate a call to match token X.
    3.2 Nonterminal B: generate c := B(b_1, ..., b_k), where b_1, ..., b_k are the variables for the inherited attributes of B and c is the variable for the synthesized attribute of B.
    3.3 Action: copy the code into the parser, replacing each reference to an attribute by the variable for that attribute.

Example - predictive parsing
Grammar:

    E → T {R.i := T.val} R {E.val := R.s}
    R → op T {R1.i := mknode(op.lexeme, R.i, T.nptr)} R1 {R.s := R1.s}
    R → ε {R.s := R.i}
    T → ( E ) {T.val := E.val}
    T → num {T.val := num.val}

Functions:

    function E : node
    function R(i : node) : node
    function T : node

Parsing procedure R
Procedure without translation scheme:

    procedure R()
    begin
        if lookahead = op then
        begin
            match(op); T(); R()
        end
        else begin
            return
        end
    end

Parsing function R

    function R(i : node) : node
    var nptr, i1, s1, s : node;
        oplexeme : char;
    begin
        if lookahead = op then
        begin
            oplexeme := lexval;
            match(op);
            nptr := T();
            i1 := mknode(oplexeme, i, nptr);
            s1 := R(i1);
            s := s1
        end
        else s := i;
        return s
    end

Bottom-up with inherited attributes
- Implementation of L-attributed grammars in bottom-up parsers
- For LL(1) grammars and many LR(1) grammars
- Removal of embedded actions from translation schemes:
    - Actions have to be placed at the end of the right side of a production
    - Ensured by new marker nonterminals
- Example:

    E → T R
    R → + T {print('+')} R | - T {print('-')} R | ε
    T → num {print(num.val)}

  becomes

    E → T R
    R → + T M R | - T N R | ε
    T → num {print(num.val)}
    M → ε {print('+')}
    N → ε {print('-')}

Inherited attributes on the stack
- Idea: production A → X Y with synthesized attribute X.x and inherited attribute Y.y
- Before any reduction to Y takes place, X.x is already on the stack
- In the case of Y.y = X.x (copy action), the value of X.x can be used wherever the value of Y.y is required
- Example: parser for variable declarations such as real p, q, r

Variable declaration - example

    D → T {L.in := T.type} L
    T → int {T.type := integer}
    T → real {T.type := real}
    L → {L1.in := L.in} L1 , id {addtype(id.entry, L.in)}
    L → id {addtype(id.entry, L.in)}

Figure: parse tree for real p, q, r. T.type = real is copied to the in attribute of every L node.

Calculation using the stack

Input         state       Production used
real p,q,r    -
p,q,r         real
p,q,r         T           T → real
,q,r          T p
,q,r          T L         L → id
q,r           T L ,
,r            T L , q
,r            T L         L → L , id
r             T L ,
              T L , r
              T L         L → L , id
              D           D → T L

Implementation:

Production      Code Fragment
D → T L
T → int         val[top] := integer
T → real        val[top] := real
L → L , id      addtype(val[top], val[top-3])
L → id          addtype(val[top], val[top-1])

Problems
Positions of attributes on the stack need to be known.

Production      Semantic Rule
S → a A C       C.i := A.s
S → a A B C     C.i := A.s
C → c           C.s := g(C.i)

When the reduction C → c is performed, it is unknown whether the value of C.i is located in val[top-1] or in val[top-2]! It depends on whether a B is located on the stack.

Solution: introduction of a marker M:

Production      Semantic Rule
S → a A C       C.i := A.s
S → a A B M C   M.i := A.s; C.i := M.s
C → c           C.s := g(C.i)
M → ε           M.s := M.i

Problems (cont.)
Simulation of semantic rules which are not copy actions: use a marker!

Original:

Production      Semantic Rule
S → a A C       C.i := f(A.s)

With marker N:

Production      Semantic Rule
S → a A N C     N.i := A.s; C.i := N.s
N → ε           N.s := f(N.i)

Bottom-up parsing with calculation of inherited attributes
Input: L-attributed definition (and LL(1) grammar). Output: parser which calculates the attribute values on the stack.
1. Assumptions: each nonterminal A has an inherited attribute A.i, and each grammar symbol X has a synthesized attribute X.s. If X is a terminal, then X.s is the lexical value of X (supplied by the lexical analyser). The values are stored on the stack in the form of an array val.
2. For each production A → X_1 ... X_n create n new markers (nonterminals) M_1, ..., M_n and replace the production with A → M_1 X_1 ... M_n X_n. Note: the synthesized values of X_i are stored in the val array entry which belongs to X_i. The inherited values X_i.i are stored in the entries which are associated with M_i.
3. Invariant: the inherited attribute A.i (if it exists) is always directly beneath the position of M_1 within the val array.

Simplifications
Reduction of markers:
1. If X_j has no inherited attribute, then no marker M_j is required. Consequence: the positions of the attributes on the stack shift!
2. If X_1.i exists and is calculated by X_1.i = A.i, then M_1 is not required.

Removal of inherited attributes
- Replacement of inherited attributes by synthesized ones
- Not always possible
- Requires modification of the grammar!
- Example: declarations in Pascal

    D → L : T
    T → integer | char
    L → L , id | id

  convert to:

    D → id L
    L → , id L | : T
    T → integer | char

Difficult syntax directed definition
The following definition cannot be processed by bottom-up parsers using the current approaches:

Production      Semantic Rule
S → L           L.count := 0
L → L1 1        L1.count := L.count + 1
L → ε           print(L.count)

Reason: L → ε receives the number of 1s by means of inheritance. However, as L → ε is the first production used in a reduction, no value is available yet!

Recursive evaluators
- Evaluation of attributes based on the parse tree
- Not possible in conjunction with parsing
- The order in which nodes are visited during evaluation is arbitrary
- For each nonterminal a translation function exists
- Extensions may visit nodes more than once
- The order of node visits has to respect the following:
    1. Each inherited attribute of a node has to be calculated before the node is visited
    2. Synthesized attributes are calculated before the node is left (for the last time)
- The order is determined by the dependencies

Example: recursive evaluators

Production      Semantic Rules
A → L M         L.i := l(A.i)
                M.i := m(L.s)
                A.s := f(M.s)
A → Q R         R.i := r(A.i)
                Q.i := q(R.s)
                A.s := f(Q.s)

Figure: dependency graphs between the i and s attributes of A, L, M and of A, Q, R.

    function A(n, ai)
        if production(n) = "A → L M" then
            li := l(ai)
            ls := L(child(n, 1), li)
            mi := m(ls)
            ms := M(child(n, 2), mi)
            return f(ms)
        if production(n) = "A → Q R" then
            ri := r(ai)
            rs := R(child(n, 2), ri)
            qi := q(rs)
            qs := Q(child(n, 1), qi)
            return f(qs)

PART 5 - TYPE CHECKING

Static Program Checking
- Type checks: check of the used types. Error if the operands are incompatible with the operator used. Example: 1.2 + 2 (real + int).
- Flow-of-control checks: check whether the transfer of program execution is possible. Examples: break needs an enclosing loop; goto label needs a defined label.
- Uniqueness checks: check whether an object has been defined exactly once. Example: in Pascal each identifier must be unique.
- Name-related checks: in some languages, names (e.g. of procedures) have to occur again at a different location (e.g. at the end of the procedure).

Tasks
- Check whether the type system of the language is satisfied.
- A separate type checker is not always necessary.

    token stream → [parser] → parse tree → [type checker] → syntax tree → [intermediate code generator] → intermediate representation

Type systems (examples):
- If both operands of the arithmetic operators of addition, subtraction and multiplication are of type integer, then the result is of type integer.
- The result of the unary & operator is a pointer to the object referred to by the operand. If the type of the operand is ..., the type of the result is pointer of ....

Type Expressions
A type expression is:
1. a basic type: integer, boolean, char and real, as well as the special basic types type_error and void
2. a type name
3. a composite type in the form of:
    1. Arrays: array(I, T), with index set I and element type T
    2. Products: T1 × T2
    3. Records: record((N1 × T1) × ... × (Nk × Tk)), with names Ni and types Ti
    4. Pointers: pointer(T)
    5. Functions: T1 → T2
4. a type variable

Types: examples

    type row = record
        address: integer;
        lexeme: array[1..15] of char
    end;
    var table: array[1..101] of row;

row can be represented as record((address × integer) × (lexeme × array(1..15, char))).

    function f(a, b: char): ↑integer;

is represented as char × char → pointer(integer).

Graphical Representation of Types
Figure: the type char × char → pointer(integer) drawn in two ways:
- as a DAG (directed acyclic graph), in which both arguments of × share a single char node
- as a tree, in which char appears as two separate leaves

Type systems
- Set of rules, specified using attributed grammars (or verbally)
- Static vs. dynamic checking of types
- Sound type system = static type checking is sufficient
- A language is strongly typed = the compiler guarantees that an accepted program runs without type errors
- But some checks can only occur dynamically:

    table: array[0..255] of char;
    i: integer;

  The compiler cannot check that the access table[i] is correct, because the value of i is only known at run time.
- Error recovery is important (even for type errors)

Type Checker Spec
Language definition:

    P → D ; E
    D → D ; D | id : T
    T → char | integer | array [ num ] of T | ↑T
    E → literal | num | id | E mod E | E [ E ] | E↑

Example:

    key: integer;
    key mod 1999

The declaration type array [256] of char is represented by the type expression array(1..256, char).

1. Secure Type Info

Production                  Semantic Rule
P → D ; E
D → D ; D
D → id : T                  {addtype(id.entry, T.type)}
T → char                    {T.type := char}
T → integer                 {T.type := integer}
T → array [ num ] of T1     {T.type := array(1..num.val, T1.type)}
T → ↑T1                     {T.type := pointer(T1.type)}

2. Type Checking Expressions

Production        Semantic Rule
E → literal       E.type := char
E → num           E.type := integer
E → id            E.type := lookup(id.entry)
E → E1 mod E2     E.type := if E1.type = integer and E2.type = integer
                            then integer else type_error
E → E1 [ E2 ]     E.type := if E2.type = integer and E1.type = array(s, t)
                            then t else type_error
E → E1↑           E.type := if E1.type = pointer(t)
                            then t else type_error
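The rules for expressions translate almost line by line into a recursive function over a syntax tree. The following Python sketch is one possible reading: the tuple encoding of expressions and types, the dictionary used as symbol table and the name `type_of` are assumptions made here, not part of the lecture.

```python
TYPE_ERROR = "type_error"


def is_array(t):
    return isinstance(t, tuple) and t[0] == "array"


def is_pointer(t):
    return isinstance(t, tuple) and t[0] == "pointer"


def type_of(e, symtab):
    """Return E.type for an expression given as a nested tuple."""
    kind = e[0]
    if kind == "literal":                       # E -> literal
        return "char"
    if kind == "num":                           # E -> num
        return "integer"
    if kind == "id":                            # E -> id: lookup(id.entry)
        return symtab.get(e[1], TYPE_ERROR)
    if kind == "mod":                           # E -> E1 mod E2
        t1, t2 = type_of(e[1], symtab), type_of(e[2], symtab)
        return "integer" if t1 == "integer" and t2 == "integer" else TYPE_ERROR
    if kind == "index":                         # E -> E1 [ E2 ]
        t1, t2 = type_of(e[1], symtab), type_of(e[2], symtab)
        return t1[2] if is_array(t1) and t2 == "integer" else TYPE_ERROR
    if kind == "deref":                         # E -> E1↑
        t1 = type_of(e[1], symtab)
        return t1[1] if is_pointer(t1) else TYPE_ERROR
    raise ValueError(f"unknown expression kind: {kind!r}")


# Declarations: key: integer; buf: array [256] of char; p: ↑integer
symtab = {
    "key": "integer",
    "buf": ("array", (1, 256), "char"),
    "p": ("pointer", "integer"),
}
print(type_of(("mod", ("id", "key"), ("num",)), symtab))    # prints integer
print(type_of(("index", ("id", "buf"), ("id", "key")), symtab))  # prints char
print(type_of(("deref", ("id", "key")), symtab))             # prints type_error
```

An error in a subexpression propagates upward automatically, because type_error never matches integer, array(s, t) or pointer(t).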

3. Type Checking Statements

Production            Semantic Rule
S → id := E           S.type := if id.type = E.type
                                then void else type_error
S → if E then S1      S.type := if E.type = boolean
                                then S1.type else type_error
S → while E do S1     S.type := if E.type = boolean
                                then S1.type else type_error
S → S1 ; S2           S.type := if S1.type = void and S2.type = void
                                then void else type_error

4. Type Checking Functions
Syntax extension:

    T → T '→' T       (definition)
    E → E ( E )       (function call)

Type extraction and type checking:

Production          Semantic Rule
T → T1 '→' T2       T.type := T1.type → T2.type
E → E1 ( E2 )       E.type := if E2.type = s and E1.type = s → t
                              then t else type_error

Example: root : ((real → real) × real) → real

Type Equivalence
When are types equivalent?
- structural equivalence
- name equivalence

Structural Equivalence

    function sequiv(s, t) : boolean;
    begin
        if s and t are the same basic type then
            return true
        else if s = array(s1, s2) and t = array(t1, t2) then
            return s1 = t1 and sequiv(s2, t2)
        else if s = s1 × s2 and t = t1 × t2 then
            return sequiv(s1, t1) and sequiv(s2, t2)
        else if s = pointer(s1) and t = pointer(t1) then
            return sequiv(s1, t1)
        else if s = s1 → s2 and t = t1 → t2 then
            return sequiv(s1, t1) and sequiv(s2, t2)
        else return false
    end
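For experimentation, the same test can be written in Python. This is a sketch under the assumption that basic types are strings and constructed types are tuples tagged "array", "product", "pointer" or "function"; that encoding is chosen here for illustration only.

```python
def sequiv(s, t):
    """Structural equivalence of two type expressions (acyclic types only)."""
    if isinstance(s, str) or isinstance(t, str):
        # Basic types are equivalent only if they are the same basic type.
        return isinstance(s, str) and isinstance(t, str) and s == t
    if s[0] != t[0]:
        return False                                   # different constructors
    if s[0] == "array":                                # array(s1, s2) vs array(t1, t2)
        return s[1] == t[1] and sequiv(s[2], t[2])
    if s[0] in ("product", "function"):                # s1 × s2  or  s1 → s2
        return sequiv(s[1], t[1]) and sequiv(s[2], t[2])
    if s[0] == "pointer":                              # pointer(s1) vs pointer(t1)
        return sequiv(s[1], t[1])
    return False


f1 = ("function", ("product", "char", "char"), ("pointer", "integer"))
f2 = ("function", ("product", "char", "char"), ("pointer", "integer"))
f3 = ("function", ("product", "char", "char"), ("pointer", "real"))
print(sequiv(f1, f2), sequiv(f1, f3))  # prints True False
```

The recursion follows the structure of the type expressions, so it terminates only for types without cycles; recursive types such as linked list cells need an additional mechanism.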

Encoding of Type Expressions
Expression as bit vector (efficient storage and comparison). Example:

Type Constructor    Encoding        Basic Type    Encoding
pointer             01              boolean       0000
array               10              char          0001
freturns            11              integer       0010
                                    real          0011

Type expression                       Encoding
char                                  00 00 00 0001
freturns(char)                        00 00 11 0001
pointer(freturns(char))               00 01 11 0001
array(pointer(freturns(char)))        10 01 11 0001
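The encoding in the table can be reproduced with a few shifts: the basic type fills the four rightmost bits, and each constructor adds two bits to the left, innermost constructor first. The Python sketch below assumes nested pairs such as ("pointer", ("freturns", "char")) as input and at most three constructors, as in the table; the names `encode` and `show` are hypothetical.

```python
BASIC = {"boolean": 0b0000, "char": 0b0001, "integer": 0b0010, "real": 0b0011}
CONSTRUCTOR = {"pointer": 0b01, "array": 0b10, "freturns": 0b11}
MAX_CONSTRUCTORS = 3          # the example layout has three 2-bit constructor fields


def encode(texpr):
    """Encode a type expression such as ("pointer", ("freturns", "char"))."""
    constructors = []         # collected from the outermost to the innermost
    while isinstance(texpr, tuple):
        name, texpr = texpr
        constructors.append(name)
    if len(constructors) > MAX_CONSTRUCTORS:
        raise ValueError("too many type constructors for this encoding")
    bits = BASIC[texpr]
    # The innermost constructor sits directly to the left of the basic type.
    for k, name in enumerate(reversed(constructors)):
        bits |= CONSTRUCTOR[name] << (4 + 2 * k)
    return bits


def show(bits):
    """Format an encoding as a 10-bit string."""
    return format(bits, "010b")


print(show(encode("char")))                                        # prints 0000000001
print(show(encode(("array", ("pointer", ("freturns", "char"))))))  # prints 1001110001
```

With this representation, two types built only from these unary constructors are structurally equivalent exactly when their bit vectors are equal, which reduces the comparison to one integer test.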

Name vs. structural equivalence
Example (Pascal program):

    type link = ↑cell;
    var next : link;
        last : link;
        p    : ↑cell;
        q, r : ↑cell;

- Do all variables have the same type?
- It depends on the type system (and, in Pascal, on the compiler!)
- An implementation of the above example creates implicit types (e.g. type np = ↑cell for variable p).

Cyclic type definition (example)

    type link = ↑cell;
         cell = record
                    info: integer;
                    next: link
                end;

Figure: two graph representations of cell = record with the fields info: integer and next: pointer. In one, the pointer node refers to the type name cell; in the other, its edge leads back to the record node, which makes the graph cyclic.

Type conversion / coercions
- Statement of the problem: x + i, with x a real variable and i an integer variable
- Operators exist only for (real + real) and (int + int)
- Type conversion is necessary: x + int2real(i)
- Conversion can be implicit (by the compiler) or explicit (by the programmer)
- Implicit conversion = coercion
- Loss of information should be prevented (int → real, but not real → int)
- Performance matters (Pascal; X is an array of reals):

    for I := 1 to N do X[I] := int2real(1)      needs 48.4 µs
    for I := 1 to N do X[I] := 1.0              needs only 5.4 µs

Type conversion - Semantic Rules (1)

Production        Semantic Rule
E → id            E.type := lookup(id.entry)
                  E.txt := id.entry
E → E1 op E2      E.type := if E1.type = integer and E2.type = integer then integer
                            else if E1.type = integer and E2.type = real then real
                            else if E1.type = real and E2.type = integer then real
                            else if E1.type = real and E2.type = real then real
                            else type_error

Type conversion - Semantic Rules (2)

Production        Semantic Rule
E → E1 op E2      E.txt := if E1.type = integer and E2.type = integer then E1.txt op E2.txt
                           else if E1.type = integer and E2.type = real then int2real(E1.txt) op E2.txt
                           else if E1.type = real and E2.type = integer then E1.txt op int2real(E2.txt)
                           else if E1.type = real and E2.type = real then E1.txt op E2.txt
                           else type_error
E → num           E.type := integer
                  E.txt := val
E → num.num       E.type := real
                  E.txt := val
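Both attribute rules for E → E1 op E2 can be combined into one function that returns the pair (type, txt). The following Python sketch assumes that operands are given as such pairs and that the generated text is a plain string; the function name `binop` and the string layout are assumptions made here.

```python
TYPE_ERROR = "type_error"


def binop(op, left, right):
    """Compute (E.type, E.txt) for E -> E1 op E2, inserting int2real where needed."""
    (t1, txt1), (t2, txt2) = left, right
    if t1 == "integer" and t2 == "integer":
        return ("integer", f"{txt1} {op} {txt2}")
    if t1 == "integer" and t2 == "real":
        return ("real", f"int2real({txt1}) {op} {txt2}")      # coerce the left operand
    if t1 == "real" and t2 == "integer":
        return ("real", f"{txt1} {op} int2real({txt2})")      # coerce the right operand
    if t1 == "real" and t2 == "real":
        return ("real", f"{txt1} {op} {txt2}")
    return (TYPE_ERROR, TYPE_ERROR)


x = ("real", "x")         # E -> id with lookup(id.entry) = real
i = ("integer", "i")      # E -> id with lookup(id.entry) = integer
print(binop("+", x, i))   # prints ('real', 'x + int2real(i)')
```

The conversion is only ever applied from integer to real, which matches the requirement that a coercion should not lose information.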

Overloading
Symbols with different meanings (dependent on the application context):
  mathematics: + operator (integers, reals, complex numbers)
  Ada: ()-expression for array access AND function calls
Overloading is resolved when the meaning is clear (operator identification).
Overloading can often be resolved by the types of the operands.

Overloading - Possible Types
Example (Ada):
function "*"(i, j : integer) return complex;
function "*"(i, j : complex) return complex;
Possible types for * are:
  integer × integer → integer
  integer × integer → complex
  complex × complex → complex
Assumption: 2, 3, 5 are integers.
3*5 is either integer or complex, depending on its context.
In 2 * (3 * 5), the subexpression 3*5 must be of type integer.
In (3*5)*z, the subexpression 3*5 must be of type complex if z is of type complex.

Handling Overloading
Instead of a single type, the set of all possible types must be stored in an attribute: attribute types!
Production        Semantic Rule
E' → E            E'.types := E.types
E → id            E.types := {lookup(id.entry)}
E → E1 ( E2 )     E.types := {t | ∃ s ∈ E2.types : s → t ∈ E1.types}
Example: 3*5
[Figure: annotated syntax tree]
E : {i, c}
  E : {i}    3 : {i}
  * : {i × i → i, i × i → c, c × c → c}
  E : {i}    5 : {i}
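The rule for E → E1 ( E2 ) can be sketched in Python as a set comprehension. The encoding is an assumption for this sketch: a function type is a pair (argument tuple, result type), and the operand sets are combined with itertools.product to form the possible argument tuples.

```python
from itertools import product

I, C = "integer", "complex"

# The three possible types of "*": (argument types, result type).
STAR = {((I, I), I), ((I, I), C), ((C, C), C)}


def apply_types(fun_types, arg_type_sets):
    """E.types := {t | there is an s in E2.types with s -> t in E1.types}."""
    candidates = set(product(*arg_type_sets))     # all argument tuples s
    return {result for (args, result) in fun_types if args in candidates}


three_times_five = apply_types(STAR, [{I}, {I}])
print(sorted(three_times_five))   # ['complex', 'integer']
```

Applying the same function to the operand sets of 2 * (3 * 5), i.e. [{I}, three_times_five], again yields both integer and complex; picking a single type requires the top-down step described under uniqueness of types.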

Uniqueness of Types
Expressions may only have one type (otherwise type error).
Production        Semantic Rule
E' → E            E'.types := E.types
                  E.unique := if E'.types = {t} then t else type error
E → id            E.types := {lookup(id.entry)}
E → E1 ( E2 )     E.types := {s' | ∃ s ∈ E2.types : (s → s') ∈ E1.types}
                  t := E.unique
                  S := {s | s ∈ E2.types and s → t ∈ E1.types}
                  E2.unique := if S = {s} then s else type error
                  E1.unique := if S = {s} then s → t else type error
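The top-down narrowing step for E → E1 ( E2 ) can be sketched as follows. As before, function types are encoded as (argument tuple, result type) pairs; the function name narrow and this encoding are assumptions made for the illustration.

```python
I, C = "integer", "complex"
STAR = {((I, I), I), ((I, I), C), ((C, C), C)}


def narrow(fun_types, arg_types, t):
    """Given t = E.unique, return (E2.unique, E1.unique).

    S := {s | s in E2.types and s -> t in E1.types}; S must be a singleton.
    """
    S = {s for s in arg_types if (s, t) in fun_types}
    if len(S) != 1:
        raise TypeError(f"type error: {len(S)} candidate argument types for result {t}")
    (s,) = S
    return s, (s, t)


# (3*5)*z with z complex: the arguments are (integer|complex, complex),
# and the whole expression must be complex.
arg_types = {(I, C), (C, C)}
print(narrow(STAR, arg_types, C))
# prints (('complex', 'complex'), (('complex', 'complex'), 'complex'))
```

The result says that the left operand 3*5 must be complex, which becomes the unique type passed down to that subexpression in the next step.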

Polymorphic Functions
Polymorphic function = function whose arguments may have an arbitrary type.
Polymorphic refers to functions and operators.
Examples: built-in operators for array access, pointer manipulation.
Reason for polymorphism: code can be used for various data structures.
Example: finding the length of lists (e.g. ML)
fun length(lptr) =
  if null(lptr) then 0
  else length(tl(lptr)) + 1
length(["sun", "mon", "tue"]), length([1, 2, 3, 4])
Not possible in Pascal!

Type Variables
Type variables are variables that allow us to talk about unknown types.
Note: type variables are written as Greek letters α, β, ...
Type inference = the problem of determining the type of an expression from the way it is used.
Example:
type link = ^cell;
procedure mlist(lptr : link; procedure p);
begin
  while lptr <> nil do begin
    p(lptr);
    lptr := lptr^.next
  end
end;
mlist : link × procedure → void
p : link → void

Example - Type Inference
Program:
function deref(p);
begin
  return p^
end;
Derivation:
1 The type of p is β (assumption).
2 From p^ it follows that p must be a pointer. Therefore β = pointer(α).
3 Furthermore, we know that the type of p^ must be α.
4 Therefore ∀α. pointer(α) → α is the type of the function deref.

Language for Polymorphism
A type expression of the form ∀α.E(α) denotes a polymorphic type.
Language definition:
P → D ; E
D → D ; D | id : Q
Q → ∀ type variable . Q | T
T → T → T
  | T × T
  | ( T )
  | unary constructor ( T )
  | basic type
  | type variable
E → E ( E ) | E , E | id

Example
deref : ∀α. pointer(α) → α;
q : pointer(pointer(integer));
deref(deref(q))
[Figure: syntax tree of deref(deref(q)) labeled with types]
apply : α_0
  deref_0 : pointer(α_0) → α_0
  apply : α_i
    deref_i : pointer(α_i) → α_i
    q : pointer(pointer(integer))

Differences in Type Handling
In contrast to the former type handling (without polymorphism):
1 Arguments of polymorphic functions in an expression may have different types.
2 The concept of type equivalence is different: pointer(α) = pointer(pointer(integer))???
3 Calculated types must be used subsequently. The effect of the unification of two expressions must be preserved: if α is assigned the type t and α is referenced elsewhere, t must be used!
Terms: substitution, instances, unification

Substitution, Instances
Substitution = function that maps type variables to type expressions.
S : type variables → type expressions
Example: α ↦ pointer(integer)
Application of a substitution:
function subst(t : type expression) : type expression
begin
  if t is a basic type then return t
  else if t is a variable then return S(t)
  else if t is t1 → t2 then return subst(t1) → subst(t2)
end
S(t) is an instance of t. We write s < t if s is an instance of t.
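A minimal Python sketch of subst, assuming type expressions are encoded as strings (basic types and type variables) or tuples (constructor followed by its arguments), and a substitution S is a dictionary. The sketch generalizes the function case to any constructor, and leaves variables without an entry in S unchanged.

```python
def subst(t, S):
    """Apply the substitution S (dict: variable name -> type expression) to t."""
    if isinstance(t, str):
        # Variables listed in S are replaced; basic types and
        # unmapped variables are returned unchanged.
        return S.get(t, t)
    constructor, *args = t
    return (constructor, *(subst(a, S) for a in args))


deref_type = ("->", ("pointer", "α"), "α")
print(subst(deref_type, {"α": "integer"}))
# prints ('->', ('pointer', 'integer'), 'integer')
```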

Examples
Instances:
  pointer(integer) < pointer(α)
  pointer(real) < pointer(α)
  integer → integer < α → α
  pointer(α) < β
  α < β
No instances:
  integer          real      substitution on basic types is not possible
  integer → real   α → α     inconsistent replacement of α
  integer → α      α → α     all occurrences must be replaced
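The instance relation s < t can be checked by one-way matching: walk both expressions in parallel and record what each variable of t has to be replaced with. The sketch below is an illustration under assumptions: types are strings or tuples, and every string that is not in the hypothetical set BASIC counts as a type variable.

```python
BASIC = {"integer", "real"}


def is_var(t):
    return isinstance(t, str) and t not in BASIC


def is_instance(s, t, S=None):
    """True if s = S(t) for some substitution S on the variables of t."""
    S = {} if S is None else S
    if is_var(t):
        if t in S:
            return S[t] == s          # every occurrence needs the same replacement
        S[t] = s
        return True
    if isinstance(t, str) or isinstance(s, str):
        return s == t                 # a basic type only matches itself
    return (s[0] == t[0] and len(s) == len(t)
            and all(is_instance(a, b, S) for a, b in zip(s[1:], t[1:])))


print(is_instance(("pointer", "integer"), ("pointer", "α")))          # True
print(is_instance(("->", "integer", "real"), ("->", "α", "α")))       # False
print(is_instance(("->", "integer", "α"), ("->", "α", "α")))          # False
```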

Unification
Two types t1, t2 are unifiable if there exists a substitution S such that S(t1) = S(t2) holds.
In practice, we are interested in the most general unifier (MGU), i.e. a substitution S such that:
1 S(t1) = S(t2)
2 Every substitution S' with S'(t1) = S'(t2) is an instance of S (for every t, S'(t) is an instance of S(t)).

Checking Polymorphic Functions
Two functions:
1 fresh(t) replaces all variables in the type expression t with new variables. A pointer to the node representing the new expression is returned.
2 unify(m, n) unifies the two expressions m and n. As a side effect, the substitution is performed.
Translation scheme:
Production        Semantic Rule
E → E1 ( E2 )     p := mkleaf(newtypevar);
                  unify(E1.type, mknode(→, E2.type, p));
                  E.type := p
E → E1 , E2       E.type := mknode(×, E1.type, E2.type)
E → id            E.type := fresh(id.type)

Example Type Checking
[Figure: syntax tree of deref_0(deref_i(q)) labeled with types]
apply : α_0
  deref_0 : pointer(α_0) → α_0
  apply : α_i
    deref_i : pointer(α_i) → α_i
    q : pointer(pointer(integer))
Summary (bottom-up type detection):
Expression : Type                           Substitution
q : pointer(pointer(integer))
deref_i : pointer(α_i) → α_i
deref_i(q) : pointer(integer)               α_i = pointer(integer)
deref_0 : pointer(α_0) → α_0
deref_0(deref_i(q)) : integer               α_0 = integer
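The bottom-up derivation summarized above can be reproduced with a small Python sketch of fresh, unify and the rule for E → E1 ( E2 ). Everything here is an assumption made for illustration: type variables are objects of a hypothetical class Var that are bound destructively, a polymorphic type is a pair (bound variable names, body), and the unifier has no occurs check.

```python
class Var:
    """A type variable; ref is set once unification binds it."""
    def __init__(self):
        self.ref = None


def find(t):
    while isinstance(t, Var) and t.ref is not None:
        t = t.ref
    return t


def unify(a, b):
    a, b = find(a), find(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b                      # side effect: the substitution is recorded
        return True
    if isinstance(b, Var):
        b.ref = a
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b                  # basic types
    return (a[0] == b[0] and len(a) == len(b)
            and all(unify(x, y) for x, y in zip(a[1:], b[1:])))


def fresh(scheme):
    """Copy the body of a polymorphic type with new variables for the bound names."""
    bound, body = scheme
    mapping = {name: Var() for name in bound}

    def copy(t):
        if isinstance(t, str):
            return mapping.get(t, t)
        return (t[0], *(copy(x) for x in t[1:]))
    return copy(body)


def apply(fun_type, arg_type):
    """E -> E1 ( E2 ): unify E1.type with E2.type -> p and return p."""
    p = Var()
    if not unify(fun_type, ("->", arg_type, p)):
        raise TypeError("type error in application")
    return p


def resolve(t):
    """Expand all bound variables so the type can be printed."""
    t = find(t)
    if isinstance(t, (str, Var)):
        return t
    return (t[0], *(resolve(x) for x in t[1:]))


DEREF = (["a"], ("->", ("pointer", "a"), "a"))     # for all a: pointer(a) -> a
q = ("pointer", ("pointer", "integer"))
inner = apply(fresh(DEREF), q)                     # deref_i(q)
outer = apply(fresh(DEREF), inner)                 # deref_0(deref_i(q))
print(resolve(inner))   # ('pointer', 'integer')
print(resolve(outer))   # integer
```

Calling fresh once per occurrence of deref is what allows the inner and outer applications to bind their type variables to different types.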

Unification Algorithm
Input. A graph and a pair of nodes m and n that should be unified.
Output. True if the nodes can be unified, false otherwise.
Method. A node is represented by the record [constructor, left, right, set], where set is the set of equivalent nodes. One node of each set is chosen as the representative of this set. In the beginning, each set contains only the node itself.
find(n) returns the representative node of the set containing n.
union(m, n) merges the equivalence sets of m and n. The new representative is a node that does not correspond to a variable. If no such node exists, one of the former representatives is chosen as the new one.

Algorithm - Pseudocode
function unify(m, n : node) : boolean
begin
  s := find(m); t := find(n);
  if s = t then return true
  else if s and t are nodes that represent the same basic type then return true
  else if s is an op-node with children s1, s2 and
          t is an op-node with the same constructor and children t1, t2 then begin
    union(s, t);
    return unify(s1, t1) and unify(s2, t2)
  end
  else if s or t represents a variable then begin
    union(s, t);
    return true
  end
  else return false
end

Example Unification
Type expressions:
  ((α_1 → α_2) × list(α_3)) → list(α_2)
  ((α_3 → α_4) × list(α_3)) → α_5
[Figure: graph of the two type expressions with numbered nodes]
  → : 1     × : 2      → : 3      α_1 : 4    α_2 : 5    list : 6    α_3 : 7
  list : 8  → : 9      × : 10     → : 11     α_4 : 12   list : 13   α_5 : 14
Question: unify(1, 9) = ?
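The question can be explored with a Python sketch of the graph-based algorithm (find, union, unify). The node numbers follow the figure; the child links are not given in the text and are an assumption derived from the structure of the two type expressions, as is the encoding of nodes as tuples (constructor, children...).

```python
# node number -> (constructor, child nodes...); "var" marks a type variable.
NODES = {
    1: ("->", 2, 8),    2: ("x", 3, 6),     3: ("->", 4, 5),
    4: ("var",),        5: ("var",),        6: ("list", 7),
    7: ("var",),        8: ("list", 5),     9: ("->", 10, 14),
    10: ("x", 11, 13),  11: ("->", 7, 12),  12: ("var",),
    13: ("list", 7),    14: ("var",),
}

# Each node starts in its own equivalence set and is its own representative.
parent = {n: n for n in NODES}


def is_var(n):
    return NODES[n][0] == "var"


def find(n):
    while parent[n] != n:
        n = parent[n]
    return n


def union(s, t):
    """Merge the sets of the representatives s and t.

    A non-variable node is kept as representative whenever one exists.
    """
    if is_var(s):
        parent[s] = t
    else:
        parent[t] = s


def unify(m, n):
    s, t = find(m), find(n)
    if s == t:
        return True
    if is_var(s) or is_var(t):
        union(s, t)
        return True
    (cs, *kids_s), (ct, *kids_t) = NODES[s], NODES[t]
    if cs == ct and len(kids_s) == len(kids_t):
        # Covers op-nodes and, with no children, equal basic types.
        union(s, t)
        return all(unify(a, b) for a, b in zip(kids_s, kids_t))
    return False


print(unify(1, 9))               # True
print(find(4) == find(7))        # True:  α_1 and α_3 are identified
print(find(5) == find(12))       # True:  α_2 and α_4 are identified
print(find(14))                  # 8:     α_5 stands for list(α_2)
```

In this sketch the merged sets correspond to the substitution α_3 = α_1, α_4 = α_2 and α_5 = list(α_2). Calling union before descending into the children is what lets the algorithm terminate on cyclic graphs.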