Writing Evaluators MIF08. Laure Gonnord

Writing Evaluators MIF08 Laure Gonnord Laure.Gonnord@univ-lyon1.fr

Outline
1. Evaluators, what for?
2. Implementation
Laure Gonnord (Lyon1/FST), Writing Evaluators

Analysis phases
source code
  -> lexical analysis -> sequence of lexemes (tokens)
  -> syntactic analysis (parsing) -> abstract syntax tree (AST)
  -> semantic analysis -> abstract syntax (+ symbol table)

Until now
We have parsed the input and evaluated it inside semantic actions. But we want:
- more structure;
- an easier way to perform actions (outside the .g4 file).

Notion of Abstract Syntax Tree
[Figure: example AST with nodes =, int y, +, 12, *, 4, x]
- AST: in-memory representation of a program;
- Node: a language construct;
- Sub-nodes: parameters of the construct;
- Leaves: usually constants or variables.

Separation of concerns
The semantics of the program could be defined in the semantic actions (of the grammar). Usually though:
- the syntax analyzer only produces the AST;
- the rest of the compiler works directly on this AST.
Why?
- Manipulating a tree (AST) is easy (recursive style);
- It separates language syntax from language semantics;
- During later compiler phases, we can assume that the AST is syntactically correct, which simplifies the rest of the compilation.

Running example: numerical expressions
This is an abstract syntax (no more parentheses, ...):
  e ::= c        constant
      | x        variable
      | e + e    add
      | e × e    mult
      | ...
Let us construct an AST to:
- evaluate this expression (by tree traversal);
- later: generate code for these expressions (by tree traversal).
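The two goals above (build an AST, then evaluate it by tree traversal) can be sketched in a few lines of Python. This is only a minimal illustration: the class names `Const`, `Var`, `Add`, `Mult` and the function `evaluate` are hypothetical and do not come from the course material.

```python
from dataclasses import dataclass

# Hypothetical AST node classes, one per construct of the abstract syntax.
@dataclass
class Const:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Mult:
    left: object
    right: object

def evaluate(e, env):
    """Evaluate e by recursive tree traversal; env maps variable names to ints."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.left, env) + evaluate(e.right, env)
    if isinstance(e, Mult):
        return evaluate(e.left, env) * evaluate(e.right, env)
    raise TypeError(f"unknown node: {e!r}")

# AST of 12 + 4 * x: no parentheses are needed, the tree shape carries the priority.
tree = Add(Const(12), Mult(Const(4), Var("x")))
result = evaluate(tree, {"x": 3})
```

With x bound to 3, `result` is 24. A code generator would follow the same recursive shape, emitting instructions instead of computing values.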

Outline
1. Evaluators, what for?
2. Implementation
   - Old-school way
   - Evaluators with visitors


Old-school way: explicit construction of the AST
- Declare a type for the abstract syntax.
- Construct instances of these types during parsing (trees).
- Evaluate with tree traversal.

Example in Java 1/3
AST definition in Java: one class per language construct.

    public class APlus extends AExpr {
        AExpr e1, e2;
        public APlus(AExpr e1, AExpr e2) {
            this.e1 = e1;
            this.e2 = e2;
        }
    }

    public class AMinus extends AExpr { ...

Example in Java 2/3
The parser builds an AST instance using the AST classes defined previously.

ArithExprASTParser.g4:

    parser grammar ArithExprASTParser;
    options { tokenVocab = ArithExprASTLexer; }

    prog returns [ AExpr e ] : expr EOF { $e = $expr.e; } ;

    // We create an AExpr instead of computing a value
    expr returns [ AExpr e ] :
          LPAR x=expr RPAR       { $e = $x.e; }
        | INT                    { $e = new AInt($INT.int); }
        | e1=expr PLUS e2=expr   { $e = new APlus($e1.e, $e2.e); }
        | e1=expr MINUS e2=expr  { $e = new AMinus($e1.e, $e2.e); }
        ;

Example in Java 3/3
Evaluation is one eval function per class.

AExpr.java:

    public abstract class AExpr {
        abstract int eval(); // need to provide semantics
    }

APlus.java:

    public class APlus extends AExpr {
        AExpr e1, e2;
        public APlus(AExpr e1, AExpr e2) {
            this.e1 = e1;
            this.e2 = e2;
        }
        // semantics below
        int eval() {
            return e1.eval() + e2.eval();
        }
    }


Evaluators with visitors: principle (OO programming)
"The visitor design pattern is a way of separating an algorithm from an object structure on which it operates. [...] In essence, the visitor allows one to add new virtual functions to a family of classes without modifying the classes themselves; instead, one creates a visitor class that implements all of the appropriate specializations of the virtual function."
https://en.wikipedia.org/wiki/visitor_pattern

Application
Designing evaluators / tree traversals in ANTLR-Python:
- The ANTLR compiler generates a Visitor class.
- We override this class to traverse the parsed instance.

Example with ANTLR/Python 1/3
AritParser.g4:

    expr: expr mdop expr   # multiplicationExpr
        | expr pmop expr   # additiveExpr
        | atom             # atomExpr
        ;

    atom: INT              # int
        | ID               # id
        | '(' expr ')'     # parens
        ;

Compilation with -Dlanguage=Python3 -visitor

Example with ANTLR/Python 2/3: generated file

    class AritVisitor(ParseTreeVisitor):
        ...
        # Visit a parse tree produced by AritParser#multiplicationExpr.
        def visitMultiplicationExpr(self, ctx):
            return self.visitChildren(ctx)
        ...
        # Visit a parse tree produced by AritParser#atomExpr.
        def visitAtomExpr(self, ctx):
            return self.visitChildren(ctx)

Example with ANTLR/Python 3/3
Visitor class overriding to write the evaluator.

MyAritVisitor.py:

    class MyAritVisitor(AritVisitor):
        # Visit a parse tree produced by AritParser#int.
        def visitInt(self, ctx):
            value = int(ctx.getText())
            return value

        def visitMultiplicationExpr(self, ctx):
            leftval = self.visit(ctx.expr(0))
            rightval = self.visit(ctx.expr(1))
            myop = self.visit(ctx.mdop())
            if myop == '*':
                return leftval * rightval
            else:
                return leftval / rightval

Nice picture (Lab#3)
[Figure: antlr -visitor turns Arit.g4 into AritParser.py and AritVisitor.py; AritVisitor.py inherits from Tree.py, and MyAritVisitor.py inherits from AritVisitor.py.]

From grammars to evaluators: summary
- The meaning of each operation/grammar rule is now given by the implementation of the associated function in the visitor.
- The visitor performs a tree traversal on the structure of the parse tree.
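To see the visitor principle without installing ANTLR, here is a small self-contained Python sketch. It is not the ANTLR runtime: the classes `Node`, `Int`, `Plus`, `Mult`, `EvalVisitor` and `ShowVisitor` are hypothetical, and the dispatch by method name only loosely mimics the `visitXxx` methods that ANTLR generates.

```python
class Node:
    def accept(self, visitor):
        # Dispatch to visit<ClassName> on the visitor (hypothetical naming scheme).
        method = getattr(visitor, "visit" + type(self).__name__)
        return method(self)

class Int(Node):
    def __init__(self, value):
        self.value = value

class Plus(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

class Mult(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right

class EvalVisitor:
    """One algorithm over the tree: compute the integer value."""
    def visitInt(self, node):
        return node.value
    def visitPlus(self, node):
        return node.left.accept(self) + node.right.accept(self)
    def visitMult(self, node):
        return node.left.accept(self) * node.right.accept(self)

class ShowVisitor:
    """A second algorithm, added without modifying the node classes."""
    def visitInt(self, node):
        return str(node.value)
    def visitPlus(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"
    def visitMult(self, node):
        return f"({node.left.accept(self)} * {node.right.accept(self)})"

tree = Plus(Int(1), Mult(Int(2), Int(3)))
value = tree.accept(EvalVisitor())
text = tree.accept(ShowVisitor())
```

Here `value` is 7 and `text` is "(1 + (2 * 3))". The meaning of each construct lives in the visitor, so a type checker or a code generator can be added as further visitor classes.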

Types, Typing MIF08 Laure Gonnord Laure.Gonnord@univ-lyon1.fr



Typing
If you write: "5" + 37, what do you want to obtain?
- a compilation error? (OCaml)
- an execution error? (Python)
- the int 42? (Visual Basic, PHP)
- the string "537"? (Java)
- anything else?
And what about 37 / "5"?
Typing: an analysis that gives a type to each subexpression and rejects incoherent programs.

When
- Dynamic typing (during execution): Lisp, PHP, Python
- Static typing (at compile time): C, Java, OCaml
Here: the second one.
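As a small illustration of dynamic typing, the Python sketch below shows that the two expressions discussed earlier are only rejected at the moment they are executed. The helper name `try_eval` is hypothetical.

```python
def try_eval(thunk):
    """Run a zero-argument function and report whether a TypeError occurred."""
    try:
        return ("ok", thunk())
    except TypeError as err:
        return ("TypeError", str(err))

# Defining the function is accepted: nothing is checked before execution.
bad_add = lambda: "5" + 37
bad_div = lambda: 37 / "5"

# The errors only show up when the expressions are actually evaluated.
add_result = try_eval(bad_add)
div_result = try_eval(bad_div)
good_result = try_eval(lambda: 5 + 37)
```

Both `add_result` and `div_result` start with "TypeError", while `good_result` is ("ok", 42). A static type checker aims to report the same kind of error before the program runs.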

Typing objectives
"Well-typed programs do not go wrong."
- Should be decidable.
- It should reject programs like (1 2) in OCaml, or 1+"toto" in C, before an actual error in the evaluation of the expression: this is safety.
- The type system should be expressive enough and not reject too many programs: this is expressivity.

Several solutions
- All sub-expressions are annotated with a type:
    fun (x : int) → let (y : int) = (+ :)(((x : int), (1 : int)) : int × int) in
  Easy to verify, but tedious for the programmer.
- Annotate only variable declarations (Pascal, C, Java, ...):
    fun (x : int) → let (y : int) = +(x, 1) in y
- Annotate only function parameters:
    fun (x : int) → let y = +(x, 1) in y
- Do nothing, complete inference (OCaml, Haskell, ...).

Outline
1. Simple Type Checking for mini-while, theory
2. A bit of implementation (for expr)

Simple type checking for mini-while, theory: Mini-While syntax
Expressions:
  e ::= c        constant
      | x        variable
      | e + e    addition
      | e × e    multiplication
      | ...
Mini-while:
  S(Smt) ::= x := expr                 assign
           | skip                      do nothing
           | S1 ; S2                   sequence
           | if b then S1 else S2      test
           | while b do S done         loop

Typing judgement
We will define how to compute typing judgements, denoted by:
\[ \Gamma \vdash e : \tau \]
which means: in environment Γ, expression e has type τ.
Γ associates a type Γ(x) with every free variable x in e (in this course, computed from the variable declarations).
Here types are basic types: Int | String | Bool

Typing rules for expr (or bool, ...)
\[
\frac{}{\Gamma \vdash x : \Gamma(x)}
\qquad
\frac{}{\Gamma \vdash n : \mathrm{int}}
\qquad
\frac{\Gamma \vdash e_1 : \mathrm{int} \quad \Gamma \vdash e_2 : \mathrm{int}}{\Gamma \vdash e_1 + e_2 : \mathrm{int}}
\]
An expression is well typed if there is a proof tree for it built from regular applications of the rules, and whose leaves are axioms.
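The three rules above translate almost literally into a recursive function. The sketch below is a minimal illustration, assuming expressions are encoded as nested tuples (`("const", n)`, `("var", x)`, `("add", e1, e2)`) and Γ as a Python dict; this encoding and the name `type_of` are hypothetical.

```python
INT = "int"

def type_of(e, gamma):
    """Return the type of e in environment gamma, or raise TypeError."""
    tag = e[0]
    if tag == "const":            # axiom: gamma |- n : int
        return INT
    if tag == "var":              # axiom: gamma |- x : gamma(x)
        if e[1] not in gamma:
            raise TypeError(f"undeclared variable {e[1]}")
        return gamma[e[1]]
    if tag == "add":              # both premises must have type int
        t1 = type_of(e[1], gamma)
        t2 = type_of(e[2], gamma)
        if t1 == INT and t2 == INT:
            return INT
        raise TypeError(f"cannot add {t1} and {t2}")
    raise TypeError(f"unknown expression {e!r}")

# x + 1 is well typed when gamma(x) = int.
ok_type = type_of(("add", ("var", "x"), ("const", 1)), {"x": "int"})
```

Each successful call corresponds to one proof tree: the recursive calls are the premises and the constant and variable cases are the axioms at the leaves.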

Hybrid expressions
What if we have 1.2 + 42?
- reject?
- compute a float!
This is type coercion.

More complex expressions
What if we have types such as "pointer of bool" or "array of int"?
We might want to check type equivalence (for addition, ...).
This is called structural equivalence (see Dragon Book, "type equivalence"). It is solved by a basic graph traversal.
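A minimal sketch of structural equivalence, assuming types are encoded either as a base-type name (a string) or as a tuple such as `("pointer", t)` or `("array", t)`. The encoding and the name `struct_equiv` are hypothetical, and the sketch only handles acyclic type expressions, so the traversal is a plain tree walk rather than a general graph traversal.

```python
def struct_equiv(t1, t2):
    """Two types are structurally equivalent if they are built the same way."""
    # Base types: compare names directly.
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2
    # Constructed types: same constructor, same arity, equivalent components.
    return (
        t1[0] == t2[0]
        and len(t1) == len(t2)
        and all(struct_equiv(a, b) for a, b in zip(t1[1:], t2[1:]))
    )

same = struct_equiv(("array", ("pointer", "bool")), ("array", ("pointer", "bool")))
different = struct_equiv(("pointer", "bool"), ("pointer", "int"))
```

Here `same` is True and `different` is False. Recursive types would additionally need a set of already visited pairs to avoid looping.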

Typing rules for statements (see TD2)
Idea: the type is void, otherwise there is a typing error.
\[
\frac{\Gamma \vdash e : t \quad \Gamma(x) : t \quad t \in \{\mathrm{int}, \mathrm{bool}\}}{\Gamma \vdash x := e : \mathrm{void}}
\qquad
\frac{\Gamma \vdash b : \mathrm{bool} \quad \Gamma \vdash S : \mathrm{void}}{\Gamma \vdash \mathtt{while}\ b\ \mathtt{do}\ S\ \mathtt{done} : \mathrm{void}}
\]
A program is well typed if there is a proof tree built from regular applications of the rules and whose leaves are axioms.
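The two statement rules can be sketched as follows. Statements and expressions are encoded as nested tuples, which is a hypothetical encoding, and the rule used here for `<` (two int operands give a bool) is an assumption that does not appear in the slides.

```python
def type_expr(e, gamma):
    """Type of an expression; only the forms needed for this sketch."""
    tag = e[0]
    if tag == "int":
        return "int"
    if tag == "bool":
        return "bool"
    if tag == "var":
        return gamma[e[1]]
    if tag == "lt":   # assumed rule: int < int has type bool
        if type_expr(e[1], gamma) == "int" and type_expr(e[2], gamma) == "int":
            return "bool"
        raise TypeError("'<' expects int operands")
    raise TypeError(f"unknown expression {e!r}")

def check_stmt(s, gamma):
    """Return "void" if statement s is well typed, raise TypeError otherwise."""
    tag = s[0]
    if tag == "assign":
        _, x, e = s
        t = type_expr(e, gamma)
        if gamma[x] != t or t not in ("int", "bool"):
            raise TypeError(f"cannot assign {t} to {x} of type {gamma[x]}")
        return "void"
    if tag == "while":
        _, b, body = s
        if type_expr(b, gamma) != "bool":
            raise TypeError("while condition must be bool")
        check_stmt(body, gamma)
        return "void"
    raise TypeError(f"unknown statement {s!r}")

gamma = {"x": "int"}
loop = ("while", ("lt", ("var", "x"), ("int", 10)), ("assign", "x", ("int", 1)))
loop_type = check_stmt(loop, gamma)
```

`loop_type` is "void" for this program, whereas assigning a boolean constant to x would raise a TypeError.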


A bit of implementation (for expr): principle of type checking
- Gamma is constructed from lexing or parsing information (variable declarations with types).
- Rules are semantic actions. The semantic actions are responsible for the evaluation order, as well as for reporting typing errors.

Type checking (here): visitor (Lab3)
MyMuTypingVisitor.py:

    def visitAdditiveExpr(self, ctx):
        lvaltype = self.visit(ctx.expr(0))
        rvaltype = self.visit(ctx.expr(1))
        op = self.visit(ctx.oplus())
        if lvaltype == rvaltype:
            return lvaltype
        elif {lvaltype, rvaltype} == {BaseType.Integer, BaseType.Float}:
            return BaseType.Float
        elif op == u'+' and any(vt == BaseType.String for vt in (rvaltype, lvaltype)):
            return BaseType.String
        else:
            raise SyntaxError("Invalid type for additive operand")

Typing is more than type checking
- Sometimes we want this information during code generation: AST decorated with types (but not in this course).
- And we want informative errors: "Type error at line 42" is not sufficient!

Code Generation MIF08 Laure Gonnord Laure.Gonnord@univ-lyon1.fr oct 2017

Big picture
source code
  -> lexical + syntactic analysis + typing -> decorated AST
  -> code production (numerous phases) -> assembly language

Rules of the game here
For this code generation:
- Still no functions and no non-basic types (mini-while).
- Syntax-directed: one grammar rule → a set of instructions (hence code redundancy).
- No register reuse: everything will be stored on the stack.
The target machine: LEIA (course #1)

Outline
1. 3-address syntax-directed Code Generation
   - Rules
2. Memory allocation
3. Toward a more efficient Code Generation

3-address syntax-directed code generation: a first example (1/5)
How do we translate:

    var x;y:int;
    x=4;
    y=12+x;

- The variable declarations' visitor gives a place to each variable: x ↦ place0, y ↦ place1.
- Compute 4, store it somewhere, then copy it into x's place.
- Compute 12 + x: put 12 in a fresh place, copy the value of x into a second fresh place, then add, store the result in a third fresh place, then copy it into y's place.
The code generator will use a place generator called newtemp().

A first example: 3@code (2/5)
Compute 4 and store it in x (temp0):

    .let temp2 4
    copy temp0 temp2
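The whole first example can be reproduced with a few lines of Python. This is only a sketch under assumptions: the class `CodeGen` and its methods are hypothetical, instructions are emitted as plain strings, and the three-operand `add` form is assumed by analogy with the `sub` instruction shown later in the course.

```python
import itertools

class CodeGen:
    def __init__(self):
        self._ids = itertools.count()
        self.code = []      # emitted 3-address instructions, as strings
        self.place = {}     # variable name -> temporary holding it

    def newtemp(self):
        return f"temp{next(self._ids)}"

    def declare(self, *names):
        # The declaration visitor gives a place to each variable.
        for name in names:
            self.place[name] = self.newtemp()

    def expr(self, e):
        """Emit code for e and return the temporary holding its value."""
        dr = None
        if isinstance(e, int):
            dr = self.newtemp()
            self.code.append(f".let {dr} {e}")
        elif isinstance(e, str):
            dr = self.newtemp()
            self.code.append(f"copy {dr} {self.place[e]}")
        else:
            op, left, right = e
            assert op == "+", "only addition is handled in this sketch"
            tl = self.expr(left)
            tr = self.expr(right)
            dr = self.newtemp()
            self.code.append(f"add {dr} {tl} {tr}")   # assumed instruction form
        return dr

    def assign(self, x, e):
        src = self.expr(e)
        self.code.append(f"copy {self.place[x]} {src}")

gen = CodeGen()
gen.declare("x", "y")
gen.assign("x", 4)
gen.assign("y", ("+", 12, "x"))
```

After these calls, `gen.code` starts with `.let temp2 4` and `copy temp0 temp2`, as on the slide, and continues with the instructions for `y=12+x`. No temporary is ever reused, which is exactly the redundancy this naive scheme accepts.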

Objective
3-address LEIA code generation for the Mini-While language:
- All variables are int/bool.
- All variables are global.
- No functions.
With syntax-directed translation. Implementation in Lab.
This is called three-address code generation.


Rules: code generation utility functions
We will use:
- A new (fresh) temporary can be created with a newtemp() function.
- A new (fresh) label can be created with a newlabel() function.
The generated instructions are close to the LEIA ones (except for snif).

Abstract syntax
Expressions:
  e ::= c         constant
      | x         variable
      | e + e     addition
      | e or e    boolean or
      | e < e     less than
      | ...
and statements:
  S(Smt) ::= x := expr                 assign
           | skip                      do nothing
           | S1 ; S2                   sequence
           | if b then S1 else S2      test
           | while b do S done         loop

Code generation for expressions, example
e ::= c (constant expr)

    dr <- newtemp()
    code.add(instructionlet(dr, c))
    return dr

This rule gives a way to generate code for any constant.

Code generation for a boolean expression, example
e ::= e1 < e2

    t1 <- GenCodeExpr(e1)
    t2 <- GenCodeExpr(e2)
    dr <- newtemp()
    endrel <- newlabel()
    code.add(instructionlet(dr, 0))
    # if t1 >= t2, jump to endrel
    code.add(instructioncondjump(endrel, t1, ">=", t2))
    code.add(instructionlet(dr, 1))
    code.addlabel(endrel)
    return dr

The result is the integer value 0 or 1.
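The rule for `<` can be checked with a runnable sketch. Everything here is hypothetical scaffolding: the class `Gen`, the tuple encoding of instructions and the tiny interpreter `run` are assumptions made for the illustration, not the lab's actual API, and `run` only understands the `>=` comparison used by this rule.

```python
class Gen:
    def __init__(self):
        self.code = []
        self.ntemp = 0
        self.nlabel = 0

    def newtemp(self):
        self.ntemp += 1
        return f"temp{self.ntemp}"

    def newlabel(self):
        self.nlabel += 1
        return f"label{self.nlabel}"

    def gen_expr(self, e):
        if isinstance(e, int):                 # e ::= c
            dr = self.newtemp()
            self.code.append(("let", dr, e))
            return dr
        op, e1, e2 = e                         # e ::= e1 < e2
        assert op == "<"
        t1 = self.gen_expr(e1)
        t2 = self.gen_expr(e2)
        dr = self.newtemp()
        endrel = self.newlabel()
        self.code.append(("let", dr, 0))
        # if t1 >= t2, jump over the instruction that sets dr to 1
        self.code.append(("condjump", endrel, t1, ">=", t2))
        self.code.append(("let", dr, 1))
        self.code.append(("label", endrel))
        return dr

def run(code):
    """Tiny interpreter for the instruction tuples above."""
    labels = {ins[1]: i for i, ins in enumerate(code) if ins[0] == "label"}
    regs, pc = {}, 0
    while pc < len(code):
        ins = code[pc]
        if ins[0] == "let":
            regs[ins[1]] = ins[2]
        elif ins[0] == "condjump" and regs[ins[2]] >= regs[ins[4]]:
            pc = labels[ins[1]]
        pc += 1
    return regs

g = Gen()
dr = g.gen_expr(("<", 3, 5))
value = run(g.code)[dr]
```

For `3 < 5` the jump is not taken and `value` is 1; for `5 < 3` the jump skips the second `let` and the result stays 0.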

Code generation for commands, example
if b then S1 else S2

    lelse, lendif <- newlabels()
    t1 <- GenCodeExpr(b)
    # if the condition is false, jump to else
    code.add(instructioncondjump(lelse, t1, "=", 0))
    GenCodeSmt(S1)  # then
    code.add(instructionjump(lendif))
    code.addlabel(lelse)
    GenCodeSmt(S2)  # else
    code.addlabel(lendif)
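The same scheme in runnable form, as a sketch under assumptions: the class `Gen`, the tuple encoding of instructions and statements, and the choice to let a variable name act as its own place are all hypothetical simplifications.

```python
class Gen:
    def __init__(self):
        self.code = []
        self.counter = 0

    def fresh(self, prefix):
        # One counter for temporaries and labels keeps every name unique.
        self.counter += 1
        return f"{prefix}{self.counter}"

    def gen_expr(self, e):
        if isinstance(e, str):        # a variable is its own place in this sketch
            return e
        dr = self.fresh("temp")       # integer constant
        self.code.append(("let", dr, e))
        return dr

    def gen_stmt(self, s):
        if s[0] == "assign":
            _, x, e = s
            src = self.gen_expr(e)
            self.code.append(("copy", x, src))
        elif s[0] == "if":
            _, b, s1, s2 = s
            lelse, lendif = self.fresh("lelse"), self.fresh("lendif")
            t1 = self.gen_expr(b)
            # if the condition is false (equal to 0), jump to the else branch
            self.code.append(("condjump", lelse, t1, "=", 0))
            self.gen_stmt(s1)                     # then branch
            self.code.append(("jump", lendif))
            self.code.append(("label", lelse))
            self.gen_stmt(s2)                     # else branch
            self.code.append(("label", lendif))
        else:
            raise ValueError(f"unknown statement {s!r}")

g = Gen()
g.gen_stmt(("if", 1, ("assign", "x", 10), ("assign", "x", 20)))
kinds = [ins[0] for ins in g.code]
```

The emitted sequence has the expected shape: condition, conditional jump, then branch, unconditional jump, else label, else branch, end label.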


Memory allocation: a first example, from 3@code to valid LEIA (3/5)
3@code is not valid LEIA code! 3 kinds of allocation:
- All in registers (but?): place i → register
- All in memory (here!): place i → memory
- Something in the middle (later!)

A stack, why?
- Store constants, strings, ...
- Provide an easy way to communicate argument values (see later)
- Give a place to store intermediate values (here)

LEIA stack emulation (from the architecture course)
- r6 is initialised to the stack address.
- Addresses will be computed from this base.
- The stack grows in the direction of decreasing addresses!
[Figure: memory layout from x0000 to xffff showing the registers, the instructions ins1, ins2 and the stackend label. Nice picture by N. Louvet.]

A first example: prelude/postlude (4/5)
Here we store r1 on the stack!

    [init r6]
    .let r1 4
    sub r0 r6 1   ; first dec from r6 (and store some info!)
    wmem r1 [r0]  ; now r1 can be recycled
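The "all in memory" allocation can be sketched as a mapping from temporaries to offsets below the stack base held in r6. The functions `allocate` and `store` are hypothetical helpers; they only emit the two instruction forms shown on the slide (`sub` and `wmem`), and the use of r0 as address register follows the slide's example.

```python
def allocate(temps):
    """Give each temporary a distinct offset below the stack base (r6)."""
    # Offsets start at 1 because the stack grows toward decreasing addresses.
    return {temp: offset for offset, temp in enumerate(temps, start=1)}

def store(value_reg, temp, offsets, addr_reg="r0"):
    """Instructions that write value_reg into the stack slot of temp."""
    return [
        f"sub {addr_reg} r6 {offsets[temp]}",   # compute the slot address
        f"wmem {value_reg} [{addr_reg}]",       # write the value there
    ]

offsets = allocate(["temp0", "temp1", "temp2"])
first_store = store("r1", "temp0", offsets)
```

For temp0, `first_store` is the pair `sub r0 r6 1` and `wmem r1 [r0]`, matching the slide. Reading a temporary back would need the corresponding memory read instruction, which is left out of this sketch.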

A first example: prelude/postlude (5/5)
The rest of the code generation:

    .set r6 stack
    [...]
    jump 0
    .align16
    stackend:
    .reserve 42
    stack:

This is valid LEIA code that can be assembled and executed.


Toward a more efficient code generation: drawbacks of the former translation
Drawbacks:
- redundancies (recomputation of constants, ...);
- memory-intensive loads and stores.
We need a more efficient data structure to reason on: the control flow graph (CFG) (see next course).