More On Syntax Directed Translation
Types of Attributes

We have productions of the form:

    A → X1 X2 X3 ... Xn

with semantic rules of the form:

    b := f(c1, c2, c3, ..., cn)

where b and the c's are attributes of the grammar symbols.

b is called a synthesized attribute if:
  - b is an attribute of A (i.e. the LHS), and
  - the c's are all attributes of the X's (symbols on the RHS)
Synthesized Attributes

    b := f(c1, c2, c3)

               A (b)
             /   |   \
     X1 (c1)  X2 (c2)  X3 (c3)

Information to compute b is passed up the parse tree.
Inherited Attributes

    A → X1 X2 X3 ... Xn
    b := f(c1, c2, c3, ..., cn)

b is an inherited attribute if:
  - b is an attribute of one of the X's (i.e. the RHS), and
  - the c's are attributes of A and/or of one or more of the other X's

which means they are beside or above where b is needed.
Inherited Attributes

    b := f(c1, c2, c3)

               A (c2)
             /   |   \
     X1 (c1)  X2 (b)  X3 (c3)

Information to compute b must be passed down the parse tree.

Note that X2 is only associated with X1 and X3 via their appearance on the RHS of a production for A, so information must flow through A in the tree.
S-Attributed Definitions

A syntax directed definition is S-attributed if it uses only synthesized attributes.

This implies that the definition can be annotated by evaluating the semantic rules for nodes bottom-up,
  - which fits naturally with bottom-up parsers, and
  - which can also be done with top-down parsers, since both kinds of parser perform depth-first traversals

But the presence of inherited attributes poses a problem.
Inherited Attributes Example

A definition for C-style declarations:

    Production       Semantic Rules
    D → T L          L.type := T.type    (inherited attribute passed down to L)
    T → int          T.type := INTEGER
    T → float        T.type := FLOAT
    L → L1 , id      L1.type := L.type   (inherited from LHS)
                     settype(id.entry, L.type)
    L → id           settype(id.entry, L.type)
Inherited Attributes in C-Declarations

Annotated parse tree for: int x, y

    D
    ├── T                T.type = INT
    │   └── int
    └── L                L.type = INT, settype(y.entry, int)
        ├── L            L.type = INT, settype(x.entry, int)
        │   └── id (x)
        ├── ,
        └── id (y)
Computing Synthesized and Inherited Attributes

Synthesized Attributes
  - Natural fit for bottom-up parsers
      Yacc: $$ = $1 + $3; etc.
  - Can use the parsing function's return value in recursive descent

Inherited Attributes
  - Natural for top-down parsers
      Recursive descent: parameters in the parsing function call
  - Quite troublesome for bottom-up parsers
      Especially getting at the attribute of the left-hand side
      Tricks such as reaching under the stack
      Sometimes can be done with embedded actions
  - But usually dealt with later during traversal of the AST
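The recursive-descent case can be sketched in C for the C-style declaration grammar shown earlier. This is only an illustration under stated assumptions: the token stream is a NULL-terminated array of strings, the left-recursive rule for L is rewritten as a loop (as recursive descent requires), any token in identifier position is accepted as an id, and the names parse_D, parse_T, parse_L, lookup_type and the array-based symbol table are hypothetical, not taken from the course code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type codes and a tiny array-based symbol table. */
typedef enum { TYPE_NONE, TYPE_INTEGER, TYPE_FLOAT } Type;

#define MAXSYMS 16
static const char *sym_name[MAXSYMS];
static Type sym_type[MAXSYMS];
static int nsyms = 0;

/* Stand-in for settype(id.entry, type): record the identifier's type. */
static void settype(const char *name, Type t) {
    assert(nsyms < MAXSYMS);
    sym_name[nsyms] = name;
    sym_type[nsyms] = t;
    nsyms++;
}

/* Current position in the NULL-terminated token stream. */
static const char **tok;

/* T -> int | float.  T.type is synthesized, so it is the return value. */
static Type parse_T(void) {
    Type t = TYPE_NONE;
    if (*tok && strcmp(*tok, "int") == 0) t = TYPE_INTEGER;
    else if (*tok && strcmp(*tok, "float") == 0) t = TYPE_FLOAT;
    if (t != TYPE_NONE) tok++;
    return t;
}

/* L -> id { , id }.  L.type is inherited, so it arrives as a parameter. */
static int parse_L(Type inherited) {
    if (*tok == NULL) return 0;
    settype(*tok++, inherited);
    while (*tok && strcmp(*tok, ",") == 0) {
        tok++;
        if (*tok == NULL) return 0;
        settype(*tok++, inherited);
    }
    return 1;
}

/* D -> T L.  Returns 1 on success, 0 on a syntax error. */
int parse_D(const char **tokens) {
    tok = tokens;
    nsyms = 0;
    Type t = parse_T();
    if (t == TYPE_NONE) return 0;
    return parse_L(t) && *tok == NULL;
}

/* Look up the recorded type of an identifier (TYPE_NONE if absent). */
Type lookup_type(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(sym_name[i], name) == 0) return sym_type[i];
    return TYPE_NONE;
}
```

Calling parse_D on the tokens int, x, ",", y records both x and y with TYPE_INTEGER: the type flows up out of parse_T as a return value and down into parse_L as an argument.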
L-Attributed Definitions

A syntax directed definition is L-attributed if every inherited attribute of some symbol Xi on the RHS of some production

    A → X1 ... Xi-1 Xi ... Xn

depends only on:
  - the attributes of A (the LHS), and
  - the attributes of the symbols X1 ... Xi-1 to the left of symbol Xi in the production

This implies that the definition can be annotated by evaluating the semantic rules for nodes in a depth-first, left-to-right traversal of the tree.
Evaluation of L-Attributed Definitions

An L-attributed definition can be evaluated in a depth-first tree traversal as follows:

    procedure dfvisit(n: node);
    begin
        for each child m of n, from left to right do
        begin
            evaluate inherited attributes of m;
            dfvisit(m)
        end;
        evaluate synthesized attributes of n
    end

which means it can be evaluated on the fly, driven by a parser.

Note that every S-attributed definition is also L-attributed.
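As a concrete instance of dfvisit, the following C sketch uses two made-up attributes: depth (inherited, computed from the parent before visiting a child) and height (synthesized, computed from the children after visiting them). The Node layout and MAXKIDS are assumptions chosen for illustration, and the caller is expected to set the root's depth before the first call.

```c
#include <assert.h>
#include <stddef.h>

#define MAXKIDS 3

/* Hypothetical tree node carrying one inherited and one synthesized attribute. */
typedef struct Node {
    struct Node *child[MAXKIDS];
    int nchildren;
    int depth;   /* inherited: derived from the parent */
    int height;  /* synthesized: derived from the children */
} Node;

void dfvisit(Node *n) {
    assert(n != NULL);
    int tallest = -1;
    for (int i = 0; i < n->nchildren; i++) {
        Node *m = n->child[i];
        m->depth = n->depth + 1;      /* evaluate inherited attributes of m */
        dfvisit(m);
        if (m->height > tallest) tallest = m->height;
    }
    n->height = tallest + 1;          /* evaluate synthesized attributes of n */
}
```

Because each child's inherited attribute depends only on its parent, and the parent's synthesized attribute depends only on its children, a single left-to-right depth-first pass is enough.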
Translation Schemes

A translation scheme is a CFG with:
  - attributes associated with grammar symbols, and
  - semantic actions enclosed in braces { }, embedded within the RHSs to indicate the point during processing at which each action should be executed to evaluate its attribute
Translation Schemes

An action can only be executed when all the attributes it refers to have already been evaluated.

For a synthesized attribute:
  - the action can simply be placed at the end of the production

For an inherited attribute:
  - an inherited attribute for a symbol on the RHS must be evaluated in an action before that symbol (and then passed down), and
  - an action cannot refer to a synthesized attribute of a symbol to the right of the action
Pascal-Style Declarations

The C declaration example was L-attributed, but the following obvious grammar is not:

    D      → var id IDLIST : T
    IDLIST → , id IDLIST | ε
    T      → integer | real

(Symbol T is to the right of IDLIST in the first production, so the type cannot be passed down the tree during a left-to-right traversal.)
Rewriting Grammars to Facilitate Translation

The Pascal declaration grammar can be rewritten to permit the use of only synthesized attributes:

    D   → var id LIS    { settype(id.entry, LIS.type) }
    LIS → , id LIS1     { settype(id.entry, LIS1.type)
                          LIS.type := LIS1.type }
        | : T           { LIS.type := T.type }
    T   → integer       { T.type := INTEGER }
        | real          { T.type := FLOAT }
Attribute Evaluation with Revised Grammar

Annotated parse tree for: var a, b, c : integer

    D                            settype(a.entry, integer)
    ├── var
    ├── ID (a)
    └── LIS                      settype(b.entry, integer), LIS.type := INTEGER
        ├── ,
        ├── ID (b)
        └── LIS                  settype(c.entry, integer), LIS.type := INTEGER
            ├── ,
            ├── ID (c)
            └── LIS              LIS.type := INTEGER
                ├── :
                └── T            T.type := INTEGER
                    └── integer
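The revised grammar maps directly onto recursive descent with return values only, since LIS.type is synthesized. The sketch below is an illustration under assumptions: tokens are a NULL-terminated array of strings, any token in identifier position is accepted as an id, and parse_D, parse_LIS, parse_T, match, lookup_type and the array-based table are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef enum { TYPE_NONE, TYPE_INTEGER, TYPE_FLOAT } Type;

#define MAXSYMS 16
static const char *sym_name[MAXSYMS];
static Type sym_type[MAXSYMS];
static int nsyms = 0;

/* Stand-in for settype(id.entry, type). */
static void settype(const char *name, Type t) {
    assert(nsyms < MAXSYMS);
    sym_name[nsyms] = name;
    sym_type[nsyms] = t;
    nsyms++;
}

static const char **tok;   /* current position in the token stream */

/* Consume the current token if it equals s. */
static int match(const char *s) {
    if (*tok && strcmp(*tok, s) == 0) { tok++; return 1; }
    return 0;
}

/* T -> integer | real.  Returns T.type. */
static Type parse_T(void) {
    if (match("integer")) return TYPE_INTEGER;
    if (match("real")) return TYPE_FLOAT;
    return TYPE_NONE;
}

/* LIS -> , id LIS1 | : T.  Returns LIS.type (TYPE_NONE on error). */
static Type parse_LIS(void) {
    if (match(",")) {
        if (*tok == NULL) return TYPE_NONE;
        const char *id = *tok++;
        Type t = parse_LIS();               /* LIS1.type */
        if (t != TYPE_NONE) settype(id, t);
        return t;                           /* LIS.type := LIS1.type */
    }
    if (match(":")) return parse_T();       /* LIS.type := T.type */
    return TYPE_NONE;
}

/* D -> var id LIS.  Returns 1 on success, 0 on a syntax error. */
int parse_D(const char **tokens) {
    tok = tokens;
    nsyms = 0;
    if (!match("var") || *tok == NULL) return 0;
    const char *id = *tok++;
    Type t = parse_LIS();
    if (t == TYPE_NONE || *tok != NULL) return 0;
    settype(id, t);
    return 1;
}

Type lookup_type(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(sym_name[i], name) == 0) return sym_type[i];
    return TYPE_NONE;
}
```

Note the order of effects: the type is only known once the parser reaches ": T", so identifiers are entered as the recursion unwinds (c, then b, then a), matching the bottom-up flow in the annotated tree.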
Symbol Tables & Abstract Syntax Trees
Symbol Tables

Symbol tables can take many forms.
  - We have seen the simple linked list form as used in the type checking example
  - For a language like Java there are typically several tables, such as:
      A linked list of CLASS descriptors
      For each CLASS, a linked list of METHOD and FIELD descriptors
      For each method, a list of formal parameters, local variables and a pointer to an AST structure for the actual code
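A nested linked-list layout of this kind might look like the following C sketch. It covers only classes and methods; the struct and function names (Class, Method, add_class, add_method, find_class, find_method) are hypothetical, and fields, parameters, locals and the AST pointer are left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical METHOD descriptor: one link in a per-class list. */
typedef struct Method {
    const char *name;
    struct Method *next;
} Method;

/* Hypothetical CLASS descriptor: owns a list of methods. */
typedef struct Class {
    const char *name;
    Method *methods;
    struct Class *next;
} Class;

/* Prepend a new class descriptor to the table and return it. */
Class *add_class(Class **table, const char *name) {
    Class *c = calloc(1, sizeof *c);
    assert(c != NULL);
    c->name = name;
    c->next = *table;
    *table = c;
    return c;
}

/* Prepend a new method descriptor to a class's method list. */
Method *add_method(Class *c, const char *name) {
    Method *m = calloc(1, sizeof *m);
    assert(m != NULL);
    m->name = name;
    m->next = c->methods;
    c->methods = m;
    return m;
}

/* Linear search by name; returns NULL if not found. */
Class *find_class(Class *table, const char *name) {
    for (Class *c = table; c != NULL; c = c->next)
        if (strcmp(c->name, name) == 0) return c;
    return NULL;
}

Method *find_method(Class *c, const char *name) {
    for (Method *m = c->methods; m != NULL; m = m->next)
        if (strcmp(m->name, name) == 0) return m;
    return NULL;
}
```

Linear lists keep the sketch simple; a production compiler would more likely use hashing for lookups, but the nesting of descriptors would be the same.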
Tree Structures as Intermediate Code Representation

  - Production compilers usually build some form of intermediate code during parsing, postponing target code generation until after optimizations have been performed
  - The intermediate language is generally independent of the details of any particular machine's instruction set architecture (ISA)
  - A common form of intermediate code is a tree structure
  - This cleanly separates the front end (source language analysis) from the back end (code generation for a specific target machine)
Parse Trees and Abstract Syntax Trees

Parse trees could be an intermediate form, but are cumbersome.

    Grammar: E → E op E | ( E ) | - E | ID
    String:  a * ( -b ) + c

    E
    ├── E
    │   ├── E ── ID (a)
    │   ├── *
    │   └── E
    │       ├── (
    │       ├── E
    │       │   ├── -
    │       │   └── E ── ID (b)
    │       └── )
    ├── +
    └── E ── ID (c)

Only 3 operations, but 7 interior nodes!
Abstract Syntax Trees

Abstract Syntax Trees eliminate the clutter and capture meaning in a minimal form.

    Grammar: E → E op E | ( E ) | - E | ID
    String:  a * ( -b ) + c

          +
         / \
        *   c
       / \
      a   -
          |
          b

3 interior nodes
Translation Scheme for Abstract Syntax Trees

    NODE *MakeNODE(op, left, right)
    NODE *MakeUNARY(op, arg)
    NODE *MakeLEAF(id)

    E → E1 op E2    { E.n = MakeNODE(op, E1.n, E2.n) }
      | ( E1 )      { E.n = E1.n }
      | - E1        { E.n = MakeUNARY(-, E1.n) }
      | ID          { E.n = MakeLEAF(id) }

  - The AST is a very convenient representation for machine-independent optimizations
  - Generate target code later via post-order tree traversal

    String: a * ( -b ) + c

          +
         / \
        *   c
       / \
      a   -
          |
          b
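To illustrate the post-order traversal, here is a minimal C sketch that builds the tree for a * ( -b ) + c by hand and emits postfix text rather than real target code. The NODE layout, the use of a char for the operator, the "neg" spelling for unary minus, and emit_postfix itself are assumptions made for the example; only the three constructor names come from the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: op == 0 marks a leaf. */
typedef struct NODE {
    char op;                     /* operator character, or 0 for a leaf */
    const char *id;              /* identifier name, leaves only */
    struct NODE *left, *right;   /* right is NULL for unary nodes */
} NODE;

NODE *MakeNODE(char op, NODE *left, NODE *right) {
    NODE *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

NODE *MakeUNARY(char op, NODE *arg) {
    return MakeNODE(op, arg, NULL);
}

NODE *MakeLEAF(const char *id) {
    NODE *n = MakeNODE(0, NULL, NULL);
    n->id = id;
    return n;
}

/* Post-order walk: emit both operands, then the operator.
   Appends to the NUL-terminated buffer out of capacity cap. */
void emit_postfix(const NODE *n, char *out, size_t cap) {
    if (n == NULL) return;
    emit_postfix(n->left, out, cap);
    emit_postfix(n->right, out, cap);
    size_t len = strlen(out);
    if (n->op == 0)
        snprintf(out + len, cap - len, "%s ", n->id);
    else if (n->right == NULL)
        snprintf(out + len, cap - len, "neg ");   /* the only unary op is minus */
    else
        snprintf(out + len, cap - len, "%c ", n->op);
}
```

For the example tree the walk visits a, b, unary minus, *, c, + in that order, which is the order a stack machine would need its instructions in.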
Abstract Syntax Trees

  - The AST is a simplified parse tree that contains only the significant information, without the syntactic sugar.
  - The symbol table is really just another form of AST: it captures the relevant information about classes, attributes, and methods, while ignoring the syntactic details of how these are declared.
  - An AST node usually also contains other information fields, not necessarily filled in at parse time, for use in semantic analysis or code generation.
  - ASTs are very straightforward to construct in a parser.
Expression Tree (AST)

(Examples are from an AST-based type checker.)

Binary operator case:

    struct AST {
        int opr;
        struct AST *left;
        struct AST *right;
    };

An operator, and pointers to AST nodes for the left and right operands.
Creating an AST node in a parser

    exp : exp '+' exp    { $$ = make_binary('+', $1, $3); }

with similar semantic actions creating variants of the AST node for different syntactic structures.

We pass the pointer up the tree via $$, so when the final reduction to the start symbol occurs, it gets a pointer to the root of the whole tree.
AST Node types

There are different operators (binary, unary, ...) and different language structures (if, while, ...), so we actually need many different kinds of nodes.

  - We could create a unique data structure for each node type
      This would lead to a large number of unique data structures to keep track of
  - So we use a more general-purpose structure with variants
      If coding in an object-oriented language, we can use subclasses of a generic node class
      In C, we have to use a "general purpose" structure, perhaps using a C union to deal with special cases
General Purpose AST Node

    #define MAXCHILDREN 2

    typedef enum { binary_exp, int_const, bool_const,
                   var_exp, assign_ast, prog_ast } NodeKind;

    typedef struct AST {
        struct AST *child[MAXCHILDREN];
        struct AST *next;
        int lineno;
        NodeKind nodekind;
        union {
            int op;
            int val;
            char *name;
            struct VAR *vartable;
        } attr;
        Type type;               /* for type checking of exps */
    } AST;

Example: if we have an AST pointer a and know that the "nodekind" is binary_exp, we can refer to the left operand of the operator as a->child[0], the right operand as a->child[1], and the operator as a->attr.op, etc.
Node Creation

    #include <stdlib.h>   /* calloc */

    /* Allocate one zero-initialized object of the given type. */
    #define NEW(type) ((type *) calloc(1, sizeof(type)))

    AST *make_binary(int opt, AST *left, AST *right)
    {
        AST *e = NEW(AST);
        e->nodekind = binary_exp;
        e->attr.op = opt;
        e->child[0] = left;
        e->child[1] = right;
        e->lineno = lineno;      /* from global variable in scanner */
        return e;
    }

...and one of these for each AST subtype.
Type Checking by AST Traversal

Assume a two-pass type checker:
  1. First, we parse the input, building an AST and a symbol table as we go
  2. Then, we traverse the AST recursively, computing the types of expressions and checking all semantic rules
Expression Productions

    e : e '=' e      { $$ = make_binary('=', $1, $3); }
      | e '+' e      { $$ = make_binary('+', $1, $3); }
      | e tor e      { $$ = make_binary(tor, $1, $3); }
      | '(' e ')'    { $$ = $2; }
      | NUM          { $$ = make_int($1); }
      | ttrue        { $$ = make_bool(1); }
      | tfalse       { $$ = make_bool(0); }
      | ID           { $$ = make_var($1); }
      ;
Tree Traversal to Decorate Tree and Check Type Rules

    Type type_check(AST *a)
    {
        Type t1, t2;
        int op;

        switch (a->nodekind) {
        case binary_exp:
            /* Check both operands first, then the operator itself. */
            t1 = type_check(a->child[0]);
            t2 = type_check(a->child[1]);
            op = a->attr.op;
            if (op == '+') {                 /* arithmetic */
                ASSERT(t1 == typeINT, "Left operand not Int", a->lineno);
                ASSERT(t2 == typeINT, "Right operand not Int", a->lineno);
                a->type = typeINT;
            }
            /* else if (op == xxx) ... for each of the other operators */
            break;
        /* case ...: for all other AST variants */
        } /* end of switch */

        return a->type;
    }
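A cut-down, self-contained version of this traversal can be sketched as follows. It is not the course's type checker: the AST is reduced to constants and binary operators, errors are counted instead of printed through ASSERT, the single op field doubles as the constant's value, and '=' is assumed to be an equality test yielding a boolean (consistent with c := a=b in the demo). The helper names make_node, check and error_count, and the typeERROR and typeBOOL codes, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { typeERROR, typeINT, typeBOOL } Type;
typedef enum { binary_exp, int_const, bool_const } NodeKind;

/* Reduced AST node: op holds the operator, or the value for constants. */
typedef struct AST {
    struct AST *child[2];
    NodeKind nodekind;
    int op;
    Type type;
} AST;

static int error_count = 0;   /* number of violated type rules seen so far */

static AST *make_node(NodeKind kind, int op, AST *l, AST *r) {
    AST *e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->nodekind = kind;
    e->op = op;
    e->child[0] = l;
    e->child[1] = r;
    return e;
}

AST *make_binary(int op, AST *l, AST *r) { return make_node(binary_exp, op, l, r); }
AST *make_int(int v)  { return make_node(int_const, v, NULL, NULL); }
AST *make_bool(int v) { return make_node(bool_const, v, NULL, NULL); }

/* Record a type error when a rule does not hold. */
static void check(int ok) { if (!ok) error_count++; }

/* Decorate every node with its type and return the type of the root. */
Type type_check(AST *a) {
    Type t1, t2;
    switch (a->nodekind) {
    case int_const:
        a->type = typeINT;
        break;
    case bool_const:
        a->type = typeBOOL;
        break;
    case binary_exp:
        t1 = type_check(a->child[0]);
        t2 = type_check(a->child[1]);
        if (a->op == '+') {            /* arithmetic: Int + Int gives Int */
            check(t1 == typeINT);
            check(t2 == typeINT);
            a->type = typeINT;
        } else if (a->op == '=') {     /* equality: like types give Bool */
            check(t1 == t2);
            a->type = typeBOOL;
        } else {                       /* operator not handled in this sketch */
            check(0);
            a->type = typeERROR;
        }
        break;
    }
    return a->type;
}
```

As in the slide's version, a '+' node is given type Int even when an operand is wrong, so one bad operand is reported once rather than cascading up the tree.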
AST Typecheck Demo

     1: program test;
     2: var a: integer;
     3:     b: integer;
     4:     c: boolean;
     5: begin
     6:   a := c;
     7:   c := a=b;
     8:   a := true;
     9:   x := y+3;
    10:   b := a or c;
    11: end.

    % typecheck < test.p
    line 6: Assignment type mismatch
    line 8: Assignment type mismatch
    line 9: Undeclared identifier
    line 9: LHS of assignment not declared
    line 9: Undeclared identifier
    line 9: Left operand not Int
    line 9: Assignment type mismatch
    line 10: Left operand not boolean
    line 10: Assignment type mismatch
    %