Lesson 10 CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg
Outline
  - Flex
  - Bison
  - Abstract syntax trees
FLEX
Flex
  - Tool for automatic generation of scanners
  - Open-source version of Lex
  - Takes regular expressions as input
  - Outputs a C (or C++) file for the scanner
Flex
  [Diagram: mylexer.l (regexps) -> Flex -> mylexer.c (int yylex()) -> C compiler -> mylexer.obj]
The input file to Flex
    Definitions
    %%
    Rules
    %%
    User code
The definitions section
  Macro definitions:
  - Specify a letter:       letter     [A-Za-z]
  - Specify a delimiter:    delimiter  [,:;.]
  - Specify a digit:        digit      [0-9]
  - Specify an identifier:  id         {letter}({letter}|{digit})*
The definitions section
  User code:
    %{
    #include <stdio.h>
    int a_nice_global_variable = 0;
    int my_favourite_function(void) { return 42; }
    %}
The rules section
  - Rule = regexp + C code
  - The longest matching pattern is used
  - If two equally long patterns match, the first one in the file is used
  - Examples:
      =|>=?|<(=|>)?   { return RELOP; }
      {id}            { return ID; }
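The two matching rules above can be sketched in plain C. This is only an illustration of the principle, not the code Flex generates: the token codes, the hand-written matchers, and the function name next_token are all assumptions made for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes, chosen only for this sketch. */
enum { TOK_NONE = 0, TOK_IF, TOK_ID, TOK_RELOP };

/* Each matcher returns the length of its match at the start of s, or 0. */
static size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s)
{
    size_t n = 0;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Hand-coded matcher for the relational operators =, >, >=, <, <=, <>. */
static size_t match_relop(const char *s)
{
    if (s[0] == '=') return 1;
    if (s[0] == '>') return s[1] == '=' ? 2 : 1;
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;
    return 0;
}

struct rule { size_t (*match)(const char *); int token; };

/* Rule order matters: the keyword rule is listed before the identifier rule. */
static const struct rule rules[] = {
    { match_if,    TOK_IF    },
    { match_id,    TOK_ID    },
    { match_relop, TOK_RELOP },
};

/* Returns the token of the longest match at s and stores its length in *len. */
int next_token(const char *s, size_t *len)
{
    int best = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {          /* strict '>' keeps the earlier rule on ties */
            *len = n;
            best = rules[i].token;
        }
    }
    return best;
}
```

With these rules, the input "if" is a tie between the keyword and identifier matchers and goes to the keyword because it is listed first, while "iffy" goes to the identifier rule because that match is longer.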
The regexp language of Flex
    ?    Previous regexp is optional
    {}   Macro expansion (defined in the definitions section)
    .    Matches any character that is not end of line
    $    Matches the end of a line
    ^    Matches the beginning of a line
    []   Matches any enclosed character
The [] syntax
  - Similar to | but more powerful
  - Example: digit [0123456789] is the same as digit 0|1|2|3|4|5|6|7|8|9
  - Special characters inside the brackets: - and ^
      digit      [0-9]
      letter     [A-Za-z]
      non_digit  [^0-9]
The user code section
  - Only C code is valid here
  - It is copied unchanged to the generated C file
The generated scanner
  - By default, a function called yylex() is defined
    - Works similarly to your GetNextToken() from lab 1
    - The name can be changed with options
  - Some globals are defined as well (can be changed into local variables with options):
      yyin      The file to read from
      yytext    The matched lexeme (char*)
      yyleng    The length of yytext
      yylineno  Line number of the match
The yywrap() function
  - Called upon end-of-file
  - Should be supplied by the user
  - Suppressed with %option noyywrap or --noyywrap
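A user-supplied yywrap() can be very small. The sketch below assumes the scanner reads a single input file, so it simply reports that there is nothing more to scan. A version that returns 0 would instead be expected to have pointed yyin at another file first.

```c
/* Minimal yywrap(): a non-zero return value tells the scanner
   that there is no further input, so scanning stops at end-of-file. */
int yywrap(void)
{
    return 1;
}
```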
Scanner states in Flex
  - Affect which tokens are recognized
  - Example from the language ALF:
      { fref 32 DEADC0DE }    <- Identifier
      { hex_val DEADC0DE }    <- Hex constant
Scanner states in Flex
  - Declare the state:
      %x READ_HEX
  - Use the state to make rules conditional:
      hex_val                  { BEGIN(READ_HEX); return HEX_VAL_KW; }
      [A-Za-z_][A-Za-z0-9_]*   { return ID; }
      <READ_HEX>[0-9a-fA-F]+   { BEGIN(INITIAL); return NUM; }
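The effect of the READ_HEX state can be imitated in plain C with one state variable. This is a rough sketch, not Flex output: it assumes the input has already been split into lexemes, and the names classify, current_state, and TOK_UNKNOWN are invented for the example.

```c
#include <ctype.h>
#include <string.h>

enum state { INITIAL, READ_HEX };
enum token { TOK_UNKNOWN, HEX_VAL_KW, ID, NUM };

/* The current scanner state; plays the role of Flex's start condition. */
static enum state current_state = INITIAL;

/* True if s is a non-empty string of hexadecimal digits. */
static int is_hex(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isxdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* True if s has the form [A-Za-z_][A-Za-z0-9_]* */
static int is_identifier(const char *s)
{
    if (!isalpha((unsigned char)*s) && *s != '_')
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}

/* Classifies one lexeme; the result depends on the current state. */
enum token classify(const char *lexeme)
{
    if (current_state == READ_HEX) {
        /* Exclusive state: only the hex rule is active here. */
        if (is_hex(lexeme)) {
            current_state = INITIAL;
            return NUM;
        }
        return TOK_UNKNOWN;
    }
    if (strcmp(lexeme, "hex_val") == 0) {
        current_state = READ_HEX;
        return HEX_VAL_KW;
    }
    if (is_identifier(lexeme))
        return ID;
    return TOK_UNKNOWN;
}
```

The same lexeme DEADC0DE is classified as ID in the INITIAL state and as NUM directly after the hex_val keyword.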
Online resources
    http://flex.sourceforge.net/manual/index.html
BISON
Bison
  - Tool for automatic generation of parsers
  - Open-source alternative to Yacc
  - Takes an SDT scheme as input
  - Outputs C (or C++) source code for an LALR parser
  - Commonly used together with Flex
Bison
  [Diagram: myparser.y (SDT scheme) -> Bison -> myparser.c (int yyparse()) and myparser.h (token definitions); myparser.c -> C compiler -> myparser.obj]
The input file to Bison
    Definitions
    %%
    SDT scheme
    %%
    User code
Definitions section
  - Define tokens
  - Define operator precedence
  - Define operator associativity
  - Define the types of grammar symbol attributes
  - Write C code between %{ and %}
  - Issue certain commands to Bison
Token definition
  - Normal case:
      %token IDENTIFIER
      %token WHILE
  - Token, precedence, associativity, and type:
      %left <Operator> RELOP
      %left <Operator> MINUSOP PLUSOP
      %right <Operator> NOTOP
  - Enables use of ambiguous grammars!
Defining types
  - Just enter the type inside <> before the list of tokens:
      %left <Operator> RELOP
      %left <Operator> MULOP
      %right <Operator> NOTOP UNOP
      %token <String> ID STRING
  - Or the same for non-terminals:
      %type <Node> stmnt expr actuals exprs
The variable yylval
  - Used by the lexical analyzer to store token attributes
  - Default type is int
  - May be given other types using %union:
      %union {
          int Operator;
          char *String;
          NODE_TYPE Node;
      }
  - The type (member name) is then used like this:
      %token <String> ID STRING
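How a scanner might fill in yylval can be sketched in stand-alone C. This sketch defines its own stand-ins for YYSTYPE and yylval, which Bison would otherwise generate from the %union. The NODE_TYPE typedef, the token codes, and the two helper functions are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: NODE_TYPE is a pointer to some tree node type. */
typedef struct node *NODE_TYPE;

/* Stand-in for the type Bison derives from the %union declaration. */
typedef union {
    int Operator;
    char *String;
    NODE_TYPE Node;
} YYSTYPE;

/* Stand-in for the global that the parser reads token attributes from. */
YYSTYPE yylval;

/* Hypothetical token codes; Bison would normally define these in the header. */
enum { ID = 258, RELOP = 259 };

/* What a scanner action for an identifier might do: copy the lexeme
   into the String member and return the token code. */
int scan_identifier(const char *text)
{
    size_t n = strlen(text) + 1;
    yylval.String = malloc(n);
    if (yylval.String != NULL)
        memcpy(yylval.String, text, n);
    return ID;
}

/* What a scanner action for a relational operator might do. */
int scan_relop(int op)
{
    yylval.Operator = op;
    return RELOP;
}
```

Only one member of the union is valid at a time, which is why the token declaration has to name the member that the scanner fills in for that token.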
Code provided by the user
  - yyerror(char* msg): function called on syntax errors
  - yylex(): function called to get the next token
Options to Bison
  - Given on the command line or in the grammar file
  - --defines or %defines: output a C header file with definitions useful to a scanner
    - Tokens (#defines) and the type of yylval
  - %error-verbose: more detailed error messages
  - --name-prefix or %name-prefix: change the default yy prefix on all names
  - %define api.pure: do not use globals
  - --verbose or %verbose: write detailed information to an extra output file
Translation scheme section
    decl   : BASIC_TYPE idents ';'
           ;
    idents : idents ',' ident
           | ident
           ;
    ident  : ID
           ;
Semantic actions
  - Written in C
  - Executed when the production is used in a reduction
  - $$, $1, $2, etc. refer to the attributes of the grammar symbols
    - Can be used as regular C variables
    - $$ refers to the attribute of the head, $1 to the attribute of the first symbol in the body, etc.
      E : E '+' T   { $$ = $1 + $3; }
        ;
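What $$, $1, and $3 stand for can be pictured as slots on a value stack that runs in parallel with the parser stack. The sketch below is a simplified model, not the code Bison emits. The stack layout and all names in it are assumptions.

```c
#include <assert.h>

#define STACK_MAX 64

/* Simplified value stack: one attribute per grammar symbol on the parser stack. */
static int value_stack[STACK_MAX];
static int top = 0;

/* Pushes the attribute of a shifted symbol. */
void push_value(int v)
{
    assert(top < STACK_MAX);
    value_stack[top++] = v;
}

int top_value(void)
{
    assert(top > 0);
    return value_stack[top - 1];
}

int stack_depth(void)
{
    return top;
}

/* Models a reduction by E : E '+' T with the action { $$ = $1 + $3; }. */
void reduce_add(void)
{
    assert(top >= 3);
    int d1 = value_stack[top - 3];   /* $1: attribute of E   */
    int d3 = value_stack[top - 1];   /* $3: attribute of T   */
    top -= 3;                        /* pop the rule body    */
    value_stack[top++] = d1 + d3;    /* $$: pushed for the head */
}
```

Pushing 2, a placeholder for '+', and 5, followed by reduce_add(), leaves the single value 7 on the stack.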
Using ambiguous grammars in Bison
  - Default actions:
    - Reduce/reduce: choose the first rule in the file
    - Shift/reduce: always shift
  - With explicit precedence and associativity:
    - Shift/reduce: compare the precedence/associativity of the rule with that of the lookahead token
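The comparison used for shift/reduce conflicts can be written as a small decision function. This is a sketch of the rule as documented for Bison, with invented names. Precedence levels are modelled as integers where a larger number binds tighter, and a rule is assumed to take the precedence of its last terminal unless overridden.

```c
enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Decides a shift/reduce conflict between a rule that could be reduced
   and a lookahead token that could be shifted. */
enum action resolve_conflict(int rule_prec, int token_prec, enum assoc token_assoc)
{
    if (token_prec > rule_prec)
        return ACT_SHIFT;            /* lookahead binds tighter */
    if (token_prec < rule_prec)
        return ACT_REDUCE;           /* rule binds tighter */

    /* Equal precedence: associativity decides. */
    switch (token_assoc) {
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;   /* non-associative: syntax error */
    }
}
```

For example, with '+' at level 1 and '*' at level 2, both left-associative, a stack holding expr + expr with lookahead * gives a shift, while expr * expr with lookahead + or * gives a reduce.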
The %expect declaration
  - To suppress shift/reduce warnings:
      %expect n
    where n is the exact number of conflicts
Contextual precedence
  - The same token might have different precedence depending on context:
      expr -> expr - expr | expr * expr | - expr | id
  - Stack: - expr    Input: * expr
Contextual precedence
  - Define a dummy token:
      %left '-'
      %left '*'
      %left UMINUS
  - Use the %prec modifier:
      expr : '-' expr %prec UMINUS
Examples of parser configurations
    Stack              Input   Action
    if (cond) stmt     else    shift
    expr + expr        *       shift
    expr * expr        +       reduce by expr -> expr * expr
    expr * expr        *       reduce by expr -> expr * expr
Online resources
    http://www.gnu.org/software/bison/manual/html_node/index.html
ABSTRACT SYNTAX TREES
Abstract syntax trees
  - AST, or just syntax tree
  [Figure: parse tree and syntax tree for the expression a + 5 * b]
Syntax trees vs. parse trees
  - Parse trees:
    - Interior nodes are nonterminals, leaves are terminals
    - Rarely constructed as an explicit data structure
    - Represent the concrete syntax
  - Syntax trees:
    - Interior nodes are operators, leaves are operands
    - Commonly constructed as an explicit data structure
    - Represent the abstract syntax
Why syntax trees?
  - Simplify subsequent analyses
  - Independent of the parsing strategy
  - Make it easier to add new analysis passes without having to modify the parser
  - More compact representation than parse trees
Syntax tree example
    if (a < 1)
        b = 2 + 3;
    else {
        c = d * 4;
        e(f, 5);
    }
  [Figure: syntax tree for the statement above; node labels: if, <, =, call, +, *, a, 1, b, 2, 3, c, d, 4, e, f, 5, null]
Exercise (1)
  Draw an abstract syntax tree for the statement
    while (i < 100) { x = 2 * x; i = i + 1; }
Constructing a syntax tree in Bison
    expr : expr '+' expr   { $$ = createopnode($1, '+', $3); }
         | expr '*' expr   { $$ = createopnode($1, '*', $3); }
         | ID              { $$ = createidnode($1.name); }
         ;
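The slides do not show the node constructors called from these actions. One possible shape for them is sketched below. The Node layout and both function bodies are assumptions, and only the function names are taken from the grammar above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: operators are interior nodes, identifiers are leaves. */
typedef struct node {
    int op;                     /* operator character, or 0 for an identifier leaf */
    char *name;                 /* identifier name for leaves, NULL otherwise */
    struct node *left, *right;  /* operands, NULL for leaves */
} Node;

/* Creates a leaf node holding a copy of the identifier name. */
Node *createidnode(const char *name)
{
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);
    return n;
}

/* Creates an interior node for a binary operator with two operand subtrees. */
Node *createopnode(Node *left, int op, Node *right)
{
    Node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}
```

Because each action runs when its production is reduced, the operand subtrees already exist when createopnode is called, so the tree is built bottom-up in the same order as the parse.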
Constructing a syntax tree in Bison
    stmt  : RETURN expr ';'   { $$ = mreturn($2, $1); }
          ;
    stmts : stmts stmt        { $$ = connectstmts($1, $2); }
          | /* empty */       { $$ = NULL; }
          ;
Conclusion
  - Flex generates C source code for a scanner given a set of regular expressions
  - Bison generates C source code for a bottom-up parser given a syntax-directed translation scheme
  - Building syntax trees simplifies subsequent analyses of the program
  - Syntax trees can be built in semantic actions
Next time
  - Syntax-directed definitions and translation schemes
  - Semantic analysis and type analysis