Syntax Analysis Part IV Chapter 4: Bison Slides adapted from : Robert van Engelen, Florida State University
Yacc and Bison Yacc (Yet Another Compiler Compiler) Generates LALR(1) parsers Bison Improved version of Yacc
Creating an LALR(1) Parser with Yacc/Bison yacc specification yacc.y Bison compiler yacc.tab.c yacc.tab.c C compiler a.out input stream a.out output stream
Bison Specification A Bison specification consists of three parts: Bison declarations, and C declarations within %{ %} %% translation rules %% user-defined auxiliary procedures The translation rules are CFG productions with actions: production 1 { semantic action 1 } production 2 { semantic action 2 } production n { semantic action n }
Writing a Grammar in Yacc Productions in Yacc are of the form Nonterminal: tokens/nonterminals { action } tokens/nonterminals { action } ; Tokens that are single characters can be used directly within productions, e.g. + Named tokens must be declared first in the declaration part using %token TokenName
Synthesized Attributes Semantic actions may refer to attributes of terminals and nonterminals in a production: X : Y 1 Y 2 Y 3 Y n { action } $$ refers to the value of the attribute of X $i refers to the value of the attribute of Y i Example : factor : ( expr ) { $$=$2; } factor.val=x ( expr.val=x ) $$=$2
Example I : Incomplete Code %{ #include <ctype.h> %} %token DIGIT %% line : expr \n { printf( %d\n, $1); } ; expr : expr + term { $$ = $1 + $3; } term { $$ = $1; } ; term : term * factor { $$ = $1 * $3; } factor { $$ = $1; } ; factor : ( expr ) { $$ = $2; } DIGIT { $$ = $1; } ; %% int yylex() { int c = getchar(); if (isdigit(c)) { yylval = c- 0 ; return DIGIT; } return c; } Attribute of token (stored in yylval) Attribute of expr (parent) Attribute of expr (child) Attribute of term (child) Example of a very crude lexical analyzer invoked by the parser
Example I : Complete Code %{ #include <ctype.h> #include <stdio.h> int yylex(); void yyerror(char *s); %} %token DIGIT %% line : expr '\n' { printf("%d\n", $1); } ; expr : expr '+' term { $$ = $1 + $3; } term { $$ = $1; } ; term : term '*' factor { $$ = $1 * $3; } factor { $$ = $1; } ; factor : '(' expr ')' { $$ = $2; } DIGIT { $$ = $1; } ;
Example I : Complete Code %% int yylex() { int c = getchar(); if (isdigit(c)) { yylval = c-'0'; return DIGIT; } return c; } int main() { if (yyparse()!= 0 ) fprintf(stderr, "Abnormal exit\n"); return 0; } void yyerror(char *s) { fprintf(stderr, "Error: %s\n", s); } bison calculator.y gcc calculator.tab.c./a.out 5+8*(9+2)*3 269 ^D
Tokens Two types of tokens: literal and symbolic Literal tokens represented using the corresponding C character constant (ASCII code) Symbolic tokens represented as numbers higher than any possible character s code, so they will not conflict with any literal tokens Numbers can be forced by declaration %token NUMBER 621
Tokens When using symbolic tokens, run Bison with d option to create a C header file with definitions If Bison is combined with Flex, add #include xxx.tab.h in lexer file, where xxx.y is the source file
Symbol Values Both tokens and nonterminals have an associated semantic value For token, the value is stored in the C variable yylval For nonterminals, the value is stored in the $$, $1, pseudo-variables The associated semantic value is of type YYSTYPE, declared as int by default YYSTYPE can be redefined using the C instruction #define
Dealing with Ambiguous Grammars A description of parsing action conflicts can be obtained using the -v option, which produces an additional file y.output Reduce/reduce conflicts solved by using the conflicting production listed first Shift/reduce conflicts resolved in favor of shift
Dealing with Ambiguous Grammars We can also deal with ambiguous grammars by defining operator precedence levels and left/right associativity of the operators Example: %left + - %left * / %right UMINUS
Dealing with Ambiguous Grammars Productions are also given precedence and associativity, inherited from their rightmost nonterminal Example: item [E E + E ] and lookahead + resolved with reduction (+ left associative) Example: item [E E + E ] and lookahead * resolved with shift (* higher precedence)
Dealing with Ambiguous Grammars Can force different precedence by attaching to a production %prec terminal Symbol terminal can be a placeholder: this terminal is never used by lexical analyzer, but indicates a precedence (see later examples)
Combining Bison with Flex Yacc or Bison specification yacc.y Lex or Flex specification lex.l and token definitions y.tab.h lex.yy.c y.tab.c Yacc or Bison compiler Lex or Flex compiler C compiler y.tab.c y.tab.h lex.yy.c a.out input stream a.out output stream
Example II : Bison %{ #include <ctype.h> #include <stdio.h> #define YYSTYPE double %} %token NUMBER %left + - %left * / %right UMINUS double type for nonterminal attributes and yylval terminal placeholder
Example II : Bison %% lines : lines expr \n { printf( %g\n, $2); } lines \n /* empty */ ; expr : expr + expr { $$ = $1 + $3; } expr - expr { $$ = $1 - $3; } expr * expr { $$ = $1 * $3; } expr / expr { $$ = $1 / $3; } ( expr ) { $$ = $2; } - expr %prec UMINUS { $$ = -$2; } NUMBER ;
Example II : Bison %% int main() { if (yyparse()!= 0) fprintf(stderr, Abnormal exit\n ); return 0; } void yyerror(char *s) { fprintf(stderr, Error: %s\n, s); } Run the parser Invoked by parser to report parse errors
Example II : Flex %option noyywrap %{ #include example.tab.h extern YYSTYPE yylval; %} number [0-9]+\.? [0-9]*\.[0-9]+ Generated by Bison, contains #define NUMBER xxx Defined in example.tab.c
Example II : Flex %% [ \t] { /* skip blanks */ } {number} { sscanf(yytext, %lf, &yylval); return NUMBER; } \n. { return yytext[0]; } bison d example.y flex example.l gcc example.tab.c lex.yy.c./a.out 2.4*(6+13) 45.6 ^D
Error Recovery in Yacc %{ %} %% lines : lines expr \n { printf( %g\n, $2; } lines \n /* empty */ error \n { yyerror( reenter last line: ); yyerrok; } ; Error production: set error mode and skip input until newline Reset parser to normal mode
Symbol Values If several types are needed for grammar symbols, a union type must be defined The %union directive identifies all of the possible C types that a symbol value can have The field declarations are copied verbatim into a C union declaration of the type YYSTYPE In the absence of a %union declaration, Bison defines YYSTYPE to be int
Symbol Values Associate the types declared in %union with specific grammar symbols using the %type declaration for nonterminal the %token declaration for tokens
Example %union { double dval; char *sval; }... %token <dval> REAL %token <sval> STRING %type <dval> expr // token // token // nonterminal
Exercise III Write an interpreter for combination of arithmetic expressions and Boolean expressions Use %union directive to deal with different data types
Exercise III : Flex %option noyywrap %{ %} #include <stdlib.h> #include "boolean.tab.h" fract [0-9]+\.? [0-9]*\.[0-9]+
Exercise III : Flex %% [ \t] { /* skip blanks */ } "&&" { return AND; } " " { return OR; } "!" { return NOT; } "<" { return LT; } "<=" { return LE; } ">" { return GT; } ">=" { return GE; } "==" { return EQ; } {fract} { yylval.val = atof(yytext); return FRACT; } \n. { return yytext[0]; }
Exercise III : Bison %{ #include <ctype.h> #include <stdio.h> int yylex(); void yyerror(char *s); %} %union { double val; int bool; // 1 == true, 0 == false }
Exercise III : Bison %token <val> FRACT %type <val> expr %type <bool> comp %type <bool> bexpr %left %left %right %left %left %right '+' '-' '*' '/' UMINUS OR AND NOT %nonassoc EQ LT GT LE GE
Exercise III : Bison %% lines : lines bexpr '\n' { printf("%d\n", $2); } lines '\n' /* empty */ ; bexpr : bexpr OR bexpr { if ($1 == 1 $3 == 1) $$ = 1; else $$ = 0; } bexpr AND bexpr { if ($1 == 1 && $3 == 1) $$ = 1; else $$ = 0; } NOT bexpr { if ($2 == 1 ) $$ = 0; else $$ = 1; } '(' bexpr ')' { $$ = $2; } comp { $$ = $1; } ;
Exercise III : Bison comp : expr LT expr { if ($1 < $3) $$ = 1; else $$ = 0; } expr LE expr { if ($1 <= $3) $$ = 1; else $$ = 0; } expr GE expr { if ($1 >= $3) $$ = 1; else $$ = 0; } expr GT expr { if ($1 > $3) $$ = 1; else $$ = 0; } expr EQ expr { if ($1 == $3) $$ = 1; else $$ = 0; } // '(' comp ')' { $$ = $2; } ;
Exercise III : Bison expr : expr '+' expr { $$ = $1 + $3; } expr '-' expr { $$ = $1 - $3; } expr '*' expr { $$ = $1 * $3; } expr '/' expr { $$ = $1 / $3; } '(' expr ')' { $$ = $2; } '-' expr %prec UMINUS { $$ = -$2; } FRACT ;
Exercise III : Bison %% int main() { if (yyparse()!= 0) fprintf(stderr, "Abnormal exit\n"); return 0; } void yyerror(char *s) { fprintf(stderr, "Error: %s\n", s); }
Summary of Commands To run Bison + Flex : bison d o parser.c parser.y flex o scanner.c scanner.l gcc o parser parser.c scanner.c./parser < input > output