Master's Degree Course in Computer Engineering
FORMAL LANGUAGES AND COMPILERS
PRACTICAL CLASS: Flex & Bison
Eliana Bove
eliana.bove@poliba.it
Install

On Linux: install Flex and Bison with the package manager of your distribution.

On Windows:
- Install flex.exe (download from http://gnuwin32.sourceforge.net/packages/flex.htm)
- Install bison.exe (download from http://gnuwin32.sourceforge.net/packages/bison.htm)

Warning 1: On Windows it is better to change the installation path from the default (C:\Program Files (x86)\gnuwin32) to C:\GnuWin32, as Bison has issues with spaces in directory names.

Warning 2: A C compiler is required, for example Dev-C++ installed in C:\Dev-Cpp. Add the bin subdirectories of the compiler, Flex and Bison to the PATH environment variable (;C:\Dev-Cpp\bin;C:\GnuWin32\bin).
Lexical analysis: Flex

Flex source program (lex.l) -> Flex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Lexical analysis: input file

A LEX/Flex input file is composed of three sections, separated by the %% symbol.

Section 1: definitions (may be empty)

%{
#include directives, constant definitions, scanner macros
%}
basic definitions

The text between %{ and %} contains library #include directives and user-defined constants and/or macros for the C program; it is copied literally into the generated C program. The basic definitions name regular expressions that are used in the second section.

Section 2: rules

%%
token definitions and actions

Contains the patterns and the actions to execute when they match, as pattern-action pairs. Each action must start on the same line where the pattern's regular expression ends, separated from it by spaces or tabs.

Section 3: user code (may be empty)

%%
support procedures (C user code)

Contains the support routines that the programmer intends to use in the actions of the second section. If this section is empty, the second %% separator may be omitted.
Lexical analysis: exercise 1

Exercise 1: Create a scanner to recognize the following tokens:

Lexeme           Token name   Attribute value
any whitespace   -            -
if               if           -
then             then         -
else             else         -
any id           id           pointer
any number       number       pointer
<                relop        LT
<=               relop        LE
=                relop        EQ
<>               relop        NE
>                relop        GT
>=               relop        GE
Exercise 1: Flex source program ex1.l

%{
#include <stdio.h>

/* definitions of manifest constants */
#define YYSTYPE int
YYSTYPE yylval;

#define LT 1
#define LE 2
#define EQ 3
#define NE 4
#define GT 5
#define GE 6
#define IF 7
#define THEN 8
#define ELSE 9
#define ID 10
#define NUMBER 11
#define RELOP 12

int installid(void);
int installnum(void);
%}

/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(e[+-]?{digit}+)?

%%

{ws}      { /* no action and no return */ }
if        { return(IF); }
then      { return(THEN); }
else      { return(ELSE); }
{id}      { yylval = (int) installid(); return(ID); }
{number}  { yylval = (int) installnum(); return(NUMBER); }
"<"       { yylval = LT; return(RELOP); }
"<="      { yylval = LE; return(RELOP); }
"="       { yylval = EQ; return(RELOP); }
"<>"      { yylval = NE; return(RELOP); }
">"       { yylval = GT; return(RELOP); }
">="      { yylval = GE; return(RELOP); }

%%

/* function to install the lexeme, whose first character is pointed to
   by yytext, and whose length is yyleng, into the symbol table and
   return a pointer thereto */
int installid(void) {
    printf("Installing %s of length %d as id\n", yytext, yyleng);
    return 1;
}

/* similar to installid, but puts numerical constants into a separate table */
int installnum(void) {
    printf("Installing %s of length %d as num\n", yytext, yyleng);
    return 1;
}
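The rules list "<" before "<=", yet the input <= is returned as one token with attribute LE: a Flex scanner prefers the longest match and uses rule order only to break ties between matches of equal length. The sketch below hand-codes that longest-match choice for the relational operators only. The function match_relop and its interface are hypothetical, introduced here for illustration; they are not part of Flex or of the exercise.

```c
#include <stddef.h>

/* Attribute values, numbered as the manifest constants in ex1.l. */
enum { LT = 1, LE, EQ, NE, GT, GE };

/* Hypothetical helper: if s starts with a relational operator, store the
   length of the longest one in *len and return its attribute value;
   otherwise store 0 in *len and return 0. */
int match_relop(const char *s, size_t *len)
{
    *len = 0;
    switch (s[0]) {
    case '<':
        /* Try the two-character operators before settling for "<". */
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1;
        return LT;
    case '=':
        *len = 1;
        return EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1;
        return GT;
    default:
        return 0;   /* not a relational operator */
    }
}
```

Calling match_relop("<=", &n) returns LE with n set to 2, while match_relop("<a", &n) returns LT with n set to 1.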
1. Open a shell.
2. Go to the directory where the Flex input file (ex1.l) is stored.
3. Run:

flex ex1.l             (produces lex.yy.c)
gcc lex.yy.c -lfl      (generates the scanner a.exe, or a.out on Linux)
a.exe < t1.txt         (runs the scanner on the input file t1.txt; on Linux: ./a.out < t1.txt)

The library libfl.a is needed to compile. Its path depends on the Flex install directory; if the compiler does not find it, pass the path explicitly, for example:

gcc lex.yy.c -L C:\GnuWin32\lib -lfl
Lexical analysis: exercise 2

Exercise 2: Write a Flex program which, given a C program as input, produces as output an equivalent program without comments.

Exercise 2: Flex specification ex2.l

%{
%}

/* define comment state */
%x comm

%%

"/*"                   BEGIN(comm);
<comm>[^*\n]*          /* eat anything that's not a '*' */
<comm>"*"+[^*/\n]*     /* eat up '*'s not followed by '/'s */
<comm>\n               /* possible new lines */
<comm>"*"+"/"          BEGIN(INITIAL);

%%
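The exclusive start condition comm behaves like a second state of a small state machine: in INITIAL, text is echoed until "/*" is seen; in comm, text is discarded until "*/" is seen. A minimal hand-written C sketch of the same two-state idea follows. The function strip_comments is hypothetical and works on strings rather than on an input stream; like ex2.l, it does not treat string literals specially.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, dropping C comments.
   dst must have room for strlen(src) + 1 bytes. */
void strip_comments(const char *src, char *dst)
{
    enum { INITIAL, COMM } state = INITIAL;   /* mirrors the two Flex states */
    size_t i = 0, j = 0;

    while (src[i] != '\0') {
        if (state == INITIAL) {
            if (src[i] == '/' && src[i + 1] == '*') {
                state = COMM;                 /* like BEGIN(comm) */
                i += 2;
            } else {
                dst[j++] = src[i++];          /* like Flex's default echo */
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                state = INITIAL;              /* like BEGIN(INITIAL) */
                i += 2;
            } else {
                i++;                          /* discard comment text */
            }
        }
    }
    dst[j] = '\0';
}
```

For the input "int a; /* c */int b;" the function writes "int a; int b;" to dst.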
1. Open a shell.
2. Go to the directory where the Flex input file is located.
3. Run:

flex ex2.l             (produces lex.yy.c)
gcc lex.yy.c -lfl      (generates the scanner a.exe, or a.out on Linux)
a.exe < t2.c           (runs the scanner on the input file t2.c; on Linux: ./a.out < t2.c)
Syntax analysis: Bison

YACC specification (translate.y) -> Bison compiler -> y.tab.c
y.tab.c -> C compiler -> a.out
input -> a.out -> output
Syntax analysis: input file

A Bison input file is composed of three sections, separated by the %% symbol.

Prologue and declarations (optional)

%{
#include directives, constant definitions
%}
basic declarations

The text between %{ and %} contains the library #include directives and the definitions of any entity used in the rules of the second section or in the routines of the third section. Its contents are copied to the beginning of the parser.

The basic declarations are Bison declarations, i.e. the names of the terminal and nonterminal symbols of the grammar and the precedence/associativity rules between symbols. Precedence/associativity rules are expressed with the %left, %right or %nonassoc declarations.

Grammar symbols can be denoted in three ways:
- named token: every token name (by convention, upper case for terminals and lower case for nonterminals) must be defined with a %token declaration
- literal token: refers to a single character ('+')
- string token: refers to a sequence of characters ("<-")
Rules

%%
translation rules

Contains the grammar rules, written in a BNF-derived form. Here the whole grammar is described, and the actions to be executed are defined and associated with the grammar productions.

<head> : <body>1  { <semantic action>1 }
       | <body>2  { <semantic action>2 }
       ...
       | <body>n  { <semantic action>n }
       ;

- A semantic action is a sequence of C statements.
- Actions can appear anywhere within the production body and are executed at the point where they appear.
- Actions exchange values with the parser through pseudo-variables introduced by the $ symbol: $$ refers to the semantic value of the left-hand side of the production, while $n refers to the semantic value of the n-th symbol on the right-hand side.
- If no action is specified, the default action is { $$ = $1; }.

Epilogue (optional)

%%
support C routines

Contains any useful code, including the definitions of functions declared in the prologue. All contents are copied to the end of the parser file.
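One way to picture $$ and $n is as positions on a stack of semantic values: when a rule with n right-hand-side symbols is reduced, their n values are the top n stack entries, the action computes $$ from them, and the n entries are replaced by $$. The toy model below shows this for a single rule, expr : expr '+' term { $$ = $1 + $3; }. The ValueStack type and both functions are hypothetical names for this illustration; the parser that Bison generates is organised differently in its details.

```c
#include <assert.h>

#define STACK_MAX 64

/* Hypothetical toy value stack; not Bison's actual data structure. */
typedef struct {
    int vals[STACK_MAX];
    int top;                 /* number of values currently on the stack */
} ValueStack;

void vs_push(ValueStack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/* Reduce by  expr : expr '+' term  { $$ = $1 + $3; }
   The three right-hand-side values are the top three stack entries. */
void reduce_add(ValueStack *s)
{
    assert(s->top >= 3);
    int *rhs = &s->vals[s->top - 3];  /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];     /* $$ = $1 + $3 */
    s->top -= 3;                      /* pop the right-hand side */
    vs_push(s, result);               /* push $$ for the left-hand side */
}
```

Pushing 2, the value slot of '+', and 5, then calling reduce_add, leaves a single entry holding 7 on the stack.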
Syntax analysis: exercise 3

Exercise 3: Build a calculator starting from the following grammar:

E -> E + T | T
T -> T * F | F
F -> (E) | digit

where digit is a single digit between 0 and 9.

Exercise 3: Bison specification ex3.y

%{
#include <stdio.h>
#include <ctype.h>

int yylex(void);
int yyerror(const char *s);
%}

%token DIGIT

%%

input:  /* empty string */
      | input line   /* with this left-recursive rule, we can parse consecutive lines */
      ;

line:   '\n'
      | expr '\n'           { printf("%d\n", $1); }
      ;

expr:   expr '+' term       { $$ = $1 + $3; }
      | term
      ;

term:   term '*' factor     { $$ = $1 * $3; }
      | factor
      ;

factor: '(' expr ')'        { $$ = $2; }
      | DIGIT
      ;

%%

int main(void) {
    return yyparse();
}

int yyerror(const char *s) {
    printf("%s\n", s);
    return 0;
}

int yylex(void) {
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;
}

1. Open a shell and go to the directory where the Bison specification file is located.
2. Run:

bison ex3.y            (produces ex3.tab.c)
gcc ex3.tab.c          (generates the parser a.exe, or a.out on Linux)
a.exe                  (on Linux: ./a.out)
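For comparison with the Bison-generated parser, the same grammar can be evaluated by a hand-written recursive-descent parser, with one function per nonterminal and the left-recursive productions turned into loops. This is only a sketch of an alternative technique, not what Bison produces (Bison builds a table-driven bottom-up parser). All function names are hypothetical, and error handling is omitted.

```c
#include <ctype.h>

/* Hypothetical hand-written evaluator for the Exercise 3 grammar.
   p points at the current position in the input string. */
static int parse_expr(const char **p);

/* F -> (E) | digit */
static int parse_factor(const char **p)
{
    if (**p == '(') {
        (*p)++;                       /* consume '(' */
        int v = parse_expr(p);
        if (**p == ')')
            (*p)++;                   /* consume ')' */
        return v;
    }
    if (isdigit((unsigned char)**p))
        return *(*p)++ - '0';         /* single digit, like DIGIT in ex3.y */
    return 0;                         /* error handling omitted in this sketch */
}

/* T -> T * F | F, written as a loop so that '*' associates to the left */
static int parse_term(const char **p)
{
    int v = parse_factor(p);
    while (**p == '*') {
        (*p)++;
        v *= parse_factor(p);
    }
    return v;
}

/* E -> E + T | T, written as a loop so that '+' associates to the left */
static int parse_expr(const char **p)
{
    int v = parse_term(p);
    while (**p == '+') {
        (*p)++;
        v += parse_term(p);
    }
    return v;
}

/* Evaluate a whole expression string such as "2+3*4" (no spaces). */
int eval_digits(const char *s)
{
    return parse_expr(&s);
}
```

Here eval_digits("2+3*4") yields 14 and eval_digits("(2+3)*4") yields 20, because the grammar itself places * below + in the parse tree.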
Syntax analysis: exercise 4

Exercise 4: Create a calculator supporting more complicated expressions (sum, multiplication, subtraction, division, exponentiation). Watch out for operator precedence!

Exercise 4: Bison specification ex4.y

%{
#define YYSTYPE double
#include <math.h>
#include <stdio.h>
#include <ctype.h>

int yylex(void);
int yyerror(const char *s);
%}

/* BISON Declarations */
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG     /* negation (unary minus) */
%right '^'    /* exponentiation */

%%

input:  /* empty string */
      | input line
      ;

line:   '\n'
      | exp '\n'              { printf("\t%.10g\n", $1); }
      ;

exp:    NUM                   { $$ = $1; }
      | exp '+' exp           { $$ = $1 + $3; }
      | exp '-' exp           { $$ = $1 - $3; }
      | exp '*' exp           { $$ = $1 * $3; }
      | exp '/' exp           { $$ = $1 / $3; }
      | '-' exp %prec NEG     { $$ = -$2; }
        /* %prec tells the parser to use the precedence of the NEG token,
           not that of the literal '-' token declared before */
      | exp '^' exp           { $$ = pow($1, $3); }
      | '(' exp ')'           { $$ = $2; }
      ;

%%

int yylex(void) {
    int c;

    /* Skip white space. */
    while ((c = getchar()) == ' ' || c == '\t') {
        continue;
    }

    /* Process numbers. */
    if (c == '.' || isdigit(c)) {
        ungetc(c, stdin);
        scanf("%lf", &yylval);
        return NUM;
    }

    /* Return end-of-input. */
    if (c == EOF) {
        return 0;
    }

    /* Return a single char. */
    return c;
}

int yyerror(const char *s) {
    printf("%s\n", s);
    return 0;
}

int main(void) {
    return yyparse();
}

1. Open a shell and go to the directory where the Bison specification file is located.
2. Run:

bison ex4.y            (produces ex4.tab.c)
gcc ex4.tab.c -lm      (generates the parser a.exe, or a.out on Linux; -lm links the C math library libm)
a.exe                  (on Linux: ./a.out)
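The effect of the %left and %right declarations (later declarations bind tighter; %left groups to the left, %right to the right) can be reproduced by hand with the precedence-climbing technique. The sketch below is an assumption-laden illustration, not Bison's mechanism: all names are hypothetical, numbers are read with strtod instead of scanf, and exponents are truncated to integers by the helper int_pow so that the block does not depend on libm.

```c
#include <stdlib.h>

/* Precedence levels, lowest first, in the order of the ex4.y declarations. */
enum { PREC_ADD = 1, PREC_MUL = 2, PREC_NEG = 3, PREC_POW = 4 };

static int op_prec(char op)
{
    switch (op) {
    case '+': case '-': return PREC_ADD;
    case '*': case '/': return PREC_MUL;
    case '^':           return PREC_POW;
    default:            return 0;          /* not a binary operator */
    }
}

/* Simplification for this sketch: integer exponents only. */
static double int_pow(double base, int n)
{
    double r = 1.0;
    for (int k = n < 0 ? -n : n; k > 0; k--)
        r *= base;
    return n < 0 ? 1.0 / r : r;
}

static void skip_blanks(const char **p)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
}

static double parse_exp(const char **p, int min_prec);

static double parse_primary(const char **p)
{
    skip_blanks(p);
    if (**p == '-') {                      /* unary minus, like %prec NEG */
        (*p)++;
        return -parse_exp(p, PREC_NEG);    /* only '^' binds tighter than NEG */
    }
    if (**p == '(') {
        (*p)++;
        double v = parse_exp(p, PREC_ADD);
        skip_blanks(p);
        if (**p == ')')
            (*p)++;
        return v;
    }
    char *end;
    double v = strtod(*p, &end);           /* error handling omitted */
    *p = end;
    return v;
}

static double parse_exp(const char **p, int min_prec)
{
    double lhs = parse_primary(p);
    for (;;) {
        skip_blanks(p);
        char op = **p;
        int prec = op_prec(op);
        if (prec == 0 || prec < min_prec)
            return lhs;
        (*p)++;
        /* %left: the right operand must bind tighter than op;
           %right ('^'): the right operand may contain the same operator. */
        double rhs = parse_exp(p, op == '^' ? prec : prec + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '^': lhs = int_pow(lhs, (int)rhs); break;
        }
    }
}

/* Evaluate a whole expression string such as "1 + 2 * 3". */
double eval_exp(const char *s)
{
    return parse_exp(&s, PREC_ADD);
}
```

With these levels, eval_exp("2^3^2") gives 512 (right-associative), eval_exp("8-3-2") gives 3 (left-associative), and eval_exp("-2^2") gives -4, because '^' is declared after NEG and therefore binds tighter than unary minus.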
Flex + Bison

bas.y -> Bison compiler -> y.tab.c, y.tab.h
bas.l -> Lex compiler -> lex.yy.c
y.tab.c, lex.yy.c -> C compiler -> bas.exe
source -> bas.exe -> compiled output
Lexical + syntax analysis: exercise 5

Exercise 5: Solve exercise 4 generating the lexical analyzer with Flex (combined use of Flex and Bison).

Exercise 5: Bison specification ex5.y

%{
#define YYSTYPE double
#include <math.h>
#include <stdio.h>

int yylex(void);
int yyerror(const char *s);
%}

/* BISON Declarations */
%token NUM
%token PLUS MINUS TIMES DIVIDE POWER
%token LEFT RIGHT
%token END
%left MINUS PLUS
%left TIMES DIVIDE
%left NEG
%right POWER

%%

input:  /* empty string */
      | input line
      ;

line:   END
      | exp END               { printf("\t%.10g\n", $1); }
      ;

exp:    NUM                   { $$ = $1; }
      | exp PLUS exp          { $$ = $1 + $3; }
      | exp MINUS exp         { $$ = $1 - $3; }
      | exp TIMES exp         { $$ = $1 * $3; }
      | exp DIVIDE exp        { $$ = $1 / $3; }
      | MINUS exp %prec NEG   { $$ = -$2; }
      | exp POWER exp         { $$ = pow($1, $3); }
      | LEFT exp RIGHT        { $$ = $2; }
      ;

%%

int yyerror(const char *s) {
    printf("%s\n", s);
    return 0;
}

int main(void) {
    return yyparse();
}

Exercise 5: Flex specification ex5.l

%{
#define YYSTYPE double
#include <stdlib.h>
#include "ex5.tab.h"
%}

/* regular definitions */
delim    [ \t]
ws       {delim}+
digit    [0-9]
number   {digit}+(\.{digit}+)?(e[+-]?{digit}+)?

%%

{ws}      { /* no action and no return */ }
{number}  { yylval = atof(yytext); return NUM; }
"+"       { return PLUS; }
"-"       { return MINUS; }
"*"       { return TIMES; }
"/"       { return DIVIDE; }
"^"       { return POWER; }
"("       { return LEFT; }
")"       { return RIGHT; }
"\n"      { return END; }

%%
1. Open a shell and go to the directory where the Flex and Bison specification files are located.
2. Run:

bison -d ex5.y                      (produces ex5.tab.c and ex5.tab.h)
flex ex5.l                          (produces lex.yy.c)
gcc ex5.tab.c lex.yy.c -lfl -lm     (generates the parser a.exe, or a.out on Linux)
a.exe                               (on Linux: ./a.out)

Notice: the Bison specification file is compiled with the -d flag in order to generate a header file (ex5.tab.h) containing the macro definitions for the token names defined in the grammar. We must also link the Flex library libfl, which defines the yywrap function.