(F)lex & Bison/Yacc
Language Tools for C/C++
CS 550 Programming Languages
Alexander Gutierrez
Lex and Flex Overview
- Lex/Flex is a scanner generator for C/C++.
- It reads pairs of regular expressions and code, and generates a lexical analyzer (scanner) written in C/C++.
- Lex was the original generator, released under a proprietary license.
- Flex is a separate, open source project that reimplements Lex.
- Lex was originally the standard program, but Flex is now the preferred version.
- The two are practically the same and Lex is harder to obtain, so we will refer to Flex.
Yacc and Bison Overview
- Yacc/Bison is a parser generator (a "compiler-compiler") for C/C++.
- It reads an LALR grammar and generates a parser for it.
- This parser can be used as a component of a compiler by feeding it the tokens produced by a lexical analyzer.
- In our case, we will use Flex to generate the tokens for Bison.
- As with Lex and Flex, Bison was created as an open source version of Yacc.
- We will refer to Bison for this presentation.
Which to use?
- We will use Flex & Bison:
    http://flex.sourceforge.net/
    http://www.gnu.org/software/bison/
- These are freely available (BSD and GNU licenses, respectively); Lex & Yacc are not (AT&T proprietary).
- Lex & Yacc were formerly standard on most machines, but have now largely been superseded by Flex & Bison.
- Since they're basically the same, we only really care about Flex & Bison.
- Flex & Bison are on tux.
Lex/Flex on tux.cs.drexel.edu
- Only Flex is available.
- Command name: flex
- Why am I able to type lex and it seems to work?
- On tux, it is symlinked: lex -> flex
Yacc/Bison on tux.cs.drexel.edu
- Only Bison is available.
- Command name: bison
- Typing yacc seems to work. Is it just a symlink, too?
- Mostly, yes. On tux, invoking yacc calls a script that runs bison in yacc-compatibility mode.
- Why is there a yacc-compatibility mode for bison if they are basically the same?
- To account for some POSIX differences and minor quirks that we don't really care about.
- Just use bison.
The Bigger Picture
- We can use Flex and Bison to (relatively) easily implement our own programming language.
- To do this, we need to write the input specifications for Flex and Bison.
- For Flex, we need to determine which tokens our language consists of and how each token can be described by a regular expression.
- For Bison, we need to write an (LALR) grammar over these tokens, with the code to run when each rule is recognized.
- Flex and Bison each produce a piece of C/C++ code, which we can compile with an appropriate C/C++ compiler.
Balanced Parentheses Example
- The code for this example can be found at:
    https://www.cs.drexel.edu/~jjohnson/2012-13/spring/cs550/programs/grammars/
- Files: paren.l, paren.y
- This example looks at the language of balanced parentheses.
- First, we will look at the regular expression file we give to Flex.
- Next, we will look at the grammar we give to Bison.
- Finally, we will compile and test our compiler.
paren.l
    %{
    #include "paren.tab.h"
    %}
    %%
    \(      { return LEFTPAREN; }
    \)      { return RIGHTPAREN; }
    .
    \n      { return 0; }
    %%
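To make the rules in paren.l concrete, here is a hand-written C sketch of roughly what the generated scanner does on each call. It is not the code Flex produces. The names next_token and TOK_* are hypothetical, the numeric codes 258 and 259 are assumed (Bison assigns the real values in paren.tab.h), and the sketch assumes the "." rule has an empty action, so any other character is simply skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; the real ones come from paren.tab.h. */
enum token { TOK_END = 0, TOK_LEFTPAREN = 258, TOK_RIGHTPAREN = 259 };

/* Return the next token from the string at *cursor and advance the cursor.
 * Mirrors the rules in paren.l:
 *   \(  -> LEFTPAREN
 *   \)  -> RIGHTPAREN
 *   \n  -> 0 (tells the parser the input is finished)
 *   .   -> assumed to be discarded
 */
int next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    for (;;) {
        char c = **cursor;
        if (c == '\0')
            return TOK_END;        /* end of the string: no more tokens */
        ++*cursor;                 /* consume the character */
        switch (c) {
        case '(':  return TOK_LEFTPAREN;
        case ')':  return TOK_RIGHTPAREN;
        case '\n': return TOK_END; /* a newline ends the input */
        default:   break;          /* anything else: skip and keep scanning */
        }
    }
}
```

The parser calls the scanner repeatedly, once per token, until it receives 0. That is why the newline rule returns 0 here.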
paren.y
    %{
    #include <string.h>
    #include <stdio.h>
    %}
    %token LEFTPAREN RIGHTPAREN
    %%
    S0: S1 S0                       { printf("S0 => S1 S0\n"); }
      | S1                          { printf("S0 => S1\n"); }
      ;
    S1: LEFTPAREN S2 RIGHTPAREN     { printf("S1 => (S2)\n"); }
      | LEFTPAREN RIGHTPAREN        { printf("S1 => ()\n"); }
      ;
    S2: S1 S2                       { printf("S2 => S1 S2\n"); }
      | S1                          { printf("S2 => S1\n"); }
      ;
    %%
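It can help to see which strings this grammar accepts without running Bison at all. S0 and S2 are both "one or more S1", and S1 is a pair of parentheses around an optional S2, so the language is exactly the nonempty strings of balanced parentheses. The C sketch below checks that property with a simple depth counter. The function name is hypothetical, and this is an equivalent characterization of the language, not the LALR algorithm Bison uses.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s is a nonempty string of balanced parentheses, else 0.
 * Any character other than '(' or ')' makes the string invalid here. */
int is_nonempty_balanced(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;                  /* the grammar has no empty production */

    long depth = 0;                /* number of currently open parentheses */
    for (; *s != '\0'; ++s) {
        if (*s == '(') {
            ++depth;
        } else if (*s == ')') {
            if (--depth < 0)
                return 0;          /* a ')' with no matching '(' */
        } else {
            return 0;              /* not a parenthesis */
        }
    }
    return depth == 0;             /* every '(' must have been closed */
}
```

For example, is_nonempty_balanced("(())") is 1, while "(()()(" and the empty string both give 0.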
Compiling on tux
- All we need are these two files, paren.l and paren.y, in our directory:
    $ ls
    paren.l paren.y
- We can compile using the following sequence of commands (note: the order is very important):
    $ bison -d paren.y
    $ flex paren.l
    $ gcc paren.tab.c lex.yy.c -ly -lfl
- Further explanation follows...
Running Bison
- We run bison first because it produces the definitions of the tokens the parser accepts, which flex needs in order to build our lexical analyzer:
    $ bison -d paren.y
- The -d option makes bison write a header file that passes this information to the scanner.
- Remember this line in paren.l:
    #include "paren.tab.h"
- paren.tab.h is the header file that bison creates with this option.
- Our directory now looks like:
    $ ls
    paren.l paren.tab.c paren.tab.h paren.y
Running Flex
- Now we can simply run flex to produce our lexical analyzer:
    $ flex paren.l
- This produces another piece of code, lex.yy.c:
    $ ls
    lex.yy.c paren.l paren.tab.c paren.tab.h paren.y
- Next we can compile the whole thing and try it out.
Compiling the compiler
- Now, we use the last command mentioned earlier:
    $ gcc paren.tab.c lex.yy.c -ly -lfl
- Here, we are using gcc to compile the generated code and link it against the bison (yacc) and flex libraries.
- The order of the options is important for the resulting compiler to work.
- As usual with the GNU C/C++ compilers, the result is an executable named a.out by default.
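The two libraries mainly fill in a few small support functions that paren.l and paren.y never define. The C sketch below shows hand-written equivalents of two of them, assuming the usual defaults: yywrap() from the flex library returns 1, and yyerror() from the yacc library prints the message to standard error. These are illustrative stand-ins, not the libraries' actual source.

```c
#include <assert.h>
#include <stdio.h>

/* The flex scanner calls yywrap() when it reaches end of input.
 * Returning 1 means "there are no more input files". */
int yywrap(void)
{
    return 1;
}

/* The bison parser calls yyerror() when it detects an error,
 * for example with the message "syntax error". */
void yyerror(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%s\n", msg);
}
```

Both libraries also supply a default main(). The yacc library's version calls yyparse(), while the flex library's version only calls the scanner, which is most likely why -ly has to come before -lfl. A program that defines these two functions plus its own main() that returns yyparse() should, under these assumptions, link without either library.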
Using Our New Language
- We can test that it works by running the executable and giving it input:
    $ ./a.out
    (())
    S1 => ()
    S2 => S1
    S1 => (S2)
    S0 => S1
- I entered a string that is in the language, (()), and the parser executed the associated code.
- In this case, the code run for each grammar rule is the printf statement we saw earlier in the grammar.
- In other words, the function of this interpreter is to display its own parse in terms of its grammar rules.
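The order of the printed lines can be reproduced by hand. The C sketch below is a recursive-descent recognizer that records each rule only after everything on its right-hand side has been recognized, which is the order in which a bottom-up parser reduces. All function names are hypothetical, and this is a hand-written illustration of the trace, not the parser Bison generates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one line of trace to a bounded, NUL-terminated buffer. */
static void emit(char *out, size_t cap, const char *line)
{
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%s\n", line);
}

static int parse_s1(const char **p, char *out, size_t cap);

/* name: S1 name | S1, where name is "S0" or "S2". */
static int parse_seq(const char **p, const char *name, char *out, size_t cap)
{
    char line[32];
    if (!parse_s1(p, out, cap))
        return 0;
    if (**p == '(') {                      /* another S1 follows */
        if (!parse_seq(p, name, out, cap))
            return 0;
        snprintf(line, sizeof line, "%s => S1 %s", name, name);
    } else {
        snprintf(line, sizeof line, "%s => S1", name);
    }
    emit(out, cap, line);
    return 1;
}

/* S1: ( S2 ) | ( ) */
static int parse_s1(const char **p, char *out, size_t cap)
{
    if (**p != '(')
        return 0;
    ++*p;
    if (**p == '(') {
        if (!parse_seq(p, "S2", out, cap) || **p != ')')
            return 0;
        ++*p;
        emit(out, cap, "S1 => (S2)");
    } else {
        if (**p != ')')
            return 0;
        ++*p;
        emit(out, cap, "S1 => ()");
    }
    return 1;
}

/* Write the trace for input into out; return 1 if input is accepted. */
int trace_parse(const char *input, char *out, size_t cap)
{
    assert(input != NULL && out != NULL && cap > 0);
    const char *p = input;
    out[0] = '\0';
    if (parse_seq(&p, "S0", out, cap) && *p == '\0')
        return 1;
    emit(out, cap, "syntax error");
    return 0;
}
```

Calling trace_parse("(())", buf, sizeof buf) with a hypothetical 256-byte buffer fills it with the same four lines shown in the session above.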
Using Our New Language (cont.)
- Another example input:
    $ ./a.out
    (()()(
    S1 => ()
    S1 => ()
    syntax error
- In this case, I gave it a malformed program.
- The input is not in the recognized language because its parentheses are unbalanced, so the parser reports a syntax error.
- The grammar that we gave it is being enforced.
Summary
- Use flex and bison on tux (already installed).
- Design your own language by creating tokenization instructions via regular expressions for Flex and a grammar for Bison.
- Implement the language by giving Flex and Bison these instructions to generate a lexical analyzer and a parser, respectively.
- Compile with a C/C++ compiler to realize your very own programming language.
Reference
- John R. Levine, flex & bison, O'Reilly & Associates.
- This book can be found through Drexel's library website for free.
- flex & bison is basically an updated version of the old lex & yacc book, because they are practically the same utilities.