Lexical Analysis - Flex
CMPSC 470 Lecture 03
Topics: Flex / JFlex

A. Lex/Flex
Lex and flex (fast lex) are programs that
1. take, as input, a program containing regular expressions (describing the patterns of the lexemes of tokens) and their actions,
2. transform the input regular expressions into a transition diagram (using a table-driven implementation of a DFA), and
3. generate a C program (lex.yy.c) that simulates this transition diagram.

How to use lex or flex
1.
2.
3.

Format of lex input file
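Returning to steps 2 and 3 in the description of lex/flex at the start of section A (a regular expression turned into a table-driven DFA, which a program then simulates), here is a rough hand-written sketch in Java for the single pattern [0-9]+(\.[0-9]+)?. The class name DfaSketch, the character classes and the state numbering are assumptions made for illustration only; flex derives its tables automatically, typically stores them in compressed form, and emits C rather than Java, so a real lex.yy.c looks quite different.

```java
// Hypothetical sketch of a table-driven DFA for the pattern [0-9]+(\.[0-9]+)?
public class DfaSketch {
    // Character classes: 0 = digit, 1 = '.', 2 = anything else.
    static int charClass(char c) {
        if (c >= '0' && c <= '9') return 0;
        if (c == '.') return 1;
        return 2;
    }

    // TRANS[state][charClass] gives the next state; -1 means "no transition" (dead state).
    // State 0: start, 1: integer part, 2: just read '.', 3: fraction part.
    static final int[][] TRANS = {
        { 1, -1, -1 },
        { 1,  2, -1 },
        { 3, -1, -1 },
        { 3, -1, -1 },
    };
    // States 1 and 3 are accepting.
    static final boolean[] ACCEPT = { false, true, false, true };

    // Simulates the DFA over the whole string; true if it ends in an accepting state.
    static boolean matches(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            state = TRANS[state][charClass(s.charAt(i))];
            if (state < 0) return false;
        }
        return ACCEPT[state];
    }

    public static void main(String[] args) {
        for (String s : new String[] { "123", "1.23", "1.", ".5", "12a" }) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Only "123" and "1.23" are accepted; the simulation loop never changes when the pattern changes, only the tables do, which is what makes the table-driven approach convenient for a generator.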
B. jflex
jflex is a lexical analyzer generator for Java, written in Java. The format of a jflex input file is a little different from that of a lex or flex input file. jflex 1.6.1 is available at http://jflex.de

Steps to use jflex
1. Go to http://jflex.de
2. Go to the download page.
3. Download jflex-1.6.1.zip (or jflex-1.6.1.tar.gz).
4. Unzip it in your working directory.
5. Find jflex-1.6.1/lib/jflex-1.6.1.jar
6. Compile your input file as follows:
7. jflex generates a Java source file containing the lexical analyzer, built from the regular expressions and actions described in your input file.

C. jflex input format and output format
Format of jflex input file

User code before the class definition (such as import statements)
%%
Options and macros
%{
User code inside the class
%}
Declarations
%%
Transition rules, such as
pattern1 { action1 }
pattern2 { action2 }

Format of output java file
D. Example: TestLexer.flex

import static java.lang.Math.*;
%%
%class TestLexer
%byaccj
%int
%{
    Object obj;
    public TestLexer(java.io.Reader r, Object obj) {
        this(r);
        this.obj = obj;
    }
    // "public TestLexer(java.io.Reader in)" will be generated as default
%}
digit   = [0-9]
number  = {digit}+
real    = {number}(\.{number})?(e[+-]?{number})?
letter  = [A-Za-z]
newline = \n
%%
"+"                         { String lexeme = yytext(); return TestMain.PLUS; }
"if"                        { String lexeme = yytext(); return TestMain.IF; }
{number}                    { String lexeme = yytext(); return TestMain.NUM; }
{real}                      { String lexeme = yytext(); return TestMain.REAL; }
{letter}({letter}|{digit})+ { String lexeme = yytext(); return TestMain.WORD; }
{newline}                   { String lexeme = yytext(); System.out.print("((newline)), \n"); /* skip */ }
[ \t\r]+                    { String lexeme = yytext(); System.out.print("((whitespace "+lexeme+")), "); /* skip */ }
/* error fallback */
[^]                         { System.err.println("Error: unexpected character '"+yytext()+"'"); return -1; }
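When several rules can match at the current position, a lex/JFlex scanner takes the longest match, and on a tie the rule listed first wins. That is why "if" becomes IF rather than WORD, and "123" becomes NUM rather than REAL, even though REAL can also match it. The sketch below emulates that policy with java.util.regex; the class RuleSketch and its method longestMatch are hypothetical. Unlike JFlex, which compiles all rules into one DFA, it simply tries every rule in turn, and it relies on the fact that for these greedy patterns the backtracking engine's first match is also the longest one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: emulates how a lex/JFlex scanner chooses among rules.
public class RuleSketch {
    // Rules in the same order as in TestLexer.flex, written in java.util.regex syntax.
    static final String[] NAMES = { "PLUS", "IF", "NUM", "REAL", "WORD" };
    static final Pattern[] PATTERNS = {
        Pattern.compile("\\+"),
        Pattern.compile("if"),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[0-9]+(\\.[0-9]+)?(e[+-]?[0-9]+)?"),
        Pattern.compile("[A-Za-z][A-Za-z0-9]+"),
    };

    // Returns the name of the winning rule at position pos, or null if no rule matches.
    static String longestMatch(String input, int pos) {
        String best = null;
        int bestLen = 0;
        for (int i = 0; i < PATTERNS.length; i++) {
            Matcher m = PATTERNS[i].matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {                 // the match must start exactly at pos
                int len = m.end() - pos;
                if (len > bestLen) {             // strictly longer wins; a tie keeps the earlier rule
                    best = NAMES[i];
                    bestLen = len;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "if", "ifx", "123", "1.23", "+" }) {
            System.out.println(s + " -> " + longestMatch(s, 0));
        }
    }
}
```

This prints IF, WORD, NUM, REAL and PLUS for the five inputs: "ifx" is a WORD because the WORD rule matches three characters while "if" matches only two.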
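Notes
%class TestLexer: the generated class is named TestLexer (in the file TestLexer.java).
%byaccj: makes the scanner compatible with the BYacc/J parser generator; among other things, yylex() returns 0 at end of input.
%int: yylex() returns an int (the token code).
yytext(): returns the text of the lexeme that was just matched, as a String.
Regular definitions (digit, number, real, ...) are defined in the declaration part. In the rules part, each token pattern is written as a regular expression, and a regular definition is referenced by writing its name in braces, e.g. {digit}.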
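Conceptually, a reference such as {digit} behaves as if the definition of digit were substituted in as one parenthesized unit. The following is a minimal sketch of that substitution, assuming java.util.regex syntax for the right-hand sides; the class MacroSketch and its methods expand and define are hypothetical, and JFlex itself parses definitions into expression trees rather than splicing strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expands {name} references the way regular definitions are used.
public class MacroSketch {
    // A reference such as {digit}: a name enclosed in braces.
    private static final Pattern REF = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces each {name} in expr by the definition of name, wrapped in a
    // non-capturing group (?:...) so that an operator such as + applies to all of it.
    static String expand(String expr, Map<String, String> defs) {
        Matcher m = REF.matcher(expr);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String def = defs.get(m.group(1));
            if (def == null) {
                throw new IllegalArgumentException("undefined macro: " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement("(?:" + def + ")"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Processes definitions in order; each definition may use any earlier one.
    static Map<String, String> define(String[][] pairs) {
        Map<String, String> defs = new LinkedHashMap<>();
        for (String[] p : pairs) {
            defs.put(p[0], expand(p[1], defs));
        }
        return defs;
    }

    public static void main(String[] args) {
        Map<String, String> defs = define(new String[][] {
            { "digit",  "[0-9]" },
            { "number", "{digit}+" },
            { "real",   "{number}(\\.{number})?(e[+-]?{number})?" },
        });
        Pattern real = Pattern.compile(defs.get("real"));
        System.out.println("real = " + defs.get("real"));
        for (String s : new String[] { "123", "1.23", "123e1", "1x23" }) {
            System.out.println(s + " -> " + real.matcher(s).matches());
        }
    }
}
```

The expanded real accepts 123, 1.23 and 123e1 and rejects 1x23. If the dot in real were written without the backslash, it would match any character, and 1x23 would be accepted as well.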
Generate lexer code
Lexer code will be generated using the following command:

It generates TestLexer.java, which contains code like the following (excerpt):

/* The following code was generated by JFlex 1.6.1 */

import static java.lang.Math.*;

/**
 * This class is a scanner generated by
 * <a href="http://www.jflex.de/">JFlex</a> 1.6.1
 * from the specification file <tt>TestLexer.flex</tt>
 */
class TestLexer {
  ...
  /* user code: */
  Object obj;
  public TestLexer(java.io.Reader r, Object obj) {
      this(r);
      this.obj = obj;
  }
  // "public TestLexer(java.io.Reader in)" will be generated as default
  ...
      if (zzInput == YYEOF && zzStartRead == zzCurrentPos) {
        zzAtEOF = true;
        zzDoEOF();
        { return 0; }
      }
      else {
        switch (zzAction < 0 ? zzAction : ZZ_ACTION[zzAction]) {
          case 1:
            { System.err.println("Error: unexpected character '"+yytext()+"'"); return -1; }
          case 9: break;
          case 2:
            { String lexeme = yytext(); return TestMain.NUM; }
          case 10: break;
          case 3:
            { String lexeme = yytext(); System.out.print("((newline)), \n"); /* skip */ }
          case 11: break;
          case 4:
            { String lexeme = yytext(); System.out.print("((whitespace "+lexeme+")), "); /* skip */ }
          ...
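In the generated switch, an action that executes return ends the yylex() call with a token code, while an action without a return (the newline and whitespace rules) runs into the following "case N: break;", so yylex() keeps scanning; at end of input the %byaccj scanner returns 0. The hand-written sketch below mirrors only that control flow, for a reduced rule set (digit runs, blanks, and an error fallback). The class MiniLexer is hypothetical, its action numbers merely echo the case labels above, NUM = 10 is taken from TestMain, and the characters are examined by hand rather than by a generated DFA.

```java
// Hypothetical hand-written scanner that mirrors the control flow of the generated yylex().
public class MiniLexer {
    static final int NUM = 10;          // same code as TestMain.NUM

    private final String input;
    private int pos = 0;
    private String text = "";

    MiniLexer(String input) { this.input = input; }

    String yytext() { return text; }

    int yylex() {
        while (true) {
            if (pos >= input.length()) {
                return 0;                                // end of input, as with %byaccj
            }
            int start = pos;
            char c = input.charAt(pos);
            int action;
            if (c >= '0' && c <= '9') {                  // like {number}: longest run of digits
                while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
                    pos++;
                }
                action = 2;
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;                                   // like the newline/whitespace rules
                action = 4;
            } else {
                pos++;                                   // like the [^] error fallback
                action = 1;
            }
            text = input.substring(start, pos);
            switch (action) {
                case 1:
                    System.err.println("Error: unexpected character '" + text + "'");
                    return -1;
                case 2:
                    return NUM;
                case 4:
                    break;                               // skip: no return, so scanning continues
            }
        }
    }

    public static void main(String[] args) {
        MiniLexer lex = new MiniLexer("12 345\n7");
        int token;
        while ((token = lex.yylex()) > 0) {
            System.out.print("<NUM, " + lex.yytext() + "> ");
        }
        System.out.println();
    }
}
```

Running main prints <NUM, 12> <NUM, 345> <NUM, 7>; the blank and the newline never reach the caller because their action does not return.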
TestMain.java

class TestMain {
    public static final int NUM  = 10;
    public static final int REAL = 11;
    public static final int WORD = 12;
    public static final int PLUS = 13;
    public static final int IF   = 14;

    public static void main(String[] args) throws Exception {
        java.io.Reader r = new java.io.StringReader(
              "main\n"
            + "123\n"
            + "1.23 123e1\n");
        // if (args.length < 1)
        //     return;
        // java.io.Reader r = new java.io.FileReader(args[0]);

        Object o = new Object();
        TestLexer lex = new TestLexer(r, o);
        while (true) {
            int token = lex.yylex();
            if (token == 0)      // end of input
                break;
            if (token == -1)     // error
                break;
            switch (token) {
                case NUM : System.out.print("<NUM, "  + lex.yytext() + ">"); break;
                case REAL: System.out.print("<REAL, " + lex.yytext() + ">"); break;
                case WORD: System.out.print("<WORD, " + lex.yytext() + ">"); break;
            }
        }
    }
}
E. Extension of regular expression in lex/flex
In the expressions below, c represents a character, s a string, and r a regular expression.

Expression   Matches                                            Example
c            the one non-operator character c                   a
\c           character c literally                              \*
"s"          string s literally                                 "**"
.            any character but newline                          a.*b
^            beginning of a line                                ^abc
$            end of a line                                      abc$
[s]          any one of the characters in string s              [abc]
[^s]         any one character not in string s                  [^abc]
r*           Kleene closure: zero or more strings matching r    a*
r+           positive closure: one or more strings matching r   a+
r?           zero or one r                                      a?
r{m,n}       between m and n occurrences of r                   a{1,5}
r1r2         an r1 followed by an r2                            ab
r1|r2        an r1 or an r2                                     a|b
(r)          same as r                                          (a|b)
r1/r2        r1 when followed by r2                             abc/123

Example)