Lecture 6: Lexical Analysis and jflex
Dr Kieran T. Herley
Department of Computer Science
University College Cork
2017-2018
KH (03/10/17) Lecture 6: Lexical Analysis and jflex 2017-2018 1 / 1

Summary
Lexical analysis.
Automated production of lexical analyzers.
jflex.
Simple examples.

What is Lexical Analysis?
Lexical analysis is the first stage of compilation.
It decomposes the source into tokens: words (identifiers, reserved words), numbers (123), symbols (+, <=).
(It typically skips whitespace and comments.)
Example
Source:
    if (x <= 123) { /* a comment */ sum = sum + x; }
Token stream:
    if ( x <= 123 ) { sum = sum + x ; }

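To make the decomposition concrete, here is a minimal hand-written sketch in Java that turns the example source into the token stream shown above. The class name `HandScanner` and method `tokenize` are hypothetical, and the scanner only knows the handful of token shapes that occur in the example (words, integers, `<=`, single-character symbols); it is an illustration, not a complete lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner; names are illustrative only.
class HandScanner {
    // Splits source text into tokens, skipping whitespace and /* ... */ comments.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                    // skip whitespace
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);     // skip comment
                i = (end < 0) ? src.length() : end + 2;
            } else if (Character.isLetter(c)) {
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                tokens.add(src.substring(i, j));        // identifier or reserved word
                i = j;
            } else if (Character.isDigit(c)) {
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                tokens.add(src.substring(i, j));        // number
                i = j;
            } else if (src.startsWith("<=", i)) {
                tokens.add("<=");                       // two-character symbol
                i += 2;
            } else {
                tokens.add(String.valueOf(c));          // single-character symbol
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "if (x <= 123) { /* a comment */ sum = sum + x; }";
        // Prints: if ( x <= 123 ) { sum = sum + x ; }
        System.out.println(String.join(" ", tokenize(src)));
    }
}
```

Even for this tiny token set the code needs a separate branch per token shape, which is the kind of bookkeeping a generator takes over.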
Developing a Lexical Analyser
Write a C/Java program by hand:
    tiresome and tricky.
Use a lexical analyzer generator:
    generates the analyzer automatically from descriptions (regular expressions) of the tokens in the programming language.
Examples:
    lex/flex for C
    jflex for Java

(jf)lex
Input:
    description of token structure (regular expressions)
    information on how to process the different tokens (actions)
Output:
    implementation of an automaton-based function (jflex builds an NFA from the rules and converts it to a DFA) that
        recognizes tokens (as specified by the RE rules)
        processes them (as specified by the actions)

jflex Program Format
    /* User code */
    %%
    /* Options and declarations */
    %%
    /* Lexical rules */
The three sections are separated by lines containing only %%.
Lexical rules:
    Rule = Pattern + Action
    Pattern = regular expression
    Action = snippet of Java code
    (An action is triggered whenever its pattern is matched.)
User code:
    e.g. import statements; copied to the top of the generated Java file; often empty.
Options and declarations:
    macros (named REs); code to be spliced into the generated Java class.

Pattern Searching Example
    /*
     * Search through source flagging any
     * occurrences of pattern (a|b)*abb
     * found.
     */
    %%
    %standalone
    %%
    (a|b)*abb { System.out.println("*** found match\n"); }
    \n        { /* do nothing */ }
    .         { /* do nothing */ }

The Main Pattern
    (a|b)*abb { System.out.println("*** found match\n"); }
Pattern: (a|b)*abb
Action: { System.out.println... }
Operation: for each match detected (i.e. each string consisting of a's and b's that ends in abb), the action (i.e. the println) is performed.

Pattern Searching Example cont'd
    %%
    %standalone
    %%
    (a|b)*abb { System... }
    \n        { /* do nothing */ }
    .         { /* do nothing */ }
With the %standalone option, unmatched fragments of the input are echoed verbatim to the output.
The final two rules suppress this (by gobbling up every character not matched by the main rule).
The RE . (dot) matches any character except newline.
Note: with the %standalone option, the generated code includes a main method.

Using jflex
Illustration
    $ jflex search.flex
    $ javac Yylex.java
    $ java Yylex searchtest.txt
Notes
jflex generates the Java file Yylex.java.
Running the compiled Yylex class on searchtest.txt prints "*** found match" once for each occurrence of a string matching the RE (a|b)*abb, i.e. it flags each occurrence of the pattern.

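For comparison, roughly the same search can be sketched by hand with `java.util.regex`. This is not what jflex generates (the generated scanner is table-driven); it only mimics the observable behaviour for this one pattern. The class name `SearchSketch` and method `findMatches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated scanner; names are illustrative only.
class SearchSketch {
    // java.util.regex version of the jflex pattern (a|b)*abb
    private static final Pattern ABB = Pattern.compile("(a|b)*abb");

    // Returns every non-overlapping match, scanning left to right.
    static List<String> findMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ABB.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String match : findMatches("xxabb aabbab\nab")) {
            System.out.println("*** found match [" + match + "]");
        }
    }
}
```

Characters that are not part of a match are simply skipped by `find()`, which plays the role of the two "do nothing" rules in the jflex specification.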
A More Polished Version
    %%
    %class Search
    %standalone
    %line
    %column
    %%
    (a|b)*abb { System.out.printf(
                    "*** found match [%s] at l%d, c%d\n",
                    yytext(), yyline, yycolumn); }
    \n        { /* do nothing */ }
    .         { /* do nothing */ }

Notes
The %class option creates Search.java instead of Yylex.java.
For each match found, the scanner prints the text matched and its position within the input file.
yytext(): the text fragment that matches the pattern.
yyline: the line number (the %line option is needed to enable this).
yycolumn: the column number (the %column option is needed to enable this).

Getting jflex
The jflex package and documentation can be downloaded from www.jflex.de

jflex RE Syntax
    pattern   meaning
    a         character a
    "a"       character a (even special chars.)
    abc       a followed by b followed by c (no explicit concat. symbol)
    a|b       a or b
    a*        zero or more rep. of a
    a+        one or more rep. of a
    a?        optional a
    (a)       a itself
    [abc]     any (one) of a or b or c
    [^abc]    any char. except a, b or c
    .         any char. except newline

Example
Pattern:
    ("+"|"-")?[0-9]+("."[0-9]+)?
Meaning:
    (optional sign) (one or more digits) (optional: decimal point followed by one or more digits)

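The pattern can be tried out quickly with `java.util.regex`, whose syntax differs slightly from jflex (a character class `[+-]` and `\.` replace the quoted strings). The sketch below assumes that translation; the class name `NumberPattern` and method `isNumber` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class NumberPattern {
    // java.util.regex translation of the jflex pattern ("+"|"-")?[0-9]+("."[0-9]+)?
    static final Pattern NUMBER = Pattern.compile("[+-]?[0-9]+(\\.[0-9]+)?");

    // True if the whole string is an optionally signed number
    // with an optional fractional part.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "42", "-3.14", "+7", "3.", ".5", "1.2.3" };
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

Note that `3.` and `.5` are rejected: the pattern requires digits on both sides of the decimal point.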
Notes
jflex's special characters, e.g. ( ) - + ^ [ ] *, must be quoted if they are to stand for themselves: to match the text a+b, write "a+b".
Use a backslash to quote a single symbol, e.g. \*

Example 1
    %%
    %class Classify
    %standalone
    Digit = [0-9]
    Letter = [a-zA-Z]
    Whitespace = [ \t\n]+
    %%
    {Whitespace}                { /* Do nothing! */ }
    {Digit}+                    { System.out.printf("number [%s]\n", yytext()); }
    {Letter}({Letter}|{Digit})* { System.out.printf("word [%s]\n", yytext()); }
    .                           { System.out.printf("symbol [%s]\n", yytext()); }

Notes
Names (e.g. Whitespace) can be attached to REs for brevity and clarity; note the use of {Whitespace} in the lexical rules.
When multiple rules apply: (i) take the longest match (maximal munch); (ii) break ties by order of rule appearance (the earlier rule wins).

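The two disambiguation rules can be sketched in plain Java as follows. This is a simulation under stated assumptions, not jflex's actual mechanism: each rule of Example 1 is tried at the current position with `java.util.regex`, the longest match is kept, and a later rule replaces an earlier one only if its match is strictly longer. The class name `MunchSketch` and method `classify` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of longest-match / first-rule-wins; names are illustrative only.
class MunchSketch {
    // Rules in order of appearance (LinkedHashMap preserves insertion order).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("whitespace", Pattern.compile("[ \\t\\n]+"));
        RULES.put("number", Pattern.compile("[0-9]+"));
        RULES.put("word", Pattern.compile("[a-zA-Z]([a-zA-Z]|[0-9])*"));
        RULES.put("symbol", Pattern.compile("."));
    }

    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strict '>' means an earlier rule wins a tie.
                // (For these simple patterns lookingAt() yields the longest match.)
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {   // no rule applies: skip one character
                pos++;
                continue;
            }
            if (!bestRule.equals("whitespace")) {
                out.add(bestRule + " [" + input.substring(pos, pos + bestLen) + "]");
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : classify("x1 = 42+y")) {
            System.out.println(line);
        }
    }
}
```

On the input `x1 = 42+y`, the fragment `x1` is reported as one word rather than a word followed by a number (longest match), and `y` is reported as a word rather than a symbol even though both rules match one character (earlier rule wins).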
Something to Think About
Readability Metric
Measure of the readability of English text.
Score (ARI metric):
\[
\text{ARI} = 4.71\left(\frac{\#\text{characters}}{\#\text{words}}\right) + 0.5\left(\frac{\#\text{words}}{\#\text{sentences}}\right) - 21.43
\]
"Characters" means letters, digits and punctuation.
Implementation?

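One possible answer, sketched in plain Java rather than jflex: count characters, words and sentences, then apply the formula. The counting rules here are assumptions, not part of the slide: a word is a maximal run of non-whitespace characters, and a sentence ends at each run of `.`, `!` or `?` (so a decimal point would be miscounted). The class name `AriSketch` and method `ari` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical ARI calculator; names and counting rules are assumptions.
class AriSketch {
    static double ari(String text) {
        // Characters: everything except whitespace (letters, digits, punctuation).
        int characters = 0;
        for (char c : text.toCharArray()) {
            if (!Character.isWhitespace(c)) characters++;
        }

        // Words: maximal runs of non-whitespace characters.
        String trimmed = text.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;

        // Sentences: one per run of sentence-ending punctuation (assumption).
        int sentences = 0;
        Matcher m = Pattern.compile("[.!?]+").matcher(text);
        while (m.find()) sentences++;

        if (words == 0 || sentences == 0) {
            throw new IllegalArgumentException("need at least one word and one sentence");
        }
        return 4.71 * characters / words + 0.5 * words / sentences - 21.43;
    }

    public static void main(String[] args) {
        // 19 characters, 5 words, 2 sentences: score is approximately -2.282
        System.out.printf("%.3f%n", ari("The cat sat. It purred!"));
    }
}
```

In a jflex version, each of the three counts would naturally become a counter incremented by the action of a corresponding rule.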
Example 2
    %%
    %class LineCounter
    %standalone
    %{
      int lineno = 1;
    %}
    line = .*\n
    %%
    {line} { System.out.printf("[%5d]: %s", lineno++, yytext()); }
Echoes the input line by line, with each line preceded by its line number.
Material enclosed between %{ and %} is included directly in LineCounter.java.

Example 3
    %%
    %class Dec2HexConvertor
    %standalone
    number = [0-9]+
    %%
    {number} { int n = Integer.parseInt(yytext());
               System.out.printf("%X (hex)", n); }
Filters the input file; translates all numbers into hexadecimal.

Notes on Example 3
Only numbers are matched (and translated into hex).
Unmatched portions of the input are copied unaltered to the output.

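The same filter behaviour (rewrite the numbers, copy everything else through) can be sketched in plain Java with `Matcher.appendReplacement` and `appendTail`. This is a comparison sketch, not the code jflex generates; the class name `Dec2HexSketch` and method `convert` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java counterpart of the filter; names are illustrative only.
class Dec2HexSketch {
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Replaces each decimal number by its hexadecimal form; other text is copied as-is.
    static String convert(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Like the jflex action, this fails on numbers too large for an int.
            int n = Integer.parseInt(m.group());
            m.appendReplacement(out, String.format("%X (hex)", n));
        }
        m.appendTail(out);   // copy the unmatched remainder
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: x = FF (hex) + 10 (hex);
        System.out.println(convert("x = 255 + 16;"));
    }
}
```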
Example 4
    %%
    %class Upper2Lower
    %standalone
    COMMENT = "/*" [^*] ~"*/" | "/*" "*"+ "/"
    %%
    [A-Z]     { System.out.print(
                    Character.toLowerCase(
                        yytext().charAt(0))); }
    {COMMENT} { /* Do nothing */ }

Notes
Translates upper case to lower case, except inside C-style comments: a comment is matched as a single unit by the {COMMENT} rule, so its letters are never converted (with the empty action, the comment text is not printed at all).
~"*/" matches anything that does not contain, but ends in, */.
(The second clause captures comments like /**/.)