Lexical Analysis and jflex


Lecture 6: Lexical Analysis and jflex
Dr Kieran T. Herley
Department of Computer Science, University College Cork
2017-2018
KH (03/10/17) Lecture 6: Lexical Analysis and jflex 2017-2018

Summary
Lexical analysis.
Automated production of lexical analyzers.
Jflex.
Simple examples.

What is Lexical Analysis?
Lexical analysis is the first stage of compilation.
It decomposes the source into tokens: words (identifiers, reserved words), numbers (123), symbols (+, <=).
(Typically skips whitespace and comments.)

Example source:

    if (x <= 123) { /* a comment */ sum = sum + x; }

Token stream:

    if  (  x  <=  123  )  {  sum  =  sum  +  x  ;  }
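To make the decomposition concrete, here is a minimal hand-written sketch in Java that turns the example source into the token stream shown above. It is not how jflex works internally; the class name TokenSketch, the method tokenize and the particular set of two-character symbols are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSketch {
    // Alternatives are tried left to right: comments and whitespace (skipped),
    // then words, numbers, a few two-character symbols, then any other single symbol.
    // The list of two-character symbols is an assumption for this example.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<skip>/\\*.*?\\*/|\\s+)"
        + "|(?<word>[A-Za-z][A-Za-z0-9]*)"
        + "|(?<number>[0-9]+)"
        + "|(?<symbol><=|>=|==|!=|\\S)",
        Pattern.DOTALL);

    // Returns the tokens of the source in order, dropping whitespace and comments.
    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            if (m.group("skip") == null) {
                tokens.add(m.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if (x <= 123) { /* a comment */ sum = sum + x; }"));
    }
}
```

Writing and maintaining such patterns by hand gets awkward quickly, which is the motivation for the generator approach on the next slide.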

Developing a Lexical Analyser
Write a C/Java program by hand: tiresome and tricky.
Use a lexical analyzer generator: it generates the analyzer automatically from descriptions (regular expressions) of the tokens in the programming language.
Examples: lex/flex for C; jflex for Java.

(jf)lex
Input:
    description of token structure (regular expressions)
    info. on how to process different tokens (actions)
Output: implementation of a finite-automaton-based function (jflex builds an NFA from the REs and converts it to a DFA) that
    recognizes tokens (as specified by RE rules)
    processes them (as specified by actions)

jflex Program Format

    /* User code */
    %%
    /* Options and declarations */
    %%
    /* Lexical Rules */

Lexical Rules: Rule = Pattern + Action. Pattern = regular expression. Action = snippet of Java code. (Actions are triggered whenever the pattern is matched.)
User Code: e.g. import statements, included at the top of the generated Java; often empty.
Options etc.: macros (named REs); code to be spliced into the generated Java class.

Pattern Searching Example

    /*
     * Search through source flagging any
     * occurrences of pattern (a|b)*abb
     * found.
     */
    %%
    %standalone
    %%
    (a|b)*abb  { System.out.println("*** found match\n"); }
    \n         { /* do nothing */ }
    .          { /* do nothing */ }

The Main Pattern

    (a|b)*abb  { System.out.println("*** found match\n"); }

Pattern: (a|b)*abb
Action: { System.out.println... }
Operation: For each match detected (i.e. a string consisting of a's and b's that ends in abb), the action (i.e. println) is performed.

Pattern Searching Example cont'd

    %standalone
    %%
    (a|b)*abb  { System... }
    \n         { /* do nothing */ }
    .          { /* do nothing */ }

By default, unmatched fragments of input are echoed verbatim to output; the final two rules suppress this (by gobbling up every char not matched by the main rule).
RE . (dot) matches any char except newline.
Note: With the %standalone option, the generated code includes a main method.
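For comparison, the same search can be sketched with java.util.regex instead of a generated scanner. This is only an approximation of the jflex behaviour: the class name SearchSketch and the method findMatches are made up for this example, and skipping non-matching characters is done implicitly by Matcher.find() rather than by the two "do nothing" rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchSketch {
    // Same regular expression as the main rule of the jflex specification.
    private static final Pattern MAIN = Pattern.compile("(a|b)*abb");

    // Collects every non-overlapping match, scanning the input left to right.
    public static List<String> findMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIN.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String match : findMatches("xxabb yy babababb\nab")) {
            System.out.println("*** found match [" + match + "]");
        }
    }
}
```

For this particular pattern the greedy star yields the longest match at each starting position, so the matches agree with the longest-match rule used by jflex; that agreement should not be assumed for arbitrary patterns.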

Using jflex

Illustration:

    $ jflex search.flex
    $ javac Yylex.java
    $ java Yylex searchtest.txt

Notes:
jflex generates the Java file Yylex.java.
Executing Yylex.class reads searchtest.txt and prints "*** found match" once for each occurrence of a string matching RE (a|b)*abb, i.e. it flags each occurrence of the pattern.

A More Polished Version

    %%
    %class Search
    %standalone
    %line
    %column
    %%
    (a|b)*abb  { System.out.printf("*** found match [%s] at l%d, c%d\n",
                                   yytext(), yyline, yycolumn); }
    \n         { /* do nothing */ }
    .          { /* do nothing */ }

Notes
The %class option creates Search.java instead of Yylex.java.
For each match found, prints the text matched and its position within the input file.
yytext(): the text fragment that matches the pattern.
yyline: the line number (need %line option to enable this).
yycolumn: the column number (need %column option to enable this).

Getting jflex
Can download the jflex package and documentation from www.jflex.de

jflex RE Syntax

    pattern   meaning
    a         character a
    "a"       character a (even special chars.)
    abc       a followed by b followed by c (no explicit concat. symbol)
    a|b       a or b
    a*        zero or more rep. of a
    a+        one or more rep. of a
    a?        optional a
    (a)       a itself
    [abc]     any (one) of a or b or c
    [^abc]    any char. except a, b or c
    .         any char. except newline

Example

Pattern:

    ("+"|"-")?[0-9]+("."[0-9]+)?

Meaning: (optional sign) one or more digits (optional: decimal point followed by one or more digits)
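The pattern can be tried out quickly in plain Java. The sketch below transcribes it into java.util.regex syntax, where the quoted "." becomes an escaped dot and the sign alternatives become a character class; the class name NumberPatternSketch and the method isNumber are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class NumberPatternSketch {
    // Java-regex transcription of the jflex pattern ("+"|"-")?[0-9]+("."[0-9]+)?
    private static final Pattern NUMBER = Pattern.compile("[+-]?[0-9]+(\\.[0-9]+)?");

    // True if the whole string is an optionally signed number with an optional fraction.
    public static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "-3.14", "+7", "3.", ".5"}) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

Note that the fraction part, when present, needs at least one digit after the point, so "3." and ".5" are both rejected.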

Notes
jflex's special characters, e.g. ( ) - + ^ [ ] *, must be quoted if they appear as themselves: to match the text a+b, write "a+b".
Use backslash to quote a single symbol, e.g. \*

Example 1

    %%
    %class Classify
    %standalone
    Digit = [0-9]
    Letter = [a-zA-Z]
    Whitespace = [ \t\n]+
    %%
    {Whitespace}                 { /* Do nothing! */ }
    {Digit}+                     { System.out.printf("number [%s]\n", yytext()); }
    {Letter}({Letter}|{Digit})*  { System.out.printf("word [%s]\n", yytext()); }
    .                            { System.out.printf("symbol [%s]\n", yytext()); }

Notes
Can attach names (e.g. Whitespace) to REs for brevity/clarity. Note {Whitespace} in the lexical rules.
When multiple rules apply: (i) take the longest match (maximum munch), (ii) use order of rule appearance to break ties.
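The two disambiguation rules can be illustrated with a small hand-rolled loop in Java. This is a sketch of the idea only, not the code jflex generates (jflex uses a single automaton rather than trying each rule in turn). The class name MaximalMunchSketch and the method classify are assumptions, and Matcher.lookingAt() is relied on to give the longest match only because these particular patterns are simple greedy repetitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchSketch {
    // Rules in order of appearance, mirroring Example 1 (LinkedHashMap keeps insertion order).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("whitespace", Pattern.compile("[ \\t\\n]+"));
        RULES.put("number", Pattern.compile("[0-9]+"));
        RULES.put("word", Pattern.compile("[a-zA-Z]([a-zA-Z]|[0-9])*"));
        RULES.put("symbol", Pattern.compile("."));
    }

    public static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strict '>' means a later rule wins only with a longer match,
                // so ties go to the rule that appears first.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) { // no rule applies (e.g. a lone '\r'); skip one char
                pos++;
                continue;
            }
            if (!bestRule.equals("whitespace")) {
                out.add(bestRule + " [" + input.substring(pos, pos + bestLen) + "]");
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        classify("x1 = 42;").forEach(System.out::println);
    }
}
```

A single digit such as 7 matches both the number rule and the catch-all symbol rule with length 1; the number rule wins because it is listed first.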

Something to Think About: Readability Metric
Measure of readability of English text.
Score (ARI Metric):

$$\text{ARI} = 4.71 \left( \frac{\#\text{characters}}{\#\text{words}} \right) + 0.5 \left( \frac{\#\text{words}}{\#\text{sentences}} \right) - 21.43$$

"Characters" means letters, digits and punctuation.
Implementation?
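One possible starting point, written in plain Java rather than as a jflex specification, is sketched below. The counting conventions are assumptions, not part of the slide: a word is a maximal run of non-whitespace characters, and a sentence is counted at each run of '.', '!' or '?'. The class name AriSketch and its methods are likewise hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AriSketch {
    // Assumption: a sentence ends at a run of '.', '!' or '?'.
    private static final Pattern SENTENCE_END = Pattern.compile("[.!?]+");

    // Returns {characters, words, sentences} for the given text.
    public static long[] count(String text) {
        long characters = text.chars().filter(c -> !Character.isWhitespace(c)).count();
        String trimmed = text.trim();
        long words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        long sentences = 0;
        Matcher m = SENTENCE_END.matcher(text);
        while (m.find()) {
            sentences++;
        }
        return new long[] {characters, words, sentences};
    }

    // The ARI formula; callers must ensure words and sentences are non-zero.
    public static double ari(long characters, long words, long sentences) {
        return 4.71 * ((double) characters / words)
             + 0.5 * ((double) words / sentences)
             - 21.43;
    }

    public static void main(String[] args) {
        long[] c = count("The cat sat. The dog ran!");
        System.out.printf("chars=%d words=%d sentences=%d ARI=%.2f%n",
                          c[0], c[1], c[2], ari(c[0], c[1], c[2]));
    }
}
```

In a jflex version, each of the three counters would naturally become a field declared between %{ and %} and incremented by the action of a matching rule.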

Example 2

    %%
    %class LineCounter
    %standalone
    %{
      int lineno = 1;
    %}
    line = .*\n
    %%
    {line}  { System.out.printf("[%5d]: %s", lineno++, yytext()); }

Echoes input line by line, with each line preceded by its line number.
Material enclosed in %{ and %} is included directly in LineCounter.java.
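The effect of this specification can be approximated in plain Java as follows. This is a sketch for comparison only: the class name LineNumberSketch and the method numberLines are invented here, and the handling of a final line without a trailing newline (copied through unnumbered) imitates the echo behaviour of %standalone for unmatched text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineNumberSketch {
    // Same regular expression as the 'line' macro: any chars up to and including a newline.
    private static final Pattern LINE = Pattern.compile(".*\\n");

    public static String numberLines(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = LINE.matcher(input);
        int lineno = 1;   // plays the role of the field declared in %{ ... %}
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());   // echo anything left unmatched
            out.append(String.format("[%5d]: %s", lineno++, m.group()));
            last = m.end();
        }
        out.append(input, last, input.length()); // trailing text without a newline
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(numberLines("first line\nsecond line\n"));
    }
}
```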

Example 3

    %%
    %class Dec2HexConvertor
    %standalone
    number = [0-9]+
    %%
    {number}  { int n = Integer.parseInt(yytext());
                System.out.printf("%X (hex)", n); }

Filters input file; translates all numbers into hexadecimal.

Notes on Example 3
Only numbers are matched (and translated into hex).
Unmatched portions of the input are copied unaltered into the output.
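The same filter can be sketched with java.util.regex, which makes the "matched parts are rewritten, everything else passes through" behaviour explicit. The class name HexSketch and the method toHex are hypothetical, and the lambda form of Matcher.replaceAll requires Java 9 or later.

```java
import java.util.regex.Pattern;

public class HexSketch {
    // Same regular expression as the 'number' macro.
    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Replaces each decimal number by its hexadecimal form; other text is left unchanged.
    // Like the jflex action, this throws NumberFormatException for numbers too large for an int.
    public static String toHex(String input) {
        return NUMBER.matcher(input).replaceAll(
            m -> String.format("%X (hex)", Integer.parseInt(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(toHex("x = 255 and 16"));
    }
}
```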

Example 4

    %%
    %class Upper2Lower
    %standalone
    COMMENT = "/*" [^*] ~"*/" | "/*" "*"+ "/"
    %%
    [A-Z]      { System.out.print(Character.toLowerCase(yytext().charAt(0))); }
    {COMMENT}  { /* Do nothing */ }

Notes
Translates upper to lower-case, except inside C-style comments.
Notes: ~"*/" matches anything up to and including the first occurrence of */ (i.e. text that does not contain */ except at its very end). (The second clause captures comments like /**/.)
Because the {COMMENT} action is empty, a matched comment is consumed without being echoed to the output.
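A rough Java counterpart is sketched below. It simplifies the comment pattern to a reluctant match between /* and */, which is not identical to the COMMENT macro, and it drops matched comments from the output just as the empty action does. The class name LowerSketch and the method lower are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LowerSketch {
    // Either a whole C-style comment (simplified pattern) or a single upper-case letter.
    private static final Pattern PIECE =
        Pattern.compile("(?<comment>/\\*.*?\\*/)|[A-Z]", Pattern.DOTALL);

    public static String lower(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = PIECE.matcher(input);
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());       // unmatched text is echoed
            if (m.group("comment") == null) {
                out.append(Character.toLowerCase(m.group().charAt(0)));
            }                                         // comments: empty action, nothing printed
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lower("Int X; /* Keep ME */ Y"));
    }
}
```

To keep comments in the output instead, the comment branch would append m.group() unchanged; the jflex equivalent would be an action that prints yytext().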