


Class 5

Lex Spec Example

delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(e[+-]?{digit}+)?
%%
{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
{id}      {yylval = (int) installid(); return(ID);}
{number}  {yylval = (int) installnum(); return(NUMBER);}
%%
int installid() {/* code to put id lexeme into string table */}
int installnum() {/* code to put number constants into constant table */}

Some Notes on Lex

yylval: global integer variable to pass additional information about the lexeme
yyline: line number in input file
yytext: returns the lexeme matched

Build/run pipeline:
Turtle.l (Lex spec) → lex/flex scanner generator → lex.yy.c → C/C++ compiler → scanner
Test.tlt → scanner → token stream

Form of a JLex Spec File

user code
%%
JLex directives
%%
regular expression rules, in the form:
  reg expr  {action}
  reg expr  {action}

JLex Spec Example

class Token {
  String text;
  Token(String t) { text = t; }
}
%%
Digit=[0-9]
AnyLet=[A-Za-z]
Others=[0-9 &.]
WhiteSp=[\040\n]
// Tell JLex to have yylex() return a Token
%type Token
// Tell JLex what to return when eof of file is hit
%eofval{
return new Token(null);
%eofval}
%%
[Pp]{AnyLet}{AnyLet}{AnyLet}[Tt]{WhiteSp}+  {return new Token(yytext());}
({AnyLet}|{Others})+{WhiteSp}+              {/* skip */}

Some Notes on JLex

yychar: character count where the match occurred
yyline: line number in input file where the match occurred
yytext: returns the lexeme matched

Build/run pipeline:
MeggyJava.lex (Lex spec) → JLex scanner generator → yylex scanner (part of the MeggyJava compiler)
MeggyJava program → scanner → token stream

A Java driver program that uses the scanner is:

import java.io.*;
class Main {
  public static void main(String args[]) throws java.io.IOException {
    Yylex lex = new Yylex(System.in);
    Token token = lex.yylex();
    while (token.text != null) {
      System.out.print("\t" + token.text);
      token = lex.yylex(); // get next token
    }
  }
}

Handling Ambiguities

What if x1…xi ∈ L(R) and also x1…xK ∈ L(R)?
Some examples? Which token is used? How designated?

More Ambiguities

What if x1…xi ∈ L(Rj) and also x1…xi ∈ L(Rk)? Which token is used?
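Lex resolves both kinds of overlap with two conventions: prefer the longest match (maximal munch), and on a tie prefer the rule listed first. A minimal sketch in C; match_if, match_id, and pick_token are hypothetical helpers for this illustration, not Lex internals:

```c
#include <assert.h>

/* Sketch of maximal-munch disambiguation: try every rule at the current
 * position, keep the longest match; on a tie, the earlier rule wins.
 * match_if and match_id are made-up matchers for this illustration. */

/* Matches the keyword "if": returns length matched (2) or 0. */
static int match_if(const char *s) {
    return (s[0] == 'i' && s[1] == 'f') ? 2 : 0;
}

/* Matches an identifier [a-z]+: returns length matched. */
static int match_id(const char *s) {
    int n = 0;
    while (s[n] >= 'a' && s[n] <= 'z') n++;
    return n;
}

/* Returns 0 for IF, 1 for ID, choosing the longest match;
 * ties go to the rule listed first (IF). */
int pick_token(const char *s, int *len) {
    int lif = match_if(s), lid = match_id(s);
    if (lif >= lid) { *len = lif; return 0; } /* IF wins ties */
    *len = lid;
    return 1;
}
```

On input "ifx" the identifier rule matches three characters while the keyword rule matches two, so maximal munch reports one ID token rather than IF followed by an identifier.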

Lexical Error Detection and Handling

No rule matches a prefix of the input? Problem: the compiler can't just get stuck.

You should Do Some More Practice with reading and writing lex specs

How does the Scanner work under the Hood?

From Specification to Scanning

What is a Finite Automaton?

Regular expressions = specification
Finite automata = implementation

A finite automaton consists of:
  an input alphabet Σ
  a set of states S
  a start state n
  a set of accepting states F ⊆ S
  a set of transitions: state →(input) state
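As a concrete instance of this definition, here is a sketch of a hand-coded automaton for the language a*b; the state numbering and function names are invented for illustration:

```c
#include <assert.h>

/* A concrete instance of the definition above, as a sketch:
 * alphabet {a, b}; states {0, 1, 2}; start state 0; accepting set F = {1};
 * transitions 0 -a-> 0, 0 -b-> 1, and everything else to dead state 2.
 * This automaton accepts the language a*b. */
static int delta(int state, char c) {
    if (state == 0 && c == 'a') return 0;
    if (state == 0 && c == 'b') return 1;
    return 2; /* dead state: no way back to acceptance */
}

/* Runs the automaton over the whole string; returns 1 iff it
 * ends in an accepting state. */
int accepts(const char *s) {
    int state = 0;              /* start state */
    for (; *s; s++)
        state = delta(state, *s);
    return state == 1;          /* accepting set F = {1} */
}
```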

From Reg Expr to NFA

How do we build an NFA for:
  a single symbol?  a
  concatenation?    ab
  alternation?      a|b
  closure?          a*

Scanning as a Finite Automaton

Understanding FA

DFA vs NFA? What is allowed? Which can be much bigger in size? Which is simpler? Which is faster to run?

Comparison by size

Implementing a DFA

A DFA can be implemented by a 2D table T:
  one dimension is states
  the other dimension is input symbols
For every transition Si →(a) Sk, define T[i,a] = k.

DFA execution: if in state Si with input a, read T[i,a] = k and go to state Sk. Very efficient.
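The table scheme above can be sketched in C for the identifier pattern letter(letter|digit)*; the state names and the char_class helper are illustrative assumptions, not from the slides:

```c
#include <assert.h>
#include <ctype.h>

/* Table-driven DFA sketch for the identifier pattern letter(letter|digit)*.
 * States: START, IN_ID (accepting), DEAD.
 * Input classes: 0 = letter, 1 = digit, 2 = anything else. */
enum { START = 0, IN_ID = 1, DEAD = 2 };

/* Maps a character to its input-symbol class (one table column). */
static int char_class(int c) {
    if (isalpha(c)) return 0;
    if (isdigit(c)) return 1;
    return 2;
}

/* T[state][class] = next state: one entry per transition. */
static const int T[3][3] = {
    /* letter  digit  other */
    {  IN_ID,  DEAD,  DEAD },   /* START */
    {  IN_ID,  IN_ID, DEAD },   /* IN_ID */
    {  DEAD,   DEAD,  DEAD }    /* DEAD  */
};

/* DFA execution: one table lookup per input character. */
int is_identifier(const char *s) {
    int state = START;
    for (; *s; s++)
        state = T[state][char_class((unsigned char)*s)];
    return state == IN_ID;
}
```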

However, 3 Major Ways to Build Scanners

  ad-hoc
  semi-mechanical pure DFA (usually realized as nested case statements)
  table-driven DFA

Ad-hoc generally yields the fastest, most compact code by doing lots of special-purpose things, though good automatically-generated scanners come very close.
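The ad-hoc, nested-case style can be sketched as follows; the token codes and the next_token name are made up for this illustration:

```c
#include <assert.h>
#include <ctype.h>

/* Ad-hoc scanner sketch in the nested-case style: a switch on the first
 * character decides the token class directly, with no tables and no
 * generator. Token codes here are invented for this illustration. */
enum { TOK_EOF = 0, TOK_NUM = 1, TOK_ID = 2, TOK_PLUS = 3, TOK_ERR = -1 };

/* Scans one token starting at *p and advances *p past it. */
int next_token(const char **p) {
    while (**p == ' ' || **p == '\t' || **p == '\n') (*p)++; /* skip ws */
    switch (**p) {
    case '\0':
        return TOK_EOF;
    case '+':
        (*p)++;
        return TOK_PLUS;
    default:
        if (isdigit((unsigned char)**p)) {            /* number: [0-9]+ */
            while (isdigit((unsigned char)**p)) (*p)++;
            return TOK_NUM;
        }
        if (isalpha((unsigned char)**p)) {            /* id: letter alnum* */
            while (isalnum((unsigned char)**p)) (*p)++;
            return TOK_ID;
        }
        (*p)++;                                       /* unknown character */
        return TOK_ERR;
    }
}
```

Scanning "x + 42" this way yields TOK_ID, TOK_PLUS, TOK_NUM, then TOK_EOF.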

Manually written scanner code

In summary, the scanner is the only phase to see the input file. So the scanner is responsible for what?

In summary, the scanner is the only phase to see the input file. So the scanner is responsible for:
  tokenizing source
  removing comments
  saving text of identifiers, numbers, strings
  saving source locations (file, line, column) for error messages
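A token record that carries the saved text and source location listed above might look like this sketch; all names and sizes here are illustrative, not from the course code:

```c
#include <assert.h>
#include <string.h>

/* Sketch of a token record carrying what the summary lists: the token
 * code, the saved lexeme text, and the source location (file, line,
 * column) needed for error messages. Field names are illustrative. */
typedef struct {
    int         kind;       /* token code, e.g. ID or NUMBER */
    char        lexeme[64]; /* saved text of the token */
    const char *file;       /* source file name */
    int         line, col;  /* position for error messages */
} Token;

/* Builds a token, truncating over-long lexemes to fit the buffer. */
Token make_token(int kind, const char *lexeme,
                 const char *file, int line, int col) {
    Token t;
    t.kind = kind;
    strncpy(t.lexeme, lexeme, sizeof t.lexeme - 1);
    t.lexeme[sizeof t.lexeme - 1] = '\0';
    t.file = file;
    t.line = line;
    t.col = col;
    return t;
}
```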

Why separate phases?

More Details on Lex/Flex (for your own reading pleasure)

A Makefile for the scanner

eins.out: eins.tlt scanner
	scanner < eins.tlt > eins.out

lex.yy.o: lex.yy.c token.h symtab.h
	gcc -c lex.yy.c

lex.yy.c: turtle.l
	flex turtle.l

scanner: lex.yy.o symtab.c
	gcc lex.yy.o symtab.c -lfl -o scanner

A typical token.h file

#define SEMICOLON 274
#define PLUS      275
#define MINUS     276
#define TIMES     277
#define DIV       278
#define OPEN      279
#define CLOSE     280
#define ASSIGN    281
/* for all tokens */

typedef union YYSTYPE {
  int i;
  node *n;
  double d;
} YYSTYPE;
YYSTYPE yylval;

A typical driver for testing the scanner without a parser

%%
main() {
  int token;
  while ((token = yylex()) != 0) {
    switch (token) {
      case JUMP:
        printf("JUMP\n");
        break;
      /* need a case here for every token possible, printing yylval
         as needed for those with more than one lexeme per token */
      default:
        printf("ILLEGAL CHARACTER\n");
        break;
    }
  }
}