CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 4: Lexical Analyzers 28 Jan 08



Outline
- DFA state minimization
- Lexical analyzers
- Automating lexical analysis
- Jlex lexical analyzer generator
CS 412/413 Spring 2008 Introduction to Compilers

Finite Automata
Finite automata:
- States, transitions between states
- Initial state, set of final states
DFA: Deterministic Finite Automaton
- Each transition consumes an input character
- Each transition is uniquely determined by the input character
NFA: Non-deterministic Finite Automaton
- ε-transitions, which do not consume input characters
- Multiple transitions from the same state on the same input character

From RE to DFA
Two steps:
- Convert the regular expression to an NFA
- Convert the resulting NFA to a DFA
The generated DFAs may have a large number of states.
State minimization is an optimization that converts a DFA to another DFA that recognizes the same language and has a minimum number of states.
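The NFA-to-DFA step is commonly done with the subset construction, which the slide does not spell out. The following is a minimal sketch under stated assumptions: the class `SubsetConstruction`, its method names, and the map-based NFA encoding are all hypothetical and chosen only for illustration. Each DFA state is the set of NFA states reachable on the input read so far.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the subset construction (NFA to DFA).
final class SubsetConstruction {
    // All NFA states reachable from `states` using only epsilon-transitions.
    static Set<Integer> epsilonClosure(Set<Integer> states, Map<Integer, Set<Integer>> eps) {
        Set<Integer> closure = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : eps.getOrDefault(s, Set.of())) {
                if (closure.add(t)) work.push(t);
            }
        }
        return closure;
    }

    // All NFA states reachable from `states` by consuming the character c.
    static Set<Integer> move(Set<Integer> states, char c,
                             Map<Integer, Map<Character, Set<Integer>>> moves) {
        Set<Integer> result = new TreeSet<>();
        for (int s : states) {
            result.addAll(moves.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
        }
        return result;
    }

    // Returns the DFA transition function; a missing entry means Error.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(
            int start, String alphabet,
            Map<Integer, Set<Integer>> eps,
            Map<Integer, Map<Character, Set<Integer>>> moves) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> startSet = epsilonClosure(Set.of(start), eps);
        dfa.put(startSet, new HashMap<>());
        work.push(startSet);
        while (!work.isEmpty()) {
            Set<Integer> current = work.pop();
            for (char c : alphabet.toCharArray()) {
                Set<Integer> target = epsilonClosure(move(current, c, moves), eps);
                if (target.isEmpty()) continue; // no transition on c
                dfa.get(current).put(c, target);
                if (!dfa.containsKey(target)) {
                    dfa.put(target, new HashMap<>());
                    work.push(target);
                }
            }
        }
        return dfa;
    }

    public static void main(String[] args) {
        // Assumed example NFA for a*b: 0 -ε-> 1, 1 -a-> 1, 1 -b-> 2.
        Map<Integer, Set<Integer>> eps = Map.of(0, Set.of(1));
        Map<Integer, Map<Character, Set<Integer>>> moves =
                Map.of(1, Map.of('a', Set.of(1), 'b', Set.of(2)));
        System.out.println(toDfa(0, "ab", eps, moves));
    }
}
```

A DFA state would be final when its set contains at least one final NFA state; that bookkeeping is left out to keep the sketch short.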

State Minimization
Example:
[Figure: DFA1 with states 0 to 4 and DFA2 with states 0 to 2.]
Both DFAs accept: [...]

State Minimization
Step 1. Partition the states of the original DFA into maximal-sized groups of equivalent states S = {G_1, ..., G_n}
[Figure: DFA states 0 to 4 partitioned into groups.]
Step 2. Construct the minimized DFA such that there is a state for each group G_i

DFA Minimization (Equivalence)
All states in group G_i are equivalent iff for any two states p and q in G_i, and for every symbol σ, transition(p,σ) and transition(q,σ) are either both Error, or are states in the same group G_j (possibly G_i itself).
[Figure: groups G_i, G_j, and G_k (or Error), with states p, q, r and transitions on symbol c.]


DFA Minimization
Step 1. Partition the states of the original DFA into maximal-sized groups of equivalent states
- Step 1a. Discard states not reachable from the start state
- Step 1b. Initial partition is S = {Final, Non-final}
- Step 1c. Repeatedly refine the partition {G_1, ..., G_n} while some group G_i contains states p and q such that, for some symbol σ, the transitions from p and q on σ lead to different groups G_j and G_k (or Error)
[Figure: states p and q in group G_i with transitions into G_j and G_k (or Error).]

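The partition-refinement steps can be sketched in Java as follows. This is an illustrative sketch, not the course's implementation: the class `DfaMinimizer`, the table encoding (`delta[state][symbol]`, with `-1` for Error), and the example DFA in `main` are assumptions. Two states stay in the same group only while they agree on their current group and on the group reached for every symbol.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of DFA minimization by partition refinement.
final class DfaMinimizer {
    static final int ERROR = -1; // missing transition, or group of an unreachable state

    // Step 1: returns group[s], the group index of state s (ERROR if unreachable).
    static int[] partition(int[][] delta, boolean[] accepting, int start) {
        int n = delta.length;
        // Step 1a: find the states reachable from the start state.
        boolean[] reachable = new boolean[n];
        Deque<Integer> work = new ArrayDeque<>();
        reachable[start] = true;
        work.push(start);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : delta[s]) {
                if (t != ERROR && !reachable[t]) {
                    reachable[t] = true;
                    work.push(t);
                }
            }
        }
        // Step 1b: initial partition {Non-final, Final}.
        int[] group = new int[n];
        for (int s = 0; s < n; s++) {
            group[s] = !reachable[s] ? ERROR : (accepting[s] ? 1 : 0);
        }
        // Step 1c: split groups until the number of groups stops growing.
        int count = 0;
        while (true) {
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] next = new int[n];
            for (int s = 0; s < n; s++) {
                if (!reachable[s]) { next[s] = ERROR; continue; }
                // Signature: own group plus target group for every symbol.
                List<Integer> signature = new ArrayList<>();
                signature.add(group[s]);
                for (int t : delta[s]) {
                    signature.add(t == ERROR ? ERROR : group[t]);
                }
                Integer id = ids.get(signature);
                if (id == null) { id = ids.size(); ids.put(signature, id); }
                next[s] = id;
            }
            if (ids.size() == count) return next; // no group was split
            count = ids.size();
            group = next;
        }
    }

    // Step 2: one state per group; transitions are copied from any member.
    static int[][] quotient(int[][] delta, int[] group) {
        int groups = 0;
        for (int g : group) groups = Math.max(groups, g + 1);
        int[][] min = new int[groups][];
        for (int s = 0; s < delta.length; s++) {
            if (group[s] == ERROR) continue;
            int[] row = new int[delta[s].length];
            for (int c = 0; c < row.length; c++) {
                int t = delta[s][c];
                row[c] = t == ERROR ? ERROR : group[t];
            }
            min[group[s]] = row;
        }
        return min;
    }

    public static void main(String[] args) {
        // Assumed 5-state example over {a, b}; column 0 is 'a', column 1 is 'b'.
        int[][] delta = {{1, 2}, {4, 3}, {1, 2}, {4, 3}, {ERROR, ERROR}};
        boolean[] accepting = {false, false, false, false, true};
        int[] group = partition(delta, accepting, 0);
        System.out.println(Arrays.toString(group));
        System.out.println(Arrays.deepToString(quotient(delta, group)));
    }
}
```

In the assumed example, states 0 and 2 end up in one group and states 1 and 3 in another, so the five-state DFA collapses to three states. The start state of the minimized DFA is the group of the original start state, and a group is final when its members are final.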

Optimized Acceptor
Regular expression R → [RE → NFA] → [NFA → DFA] → [Minimize DFA] → [DFA Simulation]
Input string w → [DFA Simulation]
Output: Yes, if w ∈ L(R); No, if w ∉ L(R)
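The final box of this pipeline, DFA simulation, amounts to a table lookup per input character. A minimal sketch, assuming a hypothetical `DfaAcceptor` class and a transition table `delta[state][symbol]` in which `-1` stands for Error:

```java
// Hypothetical sketch of DFA simulation used as an acceptor.
final class DfaAcceptor {
    static final int ERROR = -1; // missing transition

    // Returns true iff the DFA accepts w; `alphabet` maps characters to column indices.
    static boolean accepts(int[][] delta, boolean[] accepting, int start,
                           String alphabet, String w) {
        int state = start;
        for (int i = 0; i < w.length(); i++) {
            int symbol = alphabet.indexOf(w.charAt(i));
            if (symbol < 0) return false;        // character outside the alphabet
            state = delta[state][symbol];
            if (state == ERROR) return false;    // no transition: reject
        }
        return accepting[state];                 // accept only in a final state
    }

    public static void main(String[] args) {
        // Assumed 3-state DFA over {a, b}; column 0 is 'a', column 1 is 'b'.
        int[][] delta = {{1, 0}, {2, 1}, {ERROR, ERROR}};
        boolean[] accepting = {false, false, true};
        System.out.println(accepts(delta, accepting, 0, "ab", "baba"));
        System.out.println(accepts(delta, accepting, 0, "ab", "aab"));
    }
}
```

Each character costs one array access, so the running time is linear in the length of w regardless of how complex the regular expression was.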

Lexical Analyzers vs Acceptors
Lexical analyzers use the same mechanism, but they:
- Have multiple RE descriptions for multiple tokens
- Output a sequence of matching tokens (or an error)
- Always return the longest matching token
- For multiple longest matching tokens, use rule priorities

Lexical Analyzers
REs for tokens R_1 ... R_n → [RE → NFA] → [NFA → DFA] → [Minimize DFA] → [DFA Simulation]
Character stream (program) → [DFA Simulation] → Token stream (and errors)

Handling Multiple REs
- Construct one NFA for each RE
- Associate the final state of each NFA with the given RE
- Combine NFAs for all REs into one NFA
- Convert NFA to minimized DFA, associating each final DFA state with the highest priority RE of the corresponding NFA states
[Figure: NFAs for keywords, whitespace, identifier, and number, joined by ε-transitions and converted to a minimized DFA.]
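The last step, choosing one RE for a final DFA state, can be sketched as below. The class `RulePriority` and its encoding are hypothetical. The sketch assumes that rules are numbered in specification order and that a lower number means higher priority, which is the usual lex-style convention of preferring the rule listed first.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: pick the rule attached to a DFA state built from NFA states.
final class RulePriority {
    static final int NO_RULE = -1; // the DFA state is not final

    // ruleOfFinalState maps each final NFA state to the index of its rule.
    static int ruleFor(Set<Integer> nfaStates, Map<Integer, Integer> ruleOfFinalState) {
        int best = NO_RULE;
        for (int s : nfaStates) {
            Integer rule = ruleOfFinalState.get(s);
            // Assumption: the smallest rule index has the highest priority.
            if (rule != null && (best == NO_RULE || rule < best)) best = rule;
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed example: NFA state 3 ends rule 4 (identifier), state 7 ends rule 1 (keyword).
        System.out.println(ruleFor(Set.of(3, 7), Map.of(3, 4, 7, 1)));
    }
}
```

With this convention, listing keywords before the identifier rule is what makes `if` come out as a keyword even though the identifier rule matches the same text.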

Scanning Algorithm
- Scan input and simulate DFA until no further transition is possible, keeping track of the most recently visited final state F
- Roll input back to the position at the time F was entered
- Emit the token associated with F
- For each successive token, scan the remaining input and simulate the DFA from the start state, i.e., the scanner is stateless (NB: this is to be changed below.)

Example of Roll Back
Consider three REs: { [...] } and input: [...]
[Figure: DFA with states 0 to 6.]
- Reach state 3 with no transition on next character
- Roll input back to position on entering state 2 (i.e., having read [...])
- Emit token for [...]
- On next call to scanner, start in state 0 again with input [...]
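The scanning loop with roll back can be sketched in Java as follows. Everything concrete here is an assumption made for illustration: the class `MaximalMunchScanner`, the token set (`aa`, `ba`, `aabb`), and the hand-built seven-state DFA are hypothetical and are not claimed to be the slide's example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of longest-match scanning with roll back.
final class MaximalMunchScanner {
    static final int ERROR = -1;

    // Assumed DFA for tokens aa, ba, aabb; column 0 is 'a', column 1 is 'b'.
    static final int[][] DELTA = {
        {1, 4},          // 0: start
        {2, ERROR},      // 1: read "a"
        {ERROR, 3},      // 2: read "aa" (final)
        {ERROR, 5},      // 3: read "aab"
        {6, ERROR},      // 4: read "b"
        {ERROR, ERROR},  // 5: read "aabb" (final)
        {ERROR, ERROR},  // 6: read "ba" (final)
    };
    // Token emitted in each state; null marks a non-final state.
    static final String[] TOKEN = {null, null, "AA", null, null, "AABB", "BA"};

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int state = 0;            // every token starts in the start state
            int lastFinal = ERROR;    // most recently visited final state F
            int lastFinalEnd = pos;   // input position when F was entered
            int i = pos;
            while (i < input.length()) {
                int symbol = "ab".indexOf(input.charAt(i));
                int next = symbol < 0 ? ERROR : DELTA[state][symbol];
                if (next == ERROR) break;   // no further transition possible
                state = next;
                i++;
                if (TOKEN[state] != null) {
                    lastFinal = state;
                    lastFinalEnd = i;
                }
            }
            if (lastFinal == ERROR) {
                throw new IllegalArgumentException("no token matches at position " + pos);
            }
            tokens.add(TOKEN[lastFinal]);
            pos = lastFinalEnd;       // roll input back to where F was entered
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("aaba"));
    }
}
```

On the input `aaba` the scanner reads `aab`, gets stuck in state 3, rolls back to the end of `aa`, emits that token, and then restarts from state 0 on the remaining `ba`.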

Automating Lexical Analysis
All of the lexical analysis process can be automated:
- RE → NFA → DFA → Minimized DFA
- Minimized DFA → Lexical Analyzer (DFA Simulation Program)
We only need to specify:
- Regular expressions for the tokens
- Rule priorities for multiple longest match cases

Lexical Analyzer Generators
REs for tokens (lex.l) → [Jlex Compiler] → lex.java → [javac Compiler] → lex.class
Character stream (program) → [lex.class] → Token stream (and errors)

Jlex Specification File
Jlex = lexical analyzer generator
- written in Java
- generates a Java lexical analyzer
Has three parts:
- Preamble, which contains package/import declarations
- Definitions, which contains regular expression abbreviations
- Regular expressions and actions, which contains:
  - the list of regular expressions for all the tokens
  - corresponding actions for each token (Java code to be executed when the token is recognized)

Example Specification File

    package Parse;
    import Error.LexicalError;
    %%
    digits = 0|[1-9][0-9]*
    letter = [A-Za-z]
    identifier = {letter}({letter}|[0-9_])*
    whitespace = [\ \t\n\r]+
    %%
    {whitespace}  { /* discard */ }
    {digits}      { return new Token(INT, Integer.valueOf(yytext())); }
    "if"          { return new Token(IF, null); }
    "while"       { return new Token(WHILE, null); }
    {identifier}  { return new Token(ID, yytext()); }
    .             { ErrorMsg.error("illegal character"); }

Start States
- Mechanism that specifies the state in which to start the execution of the DFA
- Declare states in the second section: %state STATE
- Use states as prefixes of regular expressions in the third section: <STATE> regex {action}
- Set the current state in the actions: yybegin(STATE)
- There is a pre-defined initial state: YYINITIAL

Example
[Figure: state diagram with the two start states INITIAL and STRING.]

    %%
    %state STRING
    %%
    <YYINITIAL> "if"  { return new Token(IF, null); }
    <YYINITIAL> \"    { yybegin(STRING); }
    <STRING> \"       { yybegin(YYINITIAL); }
    <STRING> .        { }

Start States and REs
- The use of start states allows the lexer to recognize more than regular expressions (or DFAs)
- Reason: the lexer can jump across different states in the semantic actions using yybegin(STATE)
- Example: nested comments
  - Increment a global variable on open parentheses and decrement it on close parentheses
  - When the variable gets to zero, jump to YYINITIAL
  - The global variable essentially models an infinite number of states!
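The nested-comment idea can be sketched as a hand-written Java loop that mimics what the lexer actions would do. This is not Jlex output: the class `NestedCommentLexer`, the `(*` and `*)` delimiters, and the `stripComments` helper are assumptions chosen for illustration. The `depth` counter plays the role of the global variable, and `state` plays the role of the current start state.

```java
// Hypothetical sketch of nested-comment handling with a start state and a counter.
final class NestedCommentLexer {
    enum State { YYINITIAL, COMMENT }

    // Returns the input with all (possibly nested) comments removed.
    static String stripComments(String input) {
        StringBuilder out = new StringBuilder();
        State state = State.YYINITIAL;
        int depth = 0; // the "global variable": number of currently open comments
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("(*", i)) {
                depth++;                       // open delimiter: go one level deeper
                state = State.COMMENT;
                i += 2;
            } else if (state == State.COMMENT && input.startsWith("*)", i)) {
                depth--;                       // close delimiter: leave one level
                if (depth == 0) state = State.YYINITIAL;
                i += 2;
            } else {
                // Ordinary character: kept outside comments, discarded inside.
                if (state == State.YYINITIAL) out.append(input.charAt(i));
                i++;
            }
        }
        if (depth != 0) throw new IllegalArgumentException("unterminated comment");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a(*x(*y*)z*)b"));
    }
}
```

A DFA alone cannot do this, because matching arbitrarily deep nesting needs unbounded counting; the counter supplies exactly that extra memory.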

Conclusion
- Regular expressions: concise way of specifying tokens
- Can convert RE to NFA, then to DFA, then to minimized DFA
- Use the minimized DFA to recognize tokens in the input stream
- Automate the process using lexical analyzer generators
  - Write regular expression descriptions of tokens
  - Automatically get a lexical analyzer program which identifies tokens from an input stream of characters