Lexical Analysis - 2

Similar documents
Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Finite automata. We have looked at using Lex to build a scanner on the basis of regular expressions.

Lexical Analysis. Prof. James L. Frankel Harvard University

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Lexical Analysis. Implementation: Finite Automata

1. (10 points) Draw the state diagram of the DFA that recognizes the language over Σ = {0, 1}

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 5

Lexical Error Recovery

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSE302: Compiler Design

Compiler phases. Non-tokens

[Lexical Analysis] Bikash Balami

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

CSE450. Translation of Programming Languages. Automata, Simple Language Design Principles

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Lexical Error Recovery

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

2. Syntax and Type Analysis

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Lexical Analyzer Scanner

Dr. D.M. Akbar Hussain

Lexical Analyzer Scanner

Implementation of Lexical Analysis

Syntax and Type Analysis

UNIT -2 LEXICAL ANALYSIS

Compiler Construction

Announcements! P1 part 1 due next Tuesday P1 part 2 due next Friday

CT32 COMPUTER NETWORKS DEC 2015

Zhizheng Zhang. Southeast University

CS308 Compiler Principles Lexical Analyzer Li Jiang

Finite Automata. Dr. Nadeem Akhtar. Assistant Professor Department of Computer Science & IT The Islamia University of Bahawalpur

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

2010: Compilers REVIEW: REGULAR EXPRESSIONS HOW TO USE REGULAR EXPRESSIONS

Lexical Analysis. Lecture 3-4

Converting a DFA to a Regular Expression JP

Implementation of Lexical Analysis

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Monday, August 26, 13. Scanners

CS 310: State Transition Diagrams

Wednesday, September 3, 14. Scanners

Languages and Compilers

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

Writing a Lexical Analyzer in Haskell (part II)

Non-deterministic Finite Automata (NFA)

Front End: Lexical Analysis. The Structure of a Compiler

Theory of Computations Spring 2016 Practice Final Exam Solutions

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

David Griol Barres Computer Science Department Carlos III University of Madrid Leganés (Spain)

6 NFA and Regular Expressions

MIT Specifying Languages with Regular Expressions and Context-Free Grammars. Martin Rinard Massachusetts Institute of Technology

Compiler Construction

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

Lexical Analysis. Chapter 2

Implementation of Lexical Analysis

Optimizing Finite Automata

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Formal Languages and Compilers Lecture VI: Lexical Analysis

2. Lexical Analysis! Prof. O. Nierstrasz!

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Languages, Automata, Regular Expressions & Scanners. Winter /8/ Hal Perkins & UW CSE B-1

Introduction to Lexical Analysis

CMSC 350: COMPILER DESIGN

CSE 105 THEORY OF COMPUTATION

Implementation of Lexical Analysis. Lecture 4

3.15: Applications of Finite Automata and Regular Expressions. Representing Character Sets and Files

Alternation. Kleene Closure. Definition of Regular Expressions

UNIT II LEXICAL ANALYSIS

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Regular Languages and Regular Expressions

CSE 105 THEORY OF COMPUTATION

Theory of Computation Dr. Weiss Extra Practice Exam Solutions

Theory of Programming Languages COMP360

Structure of Programming Languages Lecture 3

Lexical Analysis. Lecture 2-4

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Week 2: Syntax Specification, Grammars

lec3:nondeterministic finite state automata

Automating Construction of Lexers

CS415 Compilers. Lexical Analysis

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Lexical Analysis. Introduction

Decision Properties of RLs & Automaton Minimization

Lecture 4: Syntax Specification

PRINCIPLES OF COMPILER DESIGN UNIT II LEXICAL ANALYSIS 2.1 Lexical Analysis - The Role of the Lexical Analyzer

Chapter 3 Lexical Analysis

CS5371 Theory of Computation. Lecture 8: Automata Theory VI (PDA, PDA = CFG)

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Compiler course. Chapter 3 Lexical Analysis

Finite Automata and Scanners

DFA: Automata where the next state is uniquely given by the current state and the current input character.

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

CS2 Language Processing note 3

Figure 2.1: Role of Lexical Analyzer

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 3

Formal Languages. Formal Languages

Lexical Analysis. Finite Automata. (Part 2 of 2)

Midterm Exam II CIS 341: Foundations of Computer Science II Spring 2006, day section Prof. Marvin K. Nakayama

Transcription:

Lexical Analysis - 2 More regular expressions Finite Automata NFAs and DFAs Scanners JLex - a scanner generator 1

Regular Expressions in JLex Symbol - Meaning. Matches a single character (not newline) * Matches 0 or more copies of preceding RE + Matches 1 or more copies of preceding RE? Matches 0 or 1 occurrence of an RE ^ Everything it quotes is matches EXACTLY Matches the beginning of a line $ Matches the end of a line [ ] Character class = matches any character listed; [^ ] implies a match of any character NOT listed ( ) Groups a series of REs into a new RE 2

REs in JLex Symbol - Meaning { } Control for repeated matching a specific number of times; a{1,3} means match 1,2, or 3 instances of a \ Used to match a metacharacter or control character; \n matches newline; \* is the character * Means match either the RE proceeding it OR the RE following it RE1/RE2 Means match RE1 but only when followed by RE2; 0/1 will match the 0 in 01 but not in 02 3

Exercise Problem: write an RE to match a quoted string such as Hello. Need to decide if a quoted string can go across more than 1 line of text Note: JLex REs are line-input-oriented, unlike formal REs Also, JLex REs make the longest matches possible within a string of characters If multiple REs are given to JLex and several match the same longest expression, the first matching RE is used. 4

Finite Automata Automata that recognize strings defined by a regular expression (States, Input symbols, Transitions, Start_state, Set of Final_states) Transitions between states occur on specific input symbols Deterministic automata have only 1 transition per state on a specific input and do not allow transition on the empty string 5

Finite Automata Language recognized by automaton is set of strings it accepts by starting in the start state, using transitions corresponding to input symbols in the input string, and processing all input and finishing in a final state. 0 1 9 0 1 9 RE for integers: A B [ 0-9 ] + Start state Final state 6

FAs Nondeterministic finite automaton <{states}, {input symbols} (terminal symbols of a grammar) Transition function ((state,input)--> state), Start state {Final states}> NFA allows more than 1 transition on the same input symbol and/or transitions on ε Deterministic FA allows only 1 transition per input symbol and no ε transitions 7

Theoretical results: FAs Set of languages recognizable by NFAs is same as those recognizable by DFAs. There is an algorithm to check for equivalence of two languages recognized by 2 different FAs. RE for reals: ([0-9] + \. [0-9]*) ([0-9] * \. [0-9] + ) Shown: [0-9] +. 0,1,,9 0,1,,9. 0,1,,9 0,1,,9. NFA DFA 8

Practical FAs Encode transitions as a table Each column is an input symbol Each row is a state Entry at (s1,i1) is state to transition to when in state s1 and see input i1 Scanner has to try to find longest match in input to a possible token May have to look beyond end of token to do this! 9

RE to NFA Conversion Straightforward translation using composition operators of REs For RE ε, s ε f For RE a, terminal symbol, s a f For w,t REs with corresponding NFAs N(w), N(t), w t yields, s ε ε s N(w) s N(t) f f ε ε f 10

RE to NFA Conversion For w,t REs with corresponding NFAs N(w), N(t), w t yields, For w RE, w* yields, For (w) RE, use N(w). s s f f N(w) N(t) ε ε ε N(w) f s f ε 11

RE to NFA Conversion These drawings follow the Aho, Sethi, Ullman Compiler text and are equivalent to those in Appel For w + use fact that w + = w w* For w? use fact that w? = w ε [abc] = a b c For abc use fact that abc = a b c 12

How does an NFA compute? Start off in the start state Compute set S of all states reachable on ε transitions. Given next input symbol is a, calculate set of states T, reachable as transition(s,a) where s S Repeat steps 2,3 until input is exhausted. If final set of states contains a final state, then string has been recognized. 13

NFA to DFA Conversion Deterministic computation is desirable if we want a write a scanner as a program Need to convert NFA to equivalent DFA Then can simulate DFA recognition process using tables in program to describe transitions If process ends up in a final state, a token has been recognized 14

NFA to DFA Conversion Intuition: whenever there is an ε transition out of a state s, the NFA may go to any of the states reachable in this manner without consuming any input symbols. Call these states the ε-closure of state s. By looking at ε-closures, we form sets of related states in the NFA; these become states in the corresponding DFA Edges in the DFA correspond to sets of edges in the NFA (connecting different ε-closure sets of states) 15

NFA to DFA Conversion DFA derived is not the most efficient (smallest possible), but is usually of practical size There are ways of obtaining an optimal DFA by minimizing the numbers of states 16

Conversion Algorithm Need two primitive functions ε closure(t), for T a set of states in the NFA Returns a set of NFA states reachable from state s T by ε-transitions move(t, a), for T a set of states in the NFA returns a set of NFA states to which there is a transition on a from some NFA state s T Build set of states (D) and transitions (Dtrans) for the DFA 17

Conversion Algorithm Assume all states in NFA are unmarked initially. Let S = ε-closure(start state of NFA). Let D = {S}. while an unmarked state T D do Mark T; input symbols a do { U = ε-closure (move (T, a)); if U D then {add unmarked U to D}; Dtrans(T,a) = U; } endwhile 18

Possible Problems Theoretically, for NFA having n states can get DFA with 2 n states, but this doesn t happen in practice. Token is recognized if the ending state of the DFA contains an original final state of the NFA. In case of choice, use final state which represents the earliest rule in the list of productions for tokens 19

Optimal DFA There are algorithms for constructing the minimal (smallest) DFA Idea: Assume every state can transition on every input (can create an error state to do this). Try to prove that computation starting at states T and S differs on at least 1 input. If cannot find such an input can merge S and T. Resulting state has the union of their transitions. (ASU, pp141ff) 20