CSE 431S Scanning. Washington University Spring 2013



Regular Languages
Three ways to describe regular languages:
  - FSA (finite-state automata)
  - Right-linear grammars
  - Regular expressions

Regular Expressions
A regular expression is a concise description of a regular set: a finite description of a possibly infinite set.
Given an alphabet Σ, valid regular expressions consist of:
  - The null expression ∅
  - The empty string ε
  - Characters from Σ
  - Compound expressions built using the operators alternation, concatenation, and Kleene closure

Regular Expressions
Alternation: one subexpression or another.
  The expression: a | b
  Results in the set: { a, b }
  Note there are two strings in the set.
Concatenation: build a larger string by concatenating the results of two subexpressions.
  The expression: ab
  Results in the set: { ab }
  Note there is one string in the set.

Regular Expressions
Kleene closure: zero or more concatenations of a subexpression.
  The expression: a*
  Results in the set: { ε, a, aa, aaa, aaaa, ... }
  Note the set is infinite, though each string is finite.

Regular Expressions
Operator precedence: Kleene closure > concatenation > alternation.
Parentheses can be used to group subexpressions.
  So the expression: a | bc*
  Is equivalent to: (a | (b (c*)))

Regular Expressions
Build up expressions from the inside out.
What is: a | bc*
  Equivalent to: (a | (b (c*)))
  c* = { ε, c, cc, ccc, cccc, ... }
  bc* = { b, bc, bcc, bccc, bcccc, ... }
  a | bc* = { a, b, bc, bcc, bccc, bcccc, ... }
What is: (a | b) c*
  { a, ac, acc, accc, ..., b, bc, bcc, bccc, ... }
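The precedence rule can be checked mechanically. The sketch below uses Python's standard `re` module, whose operators follow the same precedence order (repetition, then concatenation, then alternation); the helper name `matches` is introduced here only for illustration.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True when the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# a|bc* groups as a|(b(c*)): either "a" alone, or "b" followed by a run of "c".
print(matches(r"a|bc*", "a"))      # True
print(matches(r"a|bc*", "bcc"))    # True
print(matches(r"a|bc*", "acc"))    # False

# (a|b)c* lets either letter be followed by the run of "c".
print(matches(r"(a|b)c*", "acc"))  # True
```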

Regular Expressions
Expression    Set
ε             { ε }
a             { a }
b             { b }
a | b         { a, b }
ab            { ab }
a*            { ε, a, aa, aaa, aaaa, ... }
(ab)*         { ε, ab, abab, ababab, ... }
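One way to experiment with these sets is to treat each expression as a Python set of strings. This is only a sketch: the function names `alt`, `concat`, and `star` are invented here, and because a Kleene closure is infinite, `star` takes an assumed length bound `max_len` and returns only the strings up to that length.

```python
def alt(s, t):
    """Alternation: a string from either set."""
    return s | t

def concat(s, t):
    """Concatenation: a string from s followed by a string from t."""
    return {x + y for x in s for y in t}

def star(s, max_len):
    """Kleene closure of s, truncated to strings of length <= max_len."""
    result = {""}            # zero concatenations gives the empty string
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more element of s.
        frontier = {x + y for x in frontier for y in s
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

a, b, c = {"a"}, {"b"}, {"c"}
print(sorted(alt(a, concat(b, star(c, 3)))))   # a | bc*
print(sorted(concat(alt(a, b), star(c, 2))))   # (a | b) c*
print(sorted(star(concat(a, b), 6)))           # (ab)*
```

The empty Python string plays the role of ε, so `{""}` is the set { ε } while `set()` is the empty set.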

FSA
An FSA can be:
Deterministic: for every (state, symbol) pair there is exactly one target state.
  δ: S × Σ → S
Nondeterministic: there exists a (state, symbol) pair for which there is more than one target state.
  δ: S × Σ → P(S)

FSA
Deterministic: all transitions are of the form
  P --a--> Q
Nondeterministic: some transitions are of the form
  P --a--> Q
  P --a--> R
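The distinction can be stated as a check on the transition function. In this sketch the representation is an assumption made for illustration: `delta` is a dictionary from (state, symbol) pairs to sets of target states, and the two small machines are hypothetical.

```python
def is_deterministic(states, alphabet, delta):
    """True when every (state, symbol) pair has exactly one target state.

    delta maps (state, symbol) to a set of target states, i.e. the
    nondeterministic signature S x Sigma -> P(S).
    """
    return all(len(delta.get((p, a), set())) == 1
               for p in states for a in alphabet)

# Hypothetical machines over the one-letter alphabet {a}.
dfa_delta = {("P", "a"): {"Q"}, ("Q", "a"): {"Q"}}
nfa_delta = {("P", "a"): {"Q", "R"}, ("Q", "a"): {"Q"}, ("R", "a"): {"R"}}

print(is_deterministic({"P", "Q"}, {"a"}, dfa_delta))       # True
print(is_deterministic({"P", "Q", "R"}, {"a"}, nfa_delta))  # False
```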

FSA
NFA and DFA have the same expressive power.
For any NFA n there exists a DFA d such that L(n) = L(d), and vice versa.

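RE to NFA
[Figure: table of base expressions and their NFAs, each drawn from start state q0; the case for a single character a has an a-labeled transition out of q0.]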

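RE to NFA
[Figure: NFAs for the compound expressions s | t, st, and s*, each built from a start state q0 and the machines m(s) and, for the binary operators, m(t).]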
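The exact wiring of the figures is not recoverable from this transcript, so the following is a sketch of a Thompson-style construction, which may differ in detail from the slides. An NFA fragment is represented as a tuple (start, accept, edges); the names `char`, `alt`, `cat`, `star`, and `accepts`, and the use of `None` as the ε label, are choices made here for illustration.

```python
from itertools import count

EPS = None        # label for epsilon transitions (a representation chosen here)
_ids = count()    # source of fresh state numbers

def char(a):
    """Fragment for one character: start --a--> accept."""
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]

def alt(m1, m2):
    """s | t: a new start state branches on epsilon into both machines."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    s, f = next(_ids), next(_ids)
    return s, f, e1 + e2 + [(s, EPS, s1), (s, EPS, s2),
                            (f1, EPS, f), (f2, EPS, f)]

def cat(m1, m2):
    """st: the accept state of m(s) continues on epsilon into m(t)."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    return s1, f2, e1 + e2 + [(f1, EPS, s2)]

def star(m):
    """s*: skip m(s) entirely, or run it and loop back any number of times."""
    s1, f1, e1 = m
    s, f = next(_ids), next(_ids)
    return s, f, e1 + [(s, EPS, s1), (f1, EPS, s1), (s, EPS, f), (f1, EPS, f)]

def accepts(m, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    start, accept, edges = m

    def close(states):
        # Add every state reachable through epsilon transitions alone.
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for q, label, r in edges:
                if q == p and label is EPS and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = close({start})
    for ch in text:
        current = close({r for q, label, r in edges
                         if q in current and label == ch})
    return accept in current

# (1 | 2 | 3)* 1 : strings over {1, 2, 3} that end in 1.
m123 = alt(alt(char("1"), char("2")), char("3"))
ends_in_1 = cat(star(m123), char("1"))
print(accepts(ends_in_1, "231"))   # True
print(accepts(ends_in_1, "12"))    # False
```

Each operator adds only a constant number of states and ε transitions, which is why the result is an NFA with ε moves rather than a DFA.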

RE to NFA Example
Strings that end in a digit seen before:
    (1 | 2 | 3)* 1 (1 | 2 | 3)* 1
  | (1 | 2 | 3)* 2 (1 | 2 | 3)* 2
  | (1 | 2 | 3)* 3 (1 | 2 | 3)* 3
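As a quick sanity check of this expression, it can be handed to Python's `re` module, assuming the three alternatives are joined by alternation. The names `PATTERN` and `ends_in_seen_digit` are introduced here for illustration only.

```python
import re

DIGITS = "(1|2|3)*"
# One alternative per final digit d: (1|2|3)* d (1|2|3)* d
PATTERN = re.compile("|".join(f"{DIGITS}{d}{DIGITS}{d}" for d in "123"))

def ends_in_seen_digit(s: str) -> bool:
    """True when the last digit of s also occurs earlier in s."""
    return PATTERN.fullmatch(s) is not None

print(ends_in_seen_digit("12321"))  # True: the final 1 was seen at the start
print(ends_in_seen_digit("123"))    # False: 3 has not appeared before
```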

Machine M123 for (1 | 2 | 3)
[Figure: NFA with states A through H; transitions A --1--> B, C --2--> D, E --3--> F, with G and H joining the three branches.]

Machine (M123)* for (1 | 2 | 3)*
[Figure: M123 with added states I and J forming the closure.]

(M123)* 1
[Figure: (M123)* followed by a transition on 1; states K, L, M.]

((M123)* 1) (M123)*
[Figure: (M123)* 1 followed by (M123)*; states N, P.]

((M123)* 1) (M123)* 1
[Figure: ((M123)* 1) (M123)* followed by a transition on 1; states Q, R, S.]

((M123)* 2) (M123)* 2
[Figure: the same construction with 2 in place of 1.]

((M123)* 3) (M123)* 3
[Figure: the same construction with 3 in place of 1.]

Combine the machines
[Figure: the machines for ((M123)* 1) (M123)* 1, ((M123)* 2) (M123)* 2, and ((M123)* 3) (M123)* 3 joined by states T and U.]

[Figure: the complete combined NFA, containing six copies of (M123)* between T and U.]

NFA GOTO Table
[Figure: NFA with states S, A, B, C, F whose transitions are listed in the table.]
State   1        2        3
S       {S,A}    {S,B}    {S,C}
A       {A,F}    {A}      {A}
B       {B}      {B,F}    {B}
C       {C}      {C}      {C,F}
F       Ø        Ø        Ø
Entries are now sets of states.

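Traversal of 12321
Configurations (state, remaining input) reached after each symbol is consumed:
  start:    (S,12321)
  after 1:  (S,2321) (A,2321)
  after 2:  (S,321) (B,321) (A,321)
  after 3:  (S,21) (C,21) (B,21) (A,21)
  after 2:  (S,1) (B,1) (C,1) (B,1) (F,1) (A,1)
  after 1:  (S,ε) (A,ε) (B,ε) (C,ε) (B,ε) (A,ε) (F,ε)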
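The traversal can be reproduced by following every possible state at once. This is a minimal sketch using the GOTO table entries; the dictionary layout and the function name `run` are choices made here, and treating F as the accepting state is an assumption, since the figure marking it did not survive.

```python
# GOTO table for the NFA: state -> symbol -> set of target states.
GOTO = {
    "S": {"1": {"S", "A"}, "2": {"S", "B"}, "3": {"S", "C"}},
    "A": {"1": {"A", "F"}, "2": {"A"}, "3": {"A"}},
    "B": {"1": {"B"}, "2": {"B", "F"}, "3": {"B"}},
    "C": {"1": {"C"}, "2": {"C"}, "3": {"C", "F"}},
    "F": {"1": set(), "2": set(), "3": set()},
}

def run(text, start="S"):
    """Return the set of possible states after each prefix of text."""
    current = {start}
    trace = [current]
    for ch in text:
        # Union of the table entries for every state we might be in.
        current = set().union(*(GOTO[q][ch] for q in current))
        trace.append(current)
    return trace

trace = run("12321")
for consumed, states in enumerate(trace):
    print(consumed, sorted(states))
print("accepted:", "F" in trace[-1])   # assumes F is the accepting state
```

Because each step is a set, a state reached along two different paths, such as B after the fourth symbol, is stored only once.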

NFA to DFA
Two steps:
  1. Eliminate ε transitions
  2. Eliminate remaining nondeterminism

ε-Closure
The ε-Closure of a state A is the set of all states reachable from A while consuming no input. This includes A itself.
  ε-Closure(A) = {A} ∪ ⋃ ε-Closure(B), taken over all transitions A --ε--> B
An iterative algorithm determines the fixed point of these equations.

ε-Closure
Fixed point algorithm (starting from the empty set and only adding states, it reaches the least fixed point):
  1. Write down equations for all states based on the formal definition.
  2. Start by assuming ε-Closure(P) is ∅ for all P.
  3. Iterate over the equations using the most recently calculated value for the ε-Closure of each state.
  4. Eventually the iteration reaches a point where further iteration causes no change.
Guaranteed to terminate. (Why?)

ε-Closure
[Figure: states A, B, C with ε transitions A --ε--> B, B --ε--> C, C --ε--> B.]
  ε-Closure(A) = {A} ∪ ε-Closure(B)
  ε-Closure(B) = {B} ∪ ε-Closure(C)
  ε-Closure(C) = {C} ∪ ε-Closure(B)
To start: assume ε-Closure(P) = ∅ for all P.
Iteration 1:
  ε-Closure(A) = {A} ∪ ε-Closure(B) = {A} ∪ ∅ = {A}
  ε-Closure(B) = {B} ∪ ε-Closure(C) = {B} ∪ ∅ = {B}
  ε-Closure(C) = {C} ∪ ε-Closure(B) = {C} ∪ {B} = {B, C}
Iteration 2:
  ε-Closure(A) = {A} ∪ ε-Closure(B) = {A} ∪ {B} = {A, B}
  ε-Closure(B) = {B} ∪ ε-Closure(C) = {B} ∪ {B, C} = {B, C}
  ε-Closure(C) = {C} ∪ ε-Closure(B) = {C} ∪ {B, C} = {B, C}
Iteration 3:
  ε-Closure(A) = {A} ∪ ε-Closure(B) = {A} ∪ {B, C} = {A, B, C}
  ε-Closure(B) = {B} ∪ ε-Closure(C) = {B} ∪ {B, C} = {B, C}
  ε-Closure(C) = {C} ∪ ε-Closure(B) = {C} ∪ {B, C} = {B, C}
Further iteration yields no change.
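The iteration above translates almost directly into code. In this sketch the function name `epsilon_closures` and the dictionary `eps_edges` (state to list of ε successors) are assumptions made for illustration. Each pass re-evaluates every equation with the most recent values and stops once a full pass changes nothing; the sets only ever grow and are bounded by the finite set of states, so the loop must stop.

```python
def epsilon_closures(states, eps_edges):
    """Compute the epsilon-closure of every state by fixed point iteration."""
    closure = {p: set() for p in states}   # start from the empty set
    changed = True
    while changed:
        changed = False
        for p in states:
            # closure(p) = {p} united with the closures of p's epsilon successors
            new = {p}.union(*(closure[q] for q in eps_edges.get(p, ())))
            if new != closure[p]:
                closure[p] = new
                changed = True
    return closure

# The three-state example: A --eps--> B, B --eps--> C, C --eps--> B.
closures = epsilon_closures(["A", "B", "C"],
                            {"A": ["B"], "B": ["C"], "C": ["B"]})
for state in "ABC":
    print(state, sorted(closures[state]))
```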

Eliminating ε
Calculate ε-Closure for all states.
Remove ε transitions and add corresponding non-ε transitions.
[Figure: the chain A --ε--> B --x--> C --ε--> D becomes x transitions from A and B to C and D.]

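0*1*2*
[Figure: NFA with A looping on 0, B looping on 1, C looping on 2, and ε transitions A --ε--> B and B --ε--> C.]
ε-Closure(A) = {A, B, C}
ε-Closure(B) = {B, C}
ε-Closure(C) = {C}

From State   Coast (on ε)   Consume    Coast (on ε)
A            A              0 -> A     {A, B, C}
A            B              1 -> B     {B, C}
A            C              2 -> C     {C}
B            B              1 -> B     {B, C}
B            C              2 -> C     {C}
C            C              2 -> C     {C}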

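0*1*2*
[Figure: the NFA after ε elimination. A loops on 0, B loops on 1, C loops on 2; A goes to B on 0,1; B goes to C on 1,2; A goes to C on 0,1,2.]
Any state that had an original accepting state in its ε-Closure is now an accepting state.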
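The coast, consume, coast rule for 0*1*2* can be written as three nested loops. This is a sketch under assumptions: the function name `eliminate_epsilon`, the dictionary layout, and the choice of C as the only original accepting state are introduced here for illustration, and the ε-closures are supplied as input.

```python
def eliminate_epsilon(states, delta, closure, accepting):
    """Build epsilon-free transitions: coast on eps, consume, coast on eps."""
    new_delta = {}
    for p in states:
        for q in closure[p]:                                  # coast
            for symbol, targets in delta.get(q, {}).items():  # consume
                for r in targets:
                    row = new_delta.setdefault(p, {})
                    row.setdefault(symbol, set()).update(closure[r])  # coast
    # A state accepts if its closure contains an original accepting state.
    new_accepting = {p for p in states if closure[p] & accepting}
    return new_delta, new_accepting

# Non-epsilon transitions and closures of the 0*1*2* machine.
delta = {"A": {"0": {"A"}}, "B": {"1": {"B"}}, "C": {"2": {"C"}}}
closure = {"A": {"A", "B", "C"}, "B": {"B", "C"}, "C": {"C"}}

new_delta, new_accepting = eliminate_epsilon("ABC", delta, closure, {"C"})
for state in "ABC":
    print(state, {a: sorted(t) for a, t in sorted(new_delta[state].items())})
print("accepting:", sorted(new_accepting))
```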

Eliminating Nondeterminism
  P --a--> Q
  P --a--> R
becomes
  P --a--> [QR]

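Eliminating Nondeterminism
[Figure: the ε-free NFA for 0*1*2*. A loops on 0, B loops on 1, C loops on 2; A goes to B on 0,1; B goes to C on 1,2; A goes to C on 0,1,2.]
State    0        1       2
[A]      [ABC]    [BC]    [C]
[B]      Ø        [BC]    [C]
[C]      Ø        Ø       [C]
[ABC]    [ABC]    [BC]    [C]
[BC]     Ø        [BC]    [C]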

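0*1*2*
[Figure: the resulting DFA over the states A, ABC, BC, C, and B, with transitions on 0, 1, and 2 as given in the state table for [A], [B], [C], [ABC], and [BC].]
Note that B is now unreachable and can be removed.
Any state formed from an originally accepting state is now also an accepting state.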
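The table for 0*1*2* can be generated with a worklist that only visits combined states reachable from the start state, which is why [B] never appears in its output. This is a sketch, not the slides' own procedure: the name `subset_construction`, the use of `frozenset` for combined states, and the input dictionary (the ε-free transitions of the example, with A, B, and C all accepting) are assumptions made for illustration.

```python
def subset_construction(delta, start, accepting, alphabet):
    """Build DFA states as sets of NFA states, starting from {start}."""
    table = {}
    worklist = [frozenset({start})]
    while worklist:
        current = worklist.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            # Union of the targets of every NFA state in the combined state.
            target = frozenset().union(
                *(delta.get(q, {}).get(symbol, set()) for q in current))
            table[current][symbol] = target
            if target and target not in table:
                worklist.append(target)   # the empty set (Ø) is not expanded
    dfa_accepting = {s for s in table if s & accepting}
    return table, dfa_accepting

# Epsilon-free NFA for 0*1*2*.
nfa = {
    "A": {"0": {"A", "B", "C"}, "1": {"B", "C"}, "2": {"C"}},
    "B": {"1": {"B", "C"}, "2": {"C"}},
    "C": {"2": {"C"}},
}
table, dfa_accepting = subset_construction(nfa, "A", {"A", "B", "C"}, "012")
for state in sorted(table, key=lambda s: "".join(sorted(s))):
    row = {a: "".join(sorted(t)) or "Ø" for a, t in table[state].items()}
    print("".join(sorted(state)), row)
```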