CSE 431S Scanning Washington University Spring 2013
Regular Languages
- Three ways to describe regular languages:
  - FSA
  - Right-linear grammars
  - Regular expressions
Regular Expressions
- A regular expression is a concise description of a regular set: a finite description of a possibly infinite set.
- Given an alphabet Σ, valid regular expressions consist of:
  - The null expression Ø
  - The empty string ε
  - Characters from Σ
  - Compound expressions built using the operators alternation, concatenation, and Kleene closure
Regular Expressions
- Alternation
  - One subexpression or another
  - The expression: a | b
  - Results in the set: { a, b }
  - Note there are two strings in the set
- Concatenation
  - Build a larger string by concatenating the results of two subexpressions
  - The expression: ab
  - Results in the set: { ab }
  - Note there is one string in the set
Regular Expressions
- Kleene closure
  - Zero or more concatenations of a subexpression
  - The expression: a*
  - Results in the set: { ε, a, aa, aaa, aaaa, ... }
  - Note the set is infinite, though each string is finite
Regular Expressions
- Operator precedence
  - Kleene closure > concatenation > alternation
  - Parentheses can be used to group subexpressions
  - So the expression: a | bc*
  - Is equivalent to: (a | (b (c*)))
Regular Expression
- Build up expressions from the inside out
- What is: a | bc*
  - Equivalent to: (a | (b (c*)))
  - c* = { ε, c, cc, ccc, cccc, ... }
  - bc* = { b, bc, bcc, bccc, bcccc, ... }
  - a | bc* = { a, b, bc, bcc, bccc, bcccc, ... }
- What is: (a | b) c*
  - { a, ac, acc, accc, ..., b, bc, bcc, bccc, ... }
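
The precedence rules above can be checked mechanically. The following is a minimal sketch that uses Python's `re` module as a stand-in for the formal notation (its `|`, juxtaposition, and `*` follow the same precedence); the helper name `language` and its `alphabet` and `max_len` parameters are illustrative assumptions, not part of the course material. It lists every string up to a bounded length that a pattern matches in full.

```python
import re
from itertools import product

def language(pattern, alphabet="abc", max_len=3):
    """List the strings over `alphabet`, up to `max_len` long, that `pattern` matches entirely."""
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                matches.append(s)
    return matches

# Alternation binds loosest, so a|bc* is read as a | (b(c*)).
print(language("a|bc*"))    # ['a', 'b', 'bc', 'bcc']
# Parentheses force the alternation to be applied first.
print(language("(a|b)c*"))  # ['a', 'b', 'ac', 'bc', 'acc', 'bcc']
```

Because the sets are infinite, the enumeration is cut off at `max_len`; raising the bound adds the longer strings from each set.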
Regular Expressions
Expression    Set
Ø             { }
a             { a }
b             { b }
a | b         { a, b }
ab            { ab }
a*            { ε, a, aa, aaa, aaaa, ... }
(ab)*         { ε, ab, abab, ababab, ... }
FSA
- An FSA can be:
  - Deterministic: for every (state, symbol) pair there is exactly one target state. δ: S × Σ → S
  - Nondeterministic: there exists a (state, symbol) pair for which there is more than one target state. δ: S × Σ → P(S)
FSA
- Deterministic: all transitions are of the form P --a--> Q
- Nondeterministic: some transitions are of the form P --a--> Q and P --a--> R
FSA
- NFA and DFA have the same expressive power.
- For any NFA n there exists a DFA d such that L(n) = L(d), and vice versa.
RE to NFA
Expression    NFA
Ø             [Diagram: start state q0 only]
ε             [Diagram: q0 alone, or q0 with an ε transition]
a             [Diagram: q0 with a transition on a]
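RE to NFA
Expression    NFA (m(x) denotes the machine built for subexpression x)
s | t         [Diagram: q0 leading into both m(s) and m(t)]
st            [Diagram: q0, then m(s) followed by m(t)]
s*            [Diagram: q0 with m(s) in a loop]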
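
The rules above can be turned into code. What follows is a minimal sketch of a Thompson-style construction, which may differ in detail from the slide diagrams (for instance in where new states are introduced). The representation (a machine as a `(start, accept, edges)` triple, `EPS` as the label for ε transitions) and all function names are assumptions made for illustration.

```python
from itertools import count

EPS = None          # assumed encoding of an ε label
_ids = count()      # fresh state numbers

def symbol(a):
    """m(a): start --a--> accept."""
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]

def alt(m1, m2):
    """m(s | t): a new start and accept state joined to both machines by ε edges."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    s, f = next(_ids), next(_ids)
    return s, f, e1 + e2 + [(s, EPS, s1), (s, EPS, s2), (f1, EPS, f), (f2, EPS, f)]

def cat(m1, m2):
    """m(st): the accept state of m(s) is linked to the start of m(t) by an ε edge."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    return s1, f2, e1 + e2 + [(f1, EPS, s2)]

def star(m):
    """m(s*): ε edges allow skipping m(s) or passing through it repeatedly."""
    s1, f1, e1 = m
    s, f = next(_ids), next(_ids)
    return s, f, e1 + [(s, EPS, s1), (f1, EPS, f), (s, EPS, f), (f1, EPS, s1)]

def closure(states, edges):
    """All states reachable from `states` using only ε edges."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in edges:
            if src == p and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(machine, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    start, accept, edges = machine
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

# a | bc*, built from the inside out as (a | (b (c*)))
m = alt(symbol("a"), cat(symbol("b"), star(symbol("c"))))
print(accepts(m, "bcc"), accepts(m, "ac"))  # True False
```

Each constructor returns a machine with exactly one start and one accept state, which is what lets the pieces be composed freely.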
RE to NFA
- Example: string that ends in a digit seen before:
  (1 | 2 | 3)*1(1 | 2 | 3)*1 | (1 | 2 | 3)*2(1 | 2 | 3)*2 | (1 | 2 | 3)*3(1 | 2 | 3)*3
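Machine M123 for (1 | 2 | 3)
[Diagram: state G fans out to three branches, A --1--> B, C --2--> D, and E --3--> F, which rejoin at state H.]
Machine (M123)* for (1 | 2 | 3)*
[Diagram: M123 as above, with a new state I in front of G and a new state J after H.]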
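(M123)*1
[Diagram: state K, then (M123)*, then L --1--> M.]
((M123)*1)(M123)*
[Diagram: state N, then (M123)*1, then (M123)*, then state P.]
((M123)*1)(M123)*1
[Diagram: state Q1, then ((M123)*1)(M123)*, then R --1--> S1.]
((M123)*2)(M123)*2
[Diagram: state Q2, then ((M123)*2)(M123)*, then R --2--> S2.]
((M123)*3)(M123)*3
[Diagram: state Q3, then ((M123)*3)(M123)*, then R --3--> S3.]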
Combine the machines
[Diagram: state T branches to the three machines ((M123)*1)(M123)*1, ((M123)*2)(M123)*2, and ((M123)*3)(M123)*3, which rejoin at state U.]
[Diagram: the fully expanded NFA, containing six copies of (M123)* (each with states A through J) between T and U, two per digit 1, 2, 3.]
NFA GOTO Table
[Diagram: S, A, B, and C each loop on 1,2,3; S --1--> A, S --2--> B, S --3--> C; A --1--> F, B --2--> F, C --3--> F.]
State    1        2        3
S        {S,A}    {S,B}    {S,C}
A        {A,F}    {A}      {A}
B        {B}      {B,F}    {B}
C        {C}      {C}      {C,F}
F        Ø        Ø        Ø
- Entries are now sets of states.
Traversal of 12321
Start:     (S,12321)
After 1:   (S,2321) (A,2321)
After 2:   (S,321) (B,321) (A,321)
After 3:   (S,21) (C,21) (B,21) (A,21)
After 2:   (S,1) (B,1) (C,1) (B,1) (F,1) (A,1)
After 1:   (S,ε) (A,ε) (B,ε) (C,ε) (B,ε) (A,ε) (F,ε)
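
The traversal can be reproduced by tracking the set of states the NFA could be in after each input symbol. This is a minimal sketch, assuming S is the start state and F the only accepting state, as the GOTO table suggests; the names `GOTO`, `trace`, and `accepts` are illustrative. Unlike the tree of configurations, a set merges duplicates such as the two (B,1) entries.

```python
# NFA GOTO table for "ends in a digit seen before"; entries are sets of states.
GOTO = {
    "S": {"1": {"S", "A"}, "2": {"S", "B"}, "3": {"S", "C"}},
    "A": {"1": {"A", "F"}, "2": {"A"},      "3": {"A"}},
    "B": {"1": {"B"},      "2": {"B", "F"}, "3": {"B"}},
    "C": {"1": {"C"},      "2": {"C"},      "3": {"C", "F"}},
    "F": {"1": set(),      "2": set(),      "3": set()},
}

def trace(text, start="S"):
    """Return the set of possible states before any input and after each symbol."""
    current = {start}
    history = [current]
    for ch in text:
        # Union of the table entries for every state we might be in.
        current = set().union(*(GOTO[q][ch] for q in current))
        history.append(current)
    return history

def accepts(text, accepting=frozenset({"F"})):
    """Accept if some possible state after consuming all input is accepting."""
    return bool(trace(text)[-1] & accepting)

for step in trace("12321"):
    print(sorted(step))
print(accepts("12321"))  # True: F is among the possible final states
```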
NFA to DFA
- Two steps:
  1. Eliminate ε transitions
  2. Eliminate remaining nondeterminism
ε-Closure
- The ε-Closure of a state A is the set of all states reachable from A while consuming no input. This includes A itself.
- ε-Closure(A) = {A} ∪ ⋃ ε-Closure(B), taken over all B with A --ε--> B
- Iterative algorithm to determine the least fixed point.
ε-Closure
- Least fixed point algorithm:
  1. Write down equations for all states based on the formal definition.
  2. Start by assuming ε-Closure(P) is Ø for all P.
  3. Iterate over the equations, using the most recently calculated value for the ε-Closure of each state.
  4. Eventually the iteration reaches a point where further iteration causes no change.
- Guaranteed to terminate. (Why?)
ε-Closure
[Diagram: A --ε--> B, B --ε--> C, C --ε--> B]
  ε-Closure(A) = {A} ∪ ε-Closure(B)
  ε-Closure(B) = {B} ∪ ε-Closure(C)
  ε-Closure(C) = {C} ∪ ε-Closure(B)
ε-Closure
Equations:
  ε-Closure(A) = {A} ∪ ε-Closure(B)
  ε-Closure(B) = {B} ∪ ε-Closure(C)
  ε-Closure(C) = {C} ∪ ε-Closure(B)
To start: assume ε-Closure(P) = Ø for all P.
Iteration 1:
  ε-Closure(A) = {A} ∪ ε-Closure(B) = {A} ∪ Ø = {A}
  ε-Closure(B) = {B} ∪ ε-Closure(C) = {B} ∪ Ø = {B}
  ε-Closure(C) = {C} ∪ ε-Closure(B) = {C} ∪ {B} = {B, C}
Iteration 2:
  ε-Closure(A) = {A} ∪ ε-Closure(B) = {A} ∪ {B} = {A, B}
  ε-Closure(B) = {B} ∪ ε-Closure(C) = {B} ∪ {B, C} = {B, C}
  ε-Closure(C) = {C} ∪ ε-Closure(B) = {C} ∪ {B, C} = {B, C}
Iteration 3:
  ε-Closure(A) = {A} ∪ ε-Closure(B) = {A} ∪ {B, C} = {A, B, C}
  ε-Closure(B) = {B} ∪ ε-Closure(C) = {B} ∪ {B, C} = {B, C}
  ε-Closure(C) = {C} ∪ ε-Closure(B) = {C} ∪ {B, C} = {B, C}
Further iteration yields no change.
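
The iteration above can be written as a short loop. This is a minimal sketch under the assumption that ε transitions are given as a dictionary from a state to its ε successors; the function name `epsilon_closures` and the returned pass counter are illustrative. Like the worked example, it updates each set in place, so later equations in the same pass see the most recently calculated values.

```python
def epsilon_closures(states, eps_edges):
    """Iterate the closure equations until no set changes.

    states:    list of state names, visited in this order on every pass
    eps_edges: dict mapping a state to the states reachable by one ε transition
    Returns (closure, passes), where passes counts the final no-change pass too.
    """
    closure = {p: set() for p in states}   # start from the empty set for every state
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for p in states:
            new = {p}
            for q in eps_edges.get(p, ()):
                new |= closure[q]          # uses the most recent value of closure[q]
            if new != closure[p]:
                closure[p] = new
                changed = True
    return closure, passes

closure, passes = epsilon_closures(["A", "B", "C"], {"A": ["B"], "B": ["C"], "C": ["B"]})
print(closure)  # A: {A, B, C}, B: {B, C}, C: {B, C}
print(passes)   # 4: three passes that change something, then one that changes nothing
```

Each pass can only add states to a set, and no set can grow beyond the full set of states, which bounds the number of passes.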
Eliminating ε
- Calculate ε-Closure for all states.
- Remove ε transitions and add corresponding non-ε transitions.
- A --ε--> B --x--> C --ε--> D becomes A --x--> C, A --x--> D, B --x--> C, B --x--> D
0*1*2*
[Diagram: A loops on 0, A --ε--> B, B loops on 1, B --ε--> C, C loops on 2]
  ε-Closure(A) = {A, B, C}
  ε-Closure(B) = {B, C}
  ε-Closure(C) = {C}
From State    Coast (on ε)    Consume    Coast (on ε)
A             A               0 → A      {A, B, C}
              B               1 → B      {B, C}
              C               2 → C      {C}
B             B               1 → B      {B, C}
              C               2 → C      {C}
C             C               2 → C      {C}
0*1*2*
[Diagram: A loops on 0, B loops on 1, C loops on 2; A --0,1--> B, B --1,2--> C, A --0,1,2--> C]
- Any state that had an original accepting state in its ε-Closure is now an accepting state.
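
The coast, consume, coast table can be computed directly from the closures. The sketch below assumes C is the original accepting state of the 0*1*2* machine and represents transitions as a dictionary keyed by (state, symbol); the function name `eliminate_epsilon` is illustrative.

```python
def eliminate_epsilon(states, delta, closure, accepting):
    """Build ε-free transitions: coast on ε, consume one symbol, coast on ε again."""
    new_delta = {}
    for p in states:
        for q in closure[p]:                         # coast from p to q on ε
            for (src, sym), targets in delta.items():
                if src != q:
                    continue
                for r in targets:                    # consume sym, reaching r
                    # coast again: everything in the ε-Closure of r is reachable
                    new_delta.setdefault((p, sym), set()).update(closure[r])
    # A state is accepting if its ε-Closure contains an original accepting state.
    new_accepting = {p for p in states if closure[p] & accepting}
    return new_delta, new_accepting

states = ["A", "B", "C"]
delta = {("A", "0"): {"A"}, ("B", "1"): {"B"}, ("C", "2"): {"C"}}   # non-ε transitions
closure = {"A": {"A", "B", "C"}, "B": {"B", "C"}, "C": {"C"}}
new_delta, new_accepting = eliminate_epsilon(states, delta, closure, {"C"})

for (state, sym), targets in sorted(new_delta.items()):
    print(state, sym, sorted(targets))
print(sorted(new_accepting))  # ['A', 'B', 'C']
```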
Eliminating Nondeterminism
- P --a--> Q and P --a--> R becomes P --a--> [QR]
Eliminating Nondeterminism
[Diagram: A loops on 0, B loops on 1, C loops on 2; A --0,1--> B, B --1,2--> C, A --0,1,2--> C]
State    0        1       2
[A]      [ABC]    [BC]    [C]
[B]      Ø        [BC]    [C]
[C]      Ø        Ø       [C]
[ABC]    [ABC]    [BC]    [C]
[BC]     Ø        [BC]    [C]
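0*1*2*
[Diagram of the resulting DFA:
  A --0--> ABC, A --1--> BC, A --2--> C
  ABC --0--> ABC, ABC --1--> BC, ABC --2--> C
  BC --1--> BC, BC --2--> C
  C --2--> C
  B --1--> BC, B --2--> C]
- Note that B is now unreachable and can be removed.
- Any state formed from an originally accepting state is now also an accepting state.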
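
The merging of target states into combined states such as [ABC] can be sketched as a worklist algorithm, commonly called the subset construction. The version below only explores states reachable from the start state, so it never creates [B]. It assumes A is the start state and takes the ε-free transitions of the 0*1*2* machine as input; `subset_construction` and `name` are illustrative names.

```python
def subset_construction(start, nfa_delta, alphabet, accepting):
    """Build a DFA whose states are sets of NFA states, exploring from the start state."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for sym in alphabet:
            # Merge the targets of every NFA state in the current set.
            target = frozenset().union(*(nfa_delta.get((q, sym), ()) for q in current))
            if not target:
                continue                   # Ø entry: no transition
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    # A DFA state is accepting if it contains any accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_delta, seen, dfa_accepting

def name(state_set):
    """Render a set of NFA states in the bracket-free form used above, e.g. ABC."""
    return "".join(sorted(state_set))

nfa_delta = {
    ("A", "0"): {"A", "B", "C"}, ("A", "1"): {"B", "C"}, ("A", "2"): {"C"},
    ("B", "1"): {"B", "C"},      ("B", "2"): {"C"},
    ("C", "2"): {"C"},
}
dfa_delta, dfa_states, dfa_accepting = subset_construction("A", nfa_delta, "012", {"A", "B", "C"})

print(sorted(name(s) for s in dfa_states))  # ['A', 'ABC', 'BC', 'C']
for (state, sym), target in sorted((name(s), a, name(t)) and ((name(s), a), name(t)) for (s, a), t in dfa_delta.items()):
    print(state, sym, target)
```

Starting from the reachable states only is a design choice: building a row for every possible subset would also work, but it produces unreachable states such as [B] that then have to be removed.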