Finite automata

We have looked at using Lex to build a scanner on the basis of regular expressions. Now we begin to consider the results from automata theory that make Lex possible.

Recall:
  - An alphabet Σ is a finite set of symbols.
  - A string over Σ is a finite sequence of symbols from Σ.
  - A language over Σ is a set of strings over Σ.
  - A recognizer for a language L over Σ takes as input a string x over Σ and answers "yes" if x is in L and "no" otherwise.

Lex scanners are based on an implementation of Kleene's Theorem:

  The regular languages are exactly the languages that can be recognized by a finite automaton.

BTW: The textbook gives a nonstandard definition of the set of regular languages, neglecting to include the empty language. So a correct statement of Kleene's Theorem in the context of the textbook is:

  The regular languages are exactly the nonempty languages that can be recognized by a finite automaton.

Regular languages can be recognized by finite automata. In fact, for every regular language, there is a finite automaton that recognizes it, and, moreover, every finite automaton recognizes a regular language. (Well, as I mentioned, there's an unfortunate exception for us, because your book does not count ∅ among the regular languages.)

Finite automata (FAs) can be deterministic or not. We look first at nondeterministic finite automata (NFAs), because it is particularly easy to transform regular expressions into NFAs, and we can understand deterministic finite automata (DFAs) as a special case of NFAs.

NFAs and their transition graph representations

Definition: An NFA is a 5-tuple (S, Σ, move, s_0, F) where
  - S is a finite set (of states),
  - Σ is an alphabet (the input alphabet),
  - move is a function from S × (Σ ∪ {ε}) to 2^S (the powerset of S, that is, the set of all subsets of S),
  - s_0 ∈ S (the start state),
  - F ⊆ S (the set of final, or accepting, states).

An NFA is often represented as a transition graph in which:
  - states are the nodes, represented as circles,
  - the start state is indicated by an incoming arrow with no source,
  - final states are indicated by a second, concentric circle,
  - there is an arrow from state s to state t, labeled σ, if t ∈ move(s, σ).

Here is a diagram (a "transition graph") representing an NFA that recognizes the language (a|b)*abb: (fig 3.19)

The set of states of the NFA is {0, 1, 2, 3}. The input alphabet is {a, b}. The start state is 0. There is only one accepting state: 3. The transition function for this NFA is represented by the table

            INPUT
  STATE     a         b       ε
  0         {0, 1}    {0}     ∅
  1         ∅         {2}     ∅
  2         ∅         {3}     ∅
  3         ∅         ∅       ∅
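
As a rough illustration of the definition above, here is one way the 5-tuple for the (a|b)*abb NFA might be written down in Python. The names `EPS`, `MOVE_TABLE`, `move`, and `edges` are assumptions made for this sketch, as is the convention of using the empty string to stand for ε. They do not come from the textbook.

```python
# Sketch of the 5-tuple (S, Σ, move, s_0, F) for the NFA recognizing (a|b)*abb.
EPS = ""  # assumed convention: the empty string stands for ε

S = {0, 1, 2, 3}      # states
SIGMA = {"a", "b"}    # input alphabet
S0 = 0                # start state
F = {3}               # accepting states

# Only the non-empty entries of the transition table are stored.
MOVE_TABLE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}


def move(s, sym):
    """Return move(s, sym), a subset of S; a missing table entry means the empty set."""
    return set(MOVE_TABLE.get((s, sym), set()))


def edges():
    """List the labeled arrows (s, sym, t) of the transition graph, i.e. t in move(s, sym)."""
    return sorted((s, sym, t) for (s, sym), ts in MOVE_TABLE.items() for t in ts)


if __name__ == "__main__":
    for s, sym, t in edges():
        print(f"{s} --{sym}--> {t}")
```

Storing only the non-empty entries keeps the table small; `move` fills in ∅ for everything else, including every ε entry of this particular NFA.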

String acceptance, language recognition, and an example

An NFA M accepts a string x if there is a path in the transition graph of M from the start state to an accepting state, such that the labels along this path spell out x. (That is, the concatenation of the labels is x.)

An NFA M accepts, or recognizes, a language L if it accepts all, and only, strings from L.

So, how many languages can a given NFA recognize?

Here's a diagram of an NFA accepting the language a⁺ | b⁺: (fig 3.21)

Deterministic Finite Automata (DFAs)

A deterministic finite automaton is an NFA in which
  - no state has an ε-transition (that is, in the transition graph, no node has an outgoing edge labeled ε), and
  - for each state s and input symbol a there is at most one outgoing edge labeled a (in the transition graph).

Here is a DFA that accepts (a|b)*abb: (fig 3.23)
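
The path-based definition of acceptance and the two DFA conditions above can be sketched in Python as follows. This is only an illustration: the function names, the dictionary encoding of move, and the use of "" for ε are assumptions of the sketch. The two example tables are written by hand and are not claimed to match the figures exactly.

```python
EPS = ""  # assumed convention: "" labels an ε-edge


def accepts(move, start, finals, x):
    """True if some path from start to a state in finals spells out x."""
    seen = set()
    stack = [(start, 0)]  # (state, number of symbols of x consumed so far)
    while stack:
        s, i = stack.pop()
        if (s, i) in seen:  # avoid looping forever on ε-cycles
            continue
        seen.add((s, i))
        if i == len(x) and s in finals:
            return True
        for t in move.get((s, EPS), ()):  # ε-edge: consumes no input
            stack.append((t, i))
        if i < len(x):
            for t in move.get((s, x[i]), ()):  # edge labeled with the next symbol
                stack.append((t, i + 1))
    return False


def is_dfa(move):
    """Check both DFA conditions: no ε-edges, and at most one edge per (state, symbol)."""
    return all((sym != EPS or not ts) and len(ts) <= 1
               for (_, sym), ts in move.items())


# Hand-written example tables (assumptions of this sketch).
NFA_ABB = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
NFA_PLUS = {(0, EPS): {1, 3}, (1, "a"): {2}, (2, "a"): {2},
            (3, "b"): {4}, (4, "b"): {4}}  # a+ | b+, accepting states {2, 4}

if __name__ == "__main__":
    print(accepts(NFA_ABB, 0, {3}, "aabb"))      # a path 0,0,1,2,3 exists
    print(accepts(NFA_PLUS, 0, {2, 4}, "ab"))    # no accepting path
    print(is_dfa(NFA_ABB), is_dfa(NFA_PLUS))
```

The search explores (state, position) pairs rather than whole paths, so it terminates even when the graph contains cycles of ε-edges.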

A DFA has at most one transition from each state on any input, so it is easy to simulate. Let's look at a way to do this...

First, if there is any state s and input symbol a for which s does not have an outgoing edge labeled a, add a new state s_d to S, and for every s and a for which move(s, a) = ∅, let move(s, a) = {s_d}. (And let move(s_d, a) = {s_d} for all a.)

Now for every s and a, move(s, a) is a set with exactly one state, so let's instead understand move as the corresponding function that takes each state and input symbol to a state.

With this slight adjustment to the DFA and its transition function, we can decide whether an input string x (terminated with eof) belongs to the language of the DFA, as follows:

  s := s_0;
  c := nextchar;
  while c ≠ eof do
    s := move(s, c);
    c := nextchar;
  end;
  if s is in F then return "yes" else return "no"

Converting an NFA into a DFA

While DFAs are easy to simulate, NFAs are easier to obtain:
  1. Easier to write directly.
  2. Easy to construct on the basis of regular expressions.

So we'll want an algorithm for converting any NFA into a DFA recognizing the same language...

Let's start with a special case: NFAs without ε-transitions. And let's begin with an example. (fig 3.19)
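
The DFA simulation described above (first complete the transition function with a dead state, then make one move per input character) might look like this in Python. The names `complete`, `simulate`, and the state name "dead" are assumptions of this sketch, and the small partial DFA for the single string abb is a made-up example rather than one of the figures.

```python
def complete(states, alphabet, move):
    """Return (states, move) with a dead state s_d added, if needed, so that move is total."""
    dead = "dead"  # assumed name for s_d; must not already be a state
    total = dict(move)
    missing = [(s, a) for s in states for a in alphabet if (s, a) not in move]
    if not missing:
        return set(states), total
    for key in missing:
        total[key] = dead        # move(s, a) was empty: send it to s_d
    for a in alphabet:
        total[(dead, a)] = dead  # s_d loops to itself on every symbol
    return set(states) | {dead}, total


def simulate(move, start, finals, x):
    """Run the DFA on x; move must be total over the symbols occurring in x."""
    s = start
    for c in x:  # the for loop plays the role of nextchar and the eof test
        s = move[(s, c)]
    return s in finals


# Made-up partial DFA accepting only the string "abb".
STATES = {0, 1, 2, 3}
ALPHABET = {"a", "b"}
PARTIAL = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 3}

if __name__ == "__main__":
    states, move = complete(STATES, ALPHABET, PARTIAL)
    for x in ["abb", "ab", "abba", "b"]:
        print(repr(x), simulate(move, 0, {3}, x))
```

Here move maps a (state, symbol) pair directly to a single state, matching the slide's remark that a total DFA transition function can be read as a function into S rather than into 2^S.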

Algorithm for reducing NFA with no ε-transitions to DFA:

Dstates will be a set of subsets of S. (So each state in the DFA corresponds to a set of states in the NFA.)

Start state is {s_0}.

Dfinal = {T ∈ Dstates | T ∩ F ≠ ∅}.

For each T ∈ Dstates, and each input symbol a,

  Dmove(T, a) = ⋃_{s ∈ T} move(s, a)

It remains only to compute Dstates, as follows.

  initially, {s_0} is the only element of Dstates, and it is unmarked
  while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
      U := Dmove(T, a);
      if U ∉ Dstates then
        add U as an unmarked element of Dstates;
    end
  end

The reduction is slightly more complicated when the NFA has ε-transitions. Let's try an example first: an NFA for a(ab a ) b.
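
The algorithm above for NFAs with no ε-transitions can be sketched in Python roughly as follows. The function names and the dictionary encoding of move are assumptions of this sketch; frozensets are used so that sets of NFA states can themselves be elements of Dstates. The example table is a hand-written version of the (a|b)*abb NFA.

```python
def dmove(move, T, a):
    """Dmove(T, a): the union of move(s, a) over all s in T."""
    result = set()
    for s in T:
        result |= move.get((s, a), set())
    return frozenset(result)


def subset_construction(alphabet, move, start, finals):
    """Compute Dstates, the DFA transitions, the start state, and Dfinal (no ε-transitions)."""
    d_start = frozenset({start})
    dstates = {d_start}
    unmarked = [d_start]         # elements of Dstates not yet marked
    dtrans = {}                  # records Dmove(T, a) for each reachable T
    while unmarked:
        T = unmarked.pop()       # "mark T"
        for a in sorted(alphabet):
            U = dmove(move, T, a)
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtrans[(T, a)] = U
    dfinal = {T for T in dstates if T & finals}  # T ∩ F ≠ ∅
    return dstates, dtrans, d_start, dfinal


# Hand-written table for the (a|b)*abb NFA (start state 0, accepting state 3).
NFA_ABB = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}

if __name__ == "__main__":
    dstates, dtrans, d_start, dfinal = subset_construction({"a", "b"}, NFA_ABB, 0, {3})
    for (T, a), U in sorted(dtrans.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
        print(sorted(T), a, "->", sorted(U))
```

On this NFA the loop discovers four DFA states, {0}, {0, 1}, {0, 2}, and {0, 3}, of which only {0, 3} is accepting.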

For each state s of an NFA, let's write ε-closure(s) to denote the set of all states reachable from s by a path with each transition labeled with ε.

Notice that, for all s, s ∈ ε-closure(s), since you can reach s from s by the empty path (a path with no transitions; trivially, all of its transitions are labeled with ε).

For every set T of states of an NFA, let

  ε-closure(T) = ⋃_{s ∈ T} ε-closure(s).

Now we can specify the general reduction of NFAs to DFAs, much as before...

Algorithm for reducing NFA to DFA:

Dstates will again be a set of subsets of S.

Start state is ε-closure({s_0}). (Notice use of ε-closure.)

Dfinal = {T ∈ Dstates | T ∩ F ≠ ∅}. (As before.)

For each T ∈ Dstates, and each input symbol a,

  Dmove(T, a) = ⋃_{s ∈ T} move(s, a)

(Also as before.) We'll end up computing another function, Dtrans, as the transition function for the DFA.

It remains to compute Dstates and Dtrans, as follows.

  initially, ε-closure({s_0}) is the only element of Dstates, and it is unmarked
  while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
      U := ε-closure(Dmove(T, a));
      if U ∉ Dstates then
        add U as an unmarked element of Dstates;
      Dtrans(T, a) := U;
    end
  end
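
Here is a minimal Python sketch of ε-closure and of the general reduction above. The function names, the use of "" for ε, and the example NFA (a hand-written automaton for a⁺ | b⁺ with ε-edges out of the start state) are all assumptions made for illustration.

```python
EPS = ""  # assumed convention: "" labels an ε-edge


def eps_closure(move, T):
    """ε-closure(T): every state reachable from a state in T using ε-edges only."""
    closure = set(T)             # each s is in its own ε-closure (empty path)
    stack = list(T)
    while stack:
        s = stack.pop()
        for t in move.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_to_dfa(alphabet, move, start, finals):
    """Compute Dstates, Dtrans, the DFA start state, and Dfinal."""
    d_start = eps_closure(move, {start})
    dstates = {d_start}
    unmarked = [d_start]
    dtrans = {}
    while unmarked:
        T = unmarked.pop()       # "mark T"
        for a in sorted(alphabet):
            dm = set()           # Dmove(T, a)
            for s in T:
                dm |= move.get((s, a), set())
            U = eps_closure(move, dm)
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtrans[(T, a)] = U
    dfinal = {T for T in dstates if T & finals}
    return dstates, dtrans, d_start, dfinal


# Hand-written NFA for a+ | b+ (start state 0, accepting states 2 and 4).
NFA_PLUS = {(0, EPS): {1, 3}, (1, "a"): {2}, (2, "a"): {2},
            (3, "b"): {4}, (4, "b"): {4}}

if __name__ == "__main__":
    dstates, dtrans, d_start, dfinal = nfa_to_dfa({"a", "b"}, NFA_PLUS, 0, {2, 4})
    print(sorted(d_start), [sorted(T) for T in dfinal])
```

In this sketch the empty set can itself become an element of Dstates; it then behaves as a dead state, since every transition out of it leads back to it.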

Let's try it. (fig 3.27)

For next time: Read Section 3.7.