Lecture 5: Regular Expression and Finite Automata Dr Kieran T. Herley Department of Computer Science University College Cork 2017-2018 KH (28/09/17) Lecture 5: Regular Expression and Finite Automata 2017-2018 1 / 1

Equivalence of FAs and REs
DFAs, NFAs and REs
Fact: DFAs, NFAs and REs have the same expressive power, i.e. they allow precisely the same patterns (sets of strings) to be specified.

Translating REs into NFAs
Theorem: For every regular expression R, there is a nondeterministic finite automaton M(R) that accepts the language specified by R.
Proof (sketch): By construction.
  Analyze the structure of R in terms of its subexpressions; reflect that structure in an expression tree.
  Build M(R) based on the structure of R.

Automaton Construction
Regular expression: a, i.e. a single symbol
Corresponding automaton: [figure not reproduced: start state s with a single transition labelled a to state f]
Accepts a if f is an accept state.

Automaton Construction cont'd
Regular expression: ɛ, i.e. the empty string
Corresponding automaton: [figure not reproduced: start state s with a single transition labelled ɛ to state f]
Accepts ɛ if f is an accept state.

Automaton Construction cont'd
Regular expression: XY (concatenation)
Suppose M(X) and M(Y) are automata for expressions X and Y (with start states $s$, $s'$ and accept states $f$, $f'$).
Corresponding automaton: [figure not reproduced: M(X) followed by M(Y), with $f$ linked to $s'$]
Lemma: Automaton M(XY) accepts precisely the strings in XY (if $f'$ is an accept state).

Proof of Claim
Lemma: M(XY) accepts precisely the strings in XY if $f'$ is an accept state.
Why? (implicit induction)
  Clearly $\alpha = \alpha_X \alpha_Y \in XY$ implies that an $s \leadsto f \to s' \leadsto f'$ path labelled $\alpha$ exists.
  Existence of an $s \leadsto f'$ path labelled $\alpha$ implies $\alpha \in XY$:
    The path has structure $s \leadsto f \to s' \leadsto f'$.
    Subpath $s \leadsto f$ corresponds to a string matching X.
    Subpath $s' \leadsto f'$ corresponds to a string matching Y.

Automaton Construction cont'd
Regular expression: X | Y (alternation)
Suppose M(X) and M(Y) are automata for expressions X and Y (with start states $s$, $s'$ and accept states $f$, $f'$).
Corresponding automaton: [figure not reproduced]
Lemma: Automaton M(X | Y) accepts precisely the strings in X | Y (if its final state is an accept state). (Proof similar to that of the previous lemma.)

Automaton Construction cont'd
Regular expression: X* (zero or more repetitions of X)
Suppose M(X) is an automaton for expression X (with start state s and accept state f).
Corresponding automaton: [figure not reproduced]
Lemma: Automaton M(X*) accepts precisely the strings in X* (if its final state is an accept state).

Automaton Construction Summary
[summary figure not reproduced]

Notes
The rules imply a recursive algorithm for translating an expression E into an automaton M(E) that recognizes the strings matching E.
Each rule adds at most two states, so #states = O(length of expression).
Accept states of the sub-automata used in the construction become non-accept states in the composite; only the final state of the top-level automaton remains an accept state.
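
The recursive translation described in these notes can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code: the tuple-based expression trees, the names `Fragment`, `build` and `accepts`, and the exact ɛ-wiring used for each rule (the usual Thompson-style layout) are assumptions, because the slide figures are not available here.

```python
from itertools import count

EPS = None        # label for an epsilon-transition (assumed convention)
_fresh = count()  # source of fresh state numbers


class Fragment:
    """Automaton with a single start state and a single final state."""

    def __init__(self, start, final, edges):
        self.start = start
        self.final = final
        self.edges = edges  # list of (source, label, target)


def build(tree):
    """Translate an expression tree into a Fragment, one rule per operator.

    Trees are tuples: ("sym", a), ("eps",), ("cat", X, Y),
    ("alt", X, Y) and ("star", X).
    """
    kind = tree[0]
    if kind in ("sym", "eps"):
        s, f = next(_fresh), next(_fresh)
        label = tree[1] if kind == "sym" else EPS
        return Fragment(s, f, [(s, label, f)])
    if kind == "cat":  # no new states: link final of X to start of Y
        x, y = build(tree[1]), build(tree[2])
        glue = [(x.final, EPS, y.start)]
        return Fragment(x.start, y.final, x.edges + y.edges + glue)
    if kind == "alt":  # two new states around the two sub-automata
        x, y = build(tree[1]), build(tree[2])
        s, f = next(_fresh), next(_fresh)
        glue = [(s, EPS, x.start), (s, EPS, y.start),
                (x.final, EPS, f), (y.final, EPS, f)]
        return Fragment(s, f, x.edges + y.edges + glue)
    if kind == "star":  # two new states; allow skipping and repeating X
        x = build(tree[1])
        s, f = next(_fresh), next(_fresh)
        glue = [(s, EPS, x.start), (s, EPS, f),
                (x.final, EPS, x.start), (x.final, EPS, f)]
        return Fragment(s, f, x.edges + glue)
    raise ValueError(f"unknown node kind: {kind!r}")


def accepts(m, text):
    """Simulate the automaton by tracking the set of reachable states."""

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in m.edges:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({m.start})
    for ch in text:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == ch}
        current = closure(moved)
    return m.final in current


# Hand-written tree for a* followed by b.
machine = build(("cat", ("star", ("sym", "a")), ("sym", "b")))
print(accepts(machine, "aaab"))  # True
```

Only the final state of the top-level fragment is treated as accepting, which mirrors the remark that accept states of sub-automata become non-accept states in the composite.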

Example
Expression: (a|b)*abb
Translation: all strings consisting of zero or more a's or b's followed by abb.
The tree captures the structure of the RE in terms of its subexpressions. [expression tree figure not reproduced]
Each non-leaf node represents an operator from {|, ·, *}. (Note: explicit · for concatenation.)
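
To make the expression tree concrete, here is a small recursive-descent parser that turns a pattern such as (a|b)*abb into a nested tuple, inserting an explicit concatenation node wherever two subexpressions are juxtaposed. The grammar, the tuple encoding and the function name `parse` are assumptions made for this sketch; the slides do not prescribe a particular parsing method.

```python
def parse(pattern):
    """Parse a regular expression over single-character symbols.

    Assumed grammar:
        expr   := term ('|' term)*
        term   := factor factor*
        factor := atom '*'*
        atom   := symbol | '(' expr ')'
    Returns nested tuples: ("sym", a), ("cat", X, Y), ("alt", X, Y),
    ("star", X).
    """
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def expr():
        nonlocal pos
        node = term()
        while peek() == "|":
            pos += 1
            node = ("alt", node, term())
        return node

    def term():
        node = factor()
        # Juxtaposition means concatenation: keep going until '|' or ')'.
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, factor())
        return node

    def factor():
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return node
        if ch is None or ch in "|)*":
            raise ValueError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        return ("sym", ch)

    tree = expr()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


print(parse("(a|b)*abb"))
```

The leaves of the resulting tuple are the symbols and every inner tuple is one operator, so a recursive translation into an automaton can simply walk the tuple.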

Example
r1: M(a) = [figure not reproduced]
r2: M(b) = [figure not reproduced]
r3: M(a|b) = [figure not reproduced]

Example cont'd
M((a|b)*) = [figure not reproduced]

Example cont'd
[figure not reproduced]

Applications of RE to FA Construction
Usefulness of RE to NFA Construction
Lexical analysis:
  Specify language tokens (identifiers, numerical constants, symbols etc.) as REs.
  Tools like lex automatically generate automaton-based code to decompose source code into its constituent tokens.
Pattern matching (e.g. text editors, grep):
  The pattern is specified as an RE.
  Automaton-based search locates occurrences.

Applications of RE to FA Construction
grep
grep/egrep/fgrep search a file for a pattern (string or regular expression).
Examples:
  fgrep intro /man/man3/*.3*
    searches the files matching the right-hand side for the string intro, listing the occurrences found
  egrep 'Fred (Smith|Jones)' telephone.txt
    searches telephone.txt for names with first name Fred and last name Smith or Jones
grep/egrep/fgrep differ in the generality of the patterns they handle and in their efficiency: fgrep is the most efficient, egrep the most general.

Applications of RE to FA Construction
grep
grep E file
  Build an automaton M that recognizes occurrences of the regular expression E.
  Simulate M on each line in file.
  Every time an accept state is entered, an occurrence of the pattern E has been detected, so flag the current line.
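
The three steps above can be sketched with a hand-built DFA. This is only an illustration under stated assumptions: the function name `flag_lines`, the dictionary encoding of the transition function, and the example automaton (which detects the literal pattern abb anywhere in a line) are all invented for the sketch, and real grep implementations are considerably more elaborate.

```python
def flag_lines(lines, delta, start, accepting):
    """Return the lines on which the automaton enters an accept state."""
    flagged = []
    for line in lines:
        state = start
        hit = state in accepting
        for ch in line:
            if hit:
                break  # one occurrence is enough to flag the line
            # Symbols without an explicit entry send the automaton back to
            # the start state (an assumption that suits this example DFA).
            state = delta.get((state, ch), start)
            hit = state in accepting
        if hit:
            flagged.append(line)
    return flagged


# Hypothetical DFA that detects an occurrence of "abb" anywhere in a line.
# State k means "the last k symbols read are the first k symbols of abb".
DELTA = {
    (0, "a"): 1,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
}

text = ["xxabbyy", "ab ab", "aabb", "abab"]
print(flag_lines(text, DELTA, start=0, accepting={3}))
```

Stopping at the first accept state is a design choice: grep only needs to know whether a line contains at least one occurrence.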

Applications of RE to FA Construction
Note
Automaton M(E) recognizes any string x that matches pattern E. (Recall that grep flags lines that contain a substring matching the pattern.)
To get an automaton that recognizes any string that contains a substring y matching E, modify M(E) as follows: [figure not reproduced]
Use NfaAccept to detect matches.
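
The difference between matching a whole string and containing a matching substring can be illustrated with Python's `re` module, used here as a stand-in for the automaton and for NfaAccept. The sketch assumes the usual modification, namely recognizing "any symbols, then E, then any symbols"; the helper names are hypothetical.

```python
import re


def matches_whole(pattern, text):
    """True if the entire text matches the pattern, like M(E) accepting x."""
    return re.fullmatch(pattern, text) is not None


def contains_match(pattern, text):
    """True if some substring of text matches the pattern.

    Implemented by matching the whole text against .*(E).* which mirrors
    an automaton that may skip symbols before and after the occurrence.
    The text is assumed to be a single line (no newline characters).
    """
    wrapped = f".*(?:{pattern}).*"
    return re.fullmatch(wrapped, text) is not None


pattern = "(a|b)*abb"
for line in ["ababb", "xxabbyy", "xxabyy"]:
    print(line, matches_whole(pattern, line), contains_match(pattern, line))
```

For single-line input, `contains_match` agrees with `re.search`, which is the library's built-in way of asking the same question.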

Applications of RE to FA Construction
Another Application
The syntax of the building blocks (tokens) of most programming languages (identifiers, numerical literals, symbols, comments etc.) can be characterized using REs.
Software tools can automatically generate code that reads source text and chops it into tokens.
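
A tiny tokenizer shows the idea of describing each token class by an RE. It uses Python's `re` module rather than a generated automaton, and the token names and patterns below are illustrative assumptions, not the lexical rules of any particular language.

```python
import re

# One regular expression per token class (all hypothetical).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[-+*/=();]"),
    ("COMMENT", r"#.*"),
    ("SKIP", r"\s+"),
]

# Combine the classes into one pattern with a named group per class.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
)


def tokenize(source):
    """Chop source text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        # Whitespace and comments are recognized but not reported.
        if m.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x1 = 3.14*(y+2); # note"))
```

Tools such as lex work from a similar list of named patterns but compile them into a single automaton ahead of time.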

From NFAs to DFAs
DFA vs NFA
Theorem: For every NFA, there is an equivalent DFA, i.e. one that accepts precisely the same language.
Proof not straightforward, but the idea is to construct a DFA where each ...
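
The slide's proof idea is cut off. The standard argument, assumed here to be the one intended, is the subset construction, in which each DFA state stands for a set of NFA states. The sketch below follows that construction; the dictionary encodings and the function names are assumptions made for illustration.

```python
from collections import deque


def epsilon_closure(states, eps):
    """All states reachable from `states` using only epsilon-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def nfa_to_dfa(alphabet, delta, eps, start, accepting):
    """Subset construction.

    delta: dict (state, symbol) -> set of next states
    eps:   dict state -> set of states reachable by one epsilon-transition
    Returns (start state, transition dict, set of accepting states) of a
    DFA whose states are frozensets of NFA states.
    """
    d_start = epsilon_closure({start}, eps)
    d_delta, d_accepting = {}, set()
    queue, seen = deque([d_start]), {d_start}
    while queue:
        subset = queue.popleft()
        if subset & accepting:  # contains at least one NFA accept state
            d_accepting.add(subset)
        for a in alphabet:
            moved = set()
            for q in subset:
                moved |= delta.get((q, a), set())
            target = epsilon_closure(moved, eps)
            d_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_start, d_delta, d_accepting


def dfa_accepts(d_start, d_delta, d_accepting, text):
    state = d_start
    for ch in text:
        state = d_delta[(state, ch)]
    return state in d_accepting


# Small NFA (no epsilon-transitions) for strings over {a, b} ending in abb.
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
dfa = nfa_to_dfa("ab", nfa_delta, {}, start=0, accepting={3})
print(dfa_accepts(*dfa, "ababb"))  # True
```

Only subsets reachable from the start subset are generated, so the DFA is often far smaller than the worst case of 2^n states.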

From DFAs to REs
DFA-to-RE Translation
Theorem: For every DFA there is an RE that captures the strings accepted by that DFA.
Define $R^k_{i,j}$ to be the set of strings that take the DFA from state $i$ to state $j$ without going through any state numbered higher than $k$.
Recurrence:
  $k = 0$: $R^0_{i,j}$ = labels on direct $i \to j$ edges, if any; add $\varepsilon$ if $i = j$
  $k > 0$: $R^k_{i,j} = R^{k-1}_{i,k} \, (R^{k-1}_{k,k})^* \, R^{k-1}_{k,j} \mid R^{k-1}_{i,j}$
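
The recurrence translates almost directly into code. The sketch below builds the expressions as strings in Python `re` syntax so that the result can be tested; the function name `dfa_to_regex`, the edge encoding, and the use of None for the empty set and "" for ɛ are assumptions of this illustration. No algebraic simplification is attempted, so the output is correct but verbose.

```python
import re


def dfa_to_regex(n, edges, start, accepting):
    """Apply the R^k recurrence to a DFA with states 1..n.

    edges: dict (i, j) -> list of symbols labelling direct i -> j edges.
    Returns a pattern string, or None if the DFA accepts nothing.
    """

    def union(x, y):
        if x is None:
            return y
        if y is None:
            return x
        return x if x == y else f"(?:{x}|{y})"

    def concat(x, y):
        if x is None or y is None:
            return None  # concatenating with the empty set gives the empty set
        return x + y

    def star(x):
        if x is None or x == "":
            return ""  # the star of the empty set or of epsilon is epsilon
        return f"(?:{x})*"

    states = range(1, n + 1)

    # k = 0: labels on direct edges, plus epsilon when i == j.
    R = {}
    for i in states:
        for j in states:
            r = None
            for symbol in edges.get((i, j), []):
                r = union(r, re.escape(symbol))
            if i == j:
                r = union(r, "")
            R[i, j] = r

    # k > 0: additionally allow paths that pass through state k.
    for k in states:
        R = {
            (i, j): union(
                concat(concat(R[i, k], star(R[k, k])), R[k, j]),
                R[i, j],
            )
            for i in states
            for j in states
        }

    result = None
    for f in accepting:
        result = union(result, R[start, f])
    return result


# Hypothetical two-state DFA: binary strings with an even number of 0s.
even_zeros = {(1, 1): ["1"], (1, 2): ["0"], (2, 1): ["0"], (2, 2): ["1"]}
pattern = dfa_to_regex(2, even_zeros, start=1, accepting=[1])
print(re.fullmatch(pattern, "1001") is not None)  # True
```

The final expression is the union of $R^n_{s,f}$ over all accept states $f$, where $s$ is the start state and $n$ the number of states.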

From DFAs to REs
DFA-to-RE Translation Example
[DFA figure not reproduced]
  $k = 0$: $R^0_{i,j}$ = labels on direct $i \to j$ edges, if any; add $\varepsilon$ if $i = j$
  $k > 0$: $R^k_{i,j} = R^{k-1}_{i,k} \, (R^{k-1}_{k,k})^* \, R^{k-1}_{k,j} \mid R^{k-1}_{i,j}$

              k = 0    k = 1    k = 2
$r^k_{1,1}$   ɛ        ɛ        (00)*
$r^k_{1,2}$   0        0        0(00)*
$r^k_{1,3}$   1        1        0*1
$r^k_{2,1}$   0        0        0(00)*
$r^k_{2,2}$   ɛ        ɛ|00     (00)*
$r^k_{2,3}$   1        1|01     0*1
$r^k_{3,1}$   ∅        ∅        (0|1)(00)*0
$r^k_{3,2}$   0|1      0|1      (0|1)(00)*
$r^k_{3,3}$   ɛ        ɛ        ɛ|(0|1)0*1

The answer is the sum (|) of the following two expressions:
  $r^3_{1,2} = r^2_{1,3} \, (r^2_{3,3})^* \, r^2_{3,2} \mid r^2_{1,2}$ = 0*1(ɛ|(0|1)0*1)*(0|1)(00)* | 0(00)*
  $r^3_{1,3} = r^2_{1,3} \, (r^2_{3,3})^* \, r^2_{3,3} \mid r^2_{1,3}$ = 0*1(ɛ|(0|1)0*1)*(ɛ|(0|1)0*1) | 0*1
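
One way to gain confidence in the two final expressions is to compare them with the DFA on all short binary strings. The DFA figure is not available here, so the transition table below is read off the k = 0 column (for instance, entry 0|1 for states 3 and 2 means both symbols lead from state 3 to state 2), and state 1 is assumed to be the start state with states 2 and 3 accepting, since the answer combines the expressions for those two targets. The expressions are transcribed into Python `re` syntax, with ɛ written as an empty alternative.

```python
import re
from itertools import product

# Transitions inferred from the k = 0 column: DELTA[state][symbol].
DELTA = {
    1: {"0": 2, "1": 3},
    2: {"0": 1, "1": 3},
    3: {"0": 2, "1": 2},
}
START, ACCEPTING = 1, {2, 3}  # assumed, see the note above

# The two final expressions from the slide.
R12 = r"0*1(?:|(?:0|1)0*1)*(?:0|1)(?:00)*|0(?:00)*"
R13 = r"0*1(?:|(?:0|1)0*1)*(?:|(?:0|1)0*1)|0*1"
ANSWER = re.compile(f"(?:{R12})|(?:{R13})")


def dfa_accepts(word):
    state = START
    for ch in word:
        state = DELTA[state][ch]
    return state in ACCEPTING


def agree_up_to(max_len):
    """Check that the DFA and the expression agree on all short strings."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            word = "".join(letters)
            if dfa_accepts(word) != (ANSWER.fullmatch(word) is not None):
                return False
    return True


print(agree_up_to(8))
```

An exhaustive check over short strings is not a proof of equivalence, but it catches most transcription slips in a table like this one.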