A Parallel DFA Minimization Algorithm

A Parallel DFA Miimizatio Algorithm Ambuj Tewari, Utkarsh Srivastava, ad P. Gupta Departmet of Computer Sciece & Egieerig Idia Istitute of Techology Kapur Kapur 208 016,INDIA pg@iitk.ac.i Abstract. I this paper,we have cosidered the state miimizatio problem for Determiistic Fiite Automata (DFA). A efficiet parallel algorithm for solvig the problem o a arbitrary CRCW PRAM has bee proposed. For umber of states ad k umber of iputs i Σ of the DFA to be miimized,the algorithm rus i O(k ) time ad uses O( ) processors. 1 Itroductio The problem of miimizig a give DFA (Determiistic F iite Automata) has a log history datig back to the begiigs of automata theory. Cosider a determiistic fiite automato M as a tuple (Q, Σ, q 0,F,δ)whereQ, Σ, q 0 Q, F Q ad δ : Q Σ Q are a fiite set of states, a fiite iput alphabet, the start state, the set of acceptig (or fial) states ad the trasitio fuctio, respectively. A iput strig x is a sequece of symbols over Σ. O a iput strig x = x 1 x 2...x m, the DFA visits a sequece of states q 0 q 1...q m startig with the start state by successively applyig the trasitio fuctio δ. Thus q i+1 = δ(q i,x i+1 )for0 i m 1. The laguage L(M) accepted by a DFA is defied as the set of strigs x that takes the DFA to a acceptig state, i.e. x is i L(M) if ad oly if q m F. Two DFAs are said to be equivalet if they accept the same set of strigs. The problem of DFA miimizatio is to fid a DFA with the miimum umber of states which is equivalet to the give DFA. A fudametal result i automata theory states that such a miimal DFA is uique up to reamig of states. The umber of states i the miimal DFA is give by the umber of equivalece classes i the partitio o the set of all strigs i Σ defied as follows: Two strigs x ad y are equivalet if ad oly if for all strigs z, xz L(M) if ad oly if yz L(M) [6]. Besides beig widely studied, DFA has may applicatios i the diverse fields like patter matchig, optimizatio of logic programs, protocol verificatio ad specificatio ad modelig of fiite state systems [14]. It is kow that odetermiistic fiite automata (NFA) are equivalet to determiistic oes as far as the laguages recogized by them are cocered. Huffma [4] admoore[10] have preseted O( 2 ) algorithms for DFA miimizatio ad are sufficietly fast for most of the classical applicatios. However, there exist umerous algorithms S. Sahi et al. (Eds.) HiPC 2002, LNCS 2552, pp. 34 40, 2002. c Spriger-Verlag Berli Heidelberg 2002

A Parallel DFA Miimizatio Algorithm 35 which are variatios of the same basic idea. A efficiet O( ) algorithm is due to Hopcroft [5]. Blum [2] has also proposed a simpler algorithm with same time complexity. Traditioal applicatios of the DFA miimizatio algorithm ivolve a few thousad states ad the sequetial algorithms available geerally perform well i these settigs. But if the umber of states i a DFA is of the order of a few millios, the the efficiet sequetial algorithms may take a sigificat amout of time ad may eed much more tha the available physical memory. Oe way to achieve the speed-up is the use of multiple processors. DFA miimizatio has bee extesively studied o may parallel computatio models. Jaja ad Kosaraju [7] have preseted a efficiet algorithm o mesh coected computer for the case whe Σ = 1. A simple NC algorithm has also bee outlied i their work. Cho ad Huyh [3] have proved the problem to be NLOGSPACE-complete. Efficiet ad close to cost-optimal algorithms are kow oly for the case of the alphabet cosistig of a sigle iput symbol. A very simple algorithm is proposed by Srikat [13] but the best algorithm for this problem is due to Jaja ad Ryu [8]. It is a CRCW algorithm with time complexity O() ad cost O( log ). Ufortuately, o efficiet algorithm, which is also ecoomical with respect to cost, is kow for the geeral case of multiple iput symbols. The stadard NC algorithm for this problem requires O( 6 ) processors. This is due the use of trasitive closure computatio o a cross product graph. I [12], a simple parallel algorithm for DFA miimizatio alog with its implemetatio is preseted. I this paper we have proposed a efficiet parallel algorithm for DFA miimizatio havig O(klog) time complexity o a arbitrary CRCW PRAM with ( ) processors. The rest of the paper is orgaized as follows. Sectio 2 cotais some prelimiaries alog with the aive sequetial algorithm. I Sectio 3, log we discuss a fast parallel algorithm for the problem which uses a large umber of processors. A efficiet parallel algorithm has bee proposed i Sectio 4. Coclusio is give i the last sectio. 2 Review The DFA miimizatio problem is closely related to the coarsest partitioig problem which ca be stated as follows. We are give a set Q ad its iitial partitio ito m disjoit sets {B 0,...,B m 1 }, ad a collectio of fuctios, f i : Q Q. We have to fid a coarsest partitio of Q, say{e 1,...,E q },such that: (1) each E i is a subset of some B j, ad (2) the partitio respects the give fuctios, i.e. j, ifa ad b both belog to the same E i the f j (a) adf j (b) also belog to the same E k for some k. Assume, Q is the set of states, f i is the restrictio of the trasitio fuctio δ to the ith iput symbol, i.e. f i (q) =δ(q, a i ), the iitial partitio cotais two sets, amely F ad Q F. The size of the miimal DFA is the umber of equivalece classes i the coarsest partitio. For the geeral case of multiple fuctio coarsest partitio problem, a O( ) solutio is give i [1]. Later, a liear

36 Ambuj Tewari et al. Sequetial Algorithm 1. for all fial states q i do block o[q i]=1 2. for all o-fial states q i do block o[q i]=2 3. do 4. for i= 0to k-1 do 5. for j=0to -1 do 6. b 1 =blocko[q j] 7. b 2 =blocko[δ(q j,x i)] 8. label state q j with (b 1,b 2) 9. edfor 10. Assig same block umber to states havig same labels 11. edfor 12. while <umber of blocks is chagig> Algorithm 1: The sequetial algorithm for DFA miimizatio time solutio has bee proposed i [11] for the case of the sigle fuctio coarsest partitio problem. The sigle fuctio versio of the problem correspods to havig oly a sigle symbol i the iput alphabet Σ. But the simplest sequetial algorithm for solvig the problem, give i Algorithm 1, rus i O(k 2 ) time, where Σ = k [1]. It performs as follows. Iitially there are oly two blocks: oe cotaiig all the fial states ad the other cotaiig all the o-fial states. If two states q ad q are foud i the same block such that for some iput symbol a i,thestatesδ(q, a i )adδ(q,a i )arei differet blocks, q ad q are placed i differet blocks for the ext iteratio. The algorithm is iterated for at most times because, i the worst case, each state will be i a block cotaiig just itself. I each iteratio ew block umbers are assiged i O(k) time. Therefore, the total time take is O(k 2 ). 3 A Fast Parallel Algorithm The fastest kow algorithm for the multiple iput symbol case is due to Cho ad Huyh [3]. The DFA miimizatio problem is iitially traslated to a istace of the multiple fuctio coarsest partitio problem to yield the set S of states, the iitial partitio B cotaiig sets of fial ad o-fial states ad the fuctios f i which are restrictios of the trasitio fuctio to sigle iput symbols. A graph G =< V,E>may be geerated as follows: V = {(a, b) a, b S} ad E = {((a, b), (c, d)) c = f i (a), d = f i (b) forsomei}. For ay pair x, y S, x ad y get differet labels i the coarsest partitio if ad oly if there is a path from ode (x, y) tosomeode(a, b) such that a ad b have differet B-labels. The algorithm is give below. It ca be show that the algorithm ca be implemeted o a EREW i time O(log 2 ) with total cost O( 6 ). The boud O( 6 ) arises because of the trasitive closure computatio of agraphwith 2 odes. So far, it has ot bee possible to fid a algorithm with a reasoable cost, say O( 2 ), ad a small ruig time.

A Parallel DFA Miimizatio Algorithm 37 Parallel Algorithm 1. Costruct graph G as defied above 2. Mark all odes (p, q) p ad q belog to differet sets of the B-partitio 3. Uses trasitive closure to mark pairs reachable from marked pairs 4. commet Note that all umarked pairs are equivalet Algorithm 2: A fast parallel algorithm for DFA miimizatio 4 A Efficiet Algorithm I this sectio we have proposed a parallel versio of the simple O( 2 )time sequetial algorithm outlied i Sectio 3. We use O( ) processors to achieve a expected ruig time of O(). Usig a arbitrary-crcw PRAM, we have parallelized the ier for-loop which iterates over the set of states. The the ew labels obtaied i this for-loop are hashed usig parallel hashig algorithm of Matias ad Vishki [9] to get ew block umbers for the states. The algorithm is give below. Lies 1-8 are the same as i the sequetial algorithm except that Lies 5-8 of Algorithm 3 are ow doe i parallel. The labels obtaied i Lie 8 are hashed to [1..O()] usig the hashig techique due to Matias ad Vishki [9]. Theorem 1 (Parallel Hashig Theorem). LetWbeamultisetof umbers from the rage [1..m], where m +1 = p is a prime. Suppose we have New Parallel Algorithm 1. Iitialize block o array 2. do 3. for i =0to k-1 do 4. for j =1to do i parallel 5. for m =(j 1) to j 1 do 6. b 1 =blocko[q m] 7. b 2 =blocko[δ(q m,a i)] 8. label state q m with (b 1,b 2) 9. edfor 10. edfor 11. Use parallel hashig to map the labels to [1..O()] 12. aumbertowhichastate slabelgetsmappedto its ew block o 13. Reduce the rage of block o from O() to 14. edfor 15. while umber of blocks is chagig Algorithm 3: New parallel algorithm for DFA miimizatio

38 Ambuj Tewari et al. processors o a arbitrary-crcw PRAM. A oe-to-oe fuctio F : W [1..O()] ca be foud i O() expected time. The evaluatio of F (x) for each x W, takes O(1) arithmetic operatios (usig umbers from [1..m]). We have to assig ew block umbers to states such that states with differet labels get differet block umbers ad states with same label get same block umber. Also we do ot wat the block umbers to become too large. The labels are pairs of the form (b 1,b 2 )whereb 1,b 2. Therefore we ca map a label (b 1,b 2 )tob 1 ( +1)+b 2 ad we ca treat these labels as umbers i the rage [1..2 2 ]. A umber m such that 2 2 <m 4 2 ad m +1 is a prime ca be foud i O() time(see[9]) ad this has to be doe just oce. For istace, it ca be doe after the iitializatio phase of step 1. We wat the block umbers to remai i the rage [1..]. However, after hashig, we get umbers i the rage [1..K] somefixedk. This rage shrikig ca be implemeted o a arbitrary-crcw PRAM i time O(). The procedure is give as Algorithm 4. First, each processor hashes labels ad sets PRESENT[x] to1ifsome label got hashed to the value x, wherepresent[1..k] is a array. Several processors might try to write i the same locatio i the array but sice we have assumed a arbitrary-crcw PRAM, oe of them will succeed arbitrarily. Each of the processors ow cosiders a rage of K idices of the PRESENT array ad computes the umber of locatios which are set to 1 usig O(log( )) prefix sum algorithm. New block umber for a give locatio is simply the umber of 1 s occurrig before that locatio. A processor ca easily compute the ew block umber for a locatio i its rage by addig the umber of 1 s occurrig before that locatio withi the processor s rage to the umber of 1 s occurrig i earlier rages (this has already bee computed by prefix-sum). To fid the time complexity, let us cosider first Algorithm 4. From the parallel hashig theorem we kow that evaluatio of the hashig fuctio i lie 3 takes O(1) time. Lie 4 is a assigmet ad so the loop i Lies 2-5 takes O() time. Similarly the loop i Lies 9-11 takes O(). Prefix sum computatio i Lies 13-14 also takes O() time. The loop i Lies 17-21 takes O() time. Evaluatio of the hash fuctio at lie 25 takes O(1) time. Therefore, the last loop (Lies 24-27) too takes O() time. The outermost for-loop (Lies 3-14) of Algorithm 3 rus exactly k times where k is the size of the iput alphabet ad the outer do-while-loop (Lies 2-15) of our algorithm ca ru for at most times sice the miimal DFA does othavemoretha states. Therefore, the expected time complexity of our algorithm is O(k ). Sice we use O( ) processors, the cost is O(k2 ) which is the cost optimal parallel adaptatio of the O(k 2 ) sequetial method. 5Coclusio I this paper, we have cosidered a well-kow problem from classical automata theory ad have preseted a parallel algorithm for the problem. We have essetially adapted the aive O(k 2 ) sequetial algorithm ad have show that our

A Parallel DFA Miimizatio Algorithm 39 algorithm requires O( ) timeusigo( ) processors. Thus it is a cost optimal parallelizatio o a arbitrary-crcw PRAM. Fially, it will be of immese theoretical ad practical importace to come up with impossibility results about the limited parallelizability of the sequetial DFA miimizatio algorithms. // Iitialize the PRESENT[1..K] array 1. for i =1to do i parallel 2. for j =(i 1) K +1to i K do 3. Let x be the value to which the label of q j hashes 4. PRESENT[x] =1 5. edfor 6. edfor // Compute umber of 1 s i each processor s rage 7. for i =1to do i parallel 8. a i =0 9. for j=(i 1) K +1to i K do 10. a i = a i + PRESENT[j] 11. edfor 12. edfor //Compute partial sums 13. s 0 =0 14. Compute s i = i k=1 a i for 1 i usig prefix sum //Compute ew block umbers 15. for i =1to do i parallel 16. a i = s i 1 17. for j =(i 1) K +1to i K do 18. a i = a i + PRESENT[j] 19. if PRESENT[j] =1the 20. ew block o[j] = a i 21. edfor 22. edfor // Update the block o array with the ew block umbers 23. for i =1to do i parallel 24. for j =(i 1) to i 1 do 25. Let x be the value to which the label of q j hashes 26. block o[q j ]=ewblock o[x] 27. edfor 28. edfor Algorithm 4: Reducig the rage of block umbers

40 Ambuj Tewari et al. Refereces [1] Aho A. V.,Hopcroft J. E. ad Ullma J. D.: The desig ad aalysis of computer algorithms. Addiso-Wesley,Readig,Massachusetts (1974) 35, 36 [2] Blum N.: A O( ) implemetatio of the stadard method of miimizig -state fiite automata. Iformatio Processig Letters 57 (1996) 65-69 35 [3] Cho S. ad Huyh D. T.: The parallel complexity of coarsest set partitio problems. Iformatio Processig Letters 42 (1992) 89-94 35, 36 [4] Huffma D. A.: The Sythesis of Sequetial Switchig Circuits. Joural of Frakli Istitute 257 (1954) 161-190 34 [5] Hopcroft J. E.: A algorithm for miimizig states i a fiite automata. Theory of Machies ad Computatio,Academic Press (1971) 189-196 35 [6] Hopcroft J. E. ad Ullma J. D.: Itroductio to automata theory,laguages,ad computatio. Addiso-Wesley,Readig,Massachusetts (1979) 34 [7] Jaja J. ad Kosaraju S. R.: Parallel algorithms for plaar graph isomorphism ad related problems. IEEE Trasactios o Circuits ad Systems 35 (1988) 304-311 35 [8] Jaja J. ad Ryu K. W.: A Efficiet Parallel Algorithm for the Sigle Fuctio Coarsest Partitio Problem. Theoretical Computer Sciece 129 (1994) 293-307 35 [9] Matias Y. ad Vishki U.: O parallel hashig ad iteger sortig. Joural of Algorithms 4 (1991) 573 606 37, 38 [10] Moore E. F.: Gedake-experimets o sequetial circuits. Automata Studies, Priceto Uiversity Press (1956) 129-153 34 [11] Paige R.,Tarja R. E. ad Boic R.: A liear time solutio to the sigle fuctio coarsest partitio problem. Theoretical Computer Sciece 40 (1985) 67-84 36 [12] Ravikumar B. ad Xiog X.: A parallel algorithm for miimizatio of fiite automata. Proceedigs of the 10th Iteratioal Parallel Processig Symposium, Hoululu,Hawaii (1996) 187-191 35 [13] Srikat Y. N.: A parallel algorithm for the miimizatio of fiite state automata. Iteratioal Joural Computer Math. 32 (1990) 1-11 35 [14] Vardi M.: Notraditioal applicatios of automata theory. Lecture Notes i Computer Sciece,Spriger-Verlag 789 (1994) 575-597 34