Implementation of Hopcroft's Algorithm

Similar documents
Implementation of Hopcroft s Algorithm

16 Greedy Algorithms

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the

Verifying a Border Array in Linear Time

arxiv: v3 [cs.ds] 18 Apr 2011

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

II (Sorting and) Order Statistics

Efficient validation and construction of border arrays

Equivalence of NTMs and TMs

A technique for adding range restrictions to. August 30, Abstract. In a generalized searching problem, a set S of n colored geometric objects

PLANAR GRAPH BIPARTIZATION IN LINEAR TIME

Limited Automata and Unary Languages

Multiple-choice (35 pt.)

Lecture 5: Suffix Trees

2. Sets. 2.1&2.2: Sets and Subsets. Combining Sets. c Dr Oksana Shatalov, Fall

.Math 0450 Honors intro to analysis Spring, 2009 Notes #4 corrected (as of Monday evening, 1/12) some changes on page 6, as in .

EDAA40 At home exercises 1

Slides for Faculty Oxford University Press All rights reserved.

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

[13] D. Karger, \Using randomized sparsication to approximate minimum cuts" Proc. 5th Annual

AXIOMS FOR THE INTEGERS

time using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o

We assume uniform hashing (UH):

Treewidth and graph minors

Matching Theory. Figure 1: Is this graph bipartite?

CHENNAI MATHEMATICAL INSTITUTE M.Sc. / Ph.D. Programme in Computer Science

Discrete mathematics , Fall Instructor: prof. János Pach

9.5 Equivalence Relations

Algorithms and Theory of Computation. Lecture 5: Minimum Spanning Tree

Algorithms and Theory of Computation. Lecture 5: Minimum Spanning Tree

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD

Lecture 6: Faces, Facets

T. Background material: Topology

V Advanced Data Structures

Binary Heaps in Dynamic Arrays

Algorithms, Probability, and Computing Special Assignment 1 HS17

Generell Topologi. Richard Williamson. May 6, 2013

Fast algorithm for generating ascending compositions

A Simpler Proof Of The Average Case Complexity Of Union-Find. With Path Compression

V Advanced Data Structures

Proposition 1. The edges of an even graph can be split (partitioned) into cycles, no two of which have an edge in common.

W4231: Analysis of Algorithms

Optimum Alphabetic Binary Trees T. C. Hu and J. D. Morgenthaler Department of Computer Science and Engineering, School of Engineering, University of C

2.2 Set Operations. Introduction DEFINITION 1. EXAMPLE 1 The union of the sets {1, 3, 5} and {1, 2, 3} is the set {1, 2, 3, 5}; that is, EXAMPLE 2

ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007

SORT INFERENCE \coregular" signatures, they derive an algorithm for computing a most general typing for expressions e which is only slightly more comp

CS2 Language Processing note 3

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

On some problems in graph theory: from colourings to vertex identication

Finite-State Transducers in Language and Speech Processing

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a

DIVIDE AND CONQUER ALGORITHMS ANALYSIS WITH RECURRENCE EQUATIONS

A Linear Time Algorithm for the Minimum Spanning Tree Problem. on a Planar Graph. Tomomi MATSUI. (January 1994 )

22 Elementary Graph Algorithms. There are two standard ways to represent a

SORTING AND SELECTION

Restricted Delivery Problems on a Network. December 17, Abstract

6. Asymptotics: The Big-O and Other Notations

4&5 Binary Operations and Relations. The Integers. (part I)

A Nim game played on graphs II

Lecture 15: The subspace topology, Closed sets

CPSC 536N: Randomized Algorithms Term 2. Lecture 10

CS 6783 (Applied Algorithms) Lecture 5

What is a minimal spanning tree (MST) and how to find one

Notes on Binary Dumbbell Trees

Disjoint Sets. (Disjoint Sets) Data Structures and Programming Spring / 50

(Refer Slide Time: 0:19)

REPORT A SEPARATOR THEOREM FOR PLANAR GRAPHS

Generating edge covers of path graphs

NP-Completeness of 3SAT, 1-IN-3SAT and MAX 2SAT

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 16 Treaps; Augmented BSTs

implementing the breadth-first search algorithm implementing the depth-first search algorithm

O(n): printing a list of n items to the screen, looking at each item once.

Regular Languages. MACM 300 Formal Languages and Automata. Formal Languages: Recap. Regular Languages

st-orientations September 29, 2005

Algorithms for Euclidean TSP

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

Theorem 2.9: nearest addition algorithm

To illustrate what is intended the following are three write ups by students. Diagonalization

TOPOLOGY, DR. BLOCK, FALL 2015, NOTES, PART 3.

TAFL 1 (ECS-403) Unit- V. 5.1 Turing Machine. 5.2 TM as computer of Integer Function

3 No-Wait Job Shops with Variable Processing Times

Byzantine Consensus in Directed Graphs

Ensures that no such path is more than twice as long as any other, so that the tree is approximately balanced

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

CS525 Winter 2012 \ Class Assignment #2 Preparation

Sparse Hypercube 3-Spanners

Lex-BFS and partition renement, with applications to transitive orientation, interval graph recognition and consecutive ones testing

An Ecient Approximation Algorithm for the. File Redistribution Scheduling Problem in. Fully Connected Networks. Abstract

Calculating the Hausdorff Distance Between Curves

The strong chromatic number of a graph

ONE-STACK AUTOMATA AS ACCEPTORS OF CONTEXT-FREE LANGUAGES *

lec3:nondeterministic finite state automata

Optimization I : Brute force and Greedy strategy

Problem Set 4 Solutions

4. Suffix Trees and Arrays

Part 2: Balanced Trees

Finding a winning strategy in variations of Kayles

Today. Types of graphs. Complete Graphs. Trees. Hypercubes.

Introduction to Sets and Logic (MATH 1190)

Priority Queue: Heap Structures

Transcription:

Implementation of Hopcroft's Algorithm Hang Zhou 19 December 2009 Abstract Minimization of a deterministic nite automaton(dfa) is a well-studied problem of formal language. An ecient algorithm for this problem is proposed by Hopcroft in 1970 and we have already learned this algorithm in the course. The goal of this writing is to discuss how this algorithm is carried out in time complexity O(MNlog 2N)(where M is the size of alphabet and N is the number of states). The writing is in three sections: rst, we review the original algorithm; second, we provide corresponding data structures for carrying out the algorithm eciently; and at last we prove that the algorithm is indeed in time O(MNlog 2N) using these data structures. 1 Introduction: Let A be a deterministic nite automaton (Q, A, E, I, F ), where Q is the nite set of states, A the alphabet, E the set of transition edges, I and F the sets of initial and nal states respectively. Let M be the size of alphabet and N be the number of states. For convenience, let A be the set of integers from 1 to M. The problem of minimization is to nd an automaton A with the minimum number of states which is equivalent to A. Denition 1. Let P be a partition of Q. The number of classes in P is bounded by N. Hence every class of P can be entitled with a unique name between 1 and N. From the denition, for every pair (C, a), where C is a class and a is a letter, it corresponds to a unique element in {1,.., N} {1,.., M}. Denition 2. Let A be a deterministic nite automaton (Q, A, E, I, F ). Let a A and B, C Q. A pair (C, a) is said to split B, if B a C and B a C with B a = {x a x B}. In this case, B is split by (C, a) into two parts: B = {x B x a C} and B = {x B x a / C}. 1

Recall Hopcroft's algorithm in our course: Hopcroft's-Minimization-1st-Version() 1 P {F, F c } 2 S 3 for a A 4 do Insert (min(f, F c ), a) to S 5 while S 6 do Delete (C, a) from S 7 Decide subset T of P with element split by (C, a) 8 for each B T 9 do B, B Split(B, C, a) 10 Replace B by B and B in P 11 for b A 12 do if (B, b) S 13 then Replace (B, b) by (B, b) and (B, b) in S 14 else Insert (min(b, B ), b) to S 2 Data structure In this section we provide several data structures for carrying out Hopcroft's algorithm eciently. Let Inverse = a 1 C, which equals to {x a x C}. We will show in the next section that the execution time of the while loop is O( Inverse ). In order to achieve the eect of T, we start by calculating the approximate subset T of P, which contains element B such that B Inverse. It is easier to obtain T after scanning all elements in C. It is in O(1) time to check whether an element in T is also in T. So the previous algorithm can be rewritten as following: Hopcroft's-Minimization-2nd-Version() 1 P {F, F c } 2 S 3 for a A 4 do Insert (min(f, F c ), a) to S 5 while S 6 do Delete (C, a) from S 7 Inverse a 1 C 8 Decide subset T of P with element B such that B Inverse 9 for each B T 10 do if B a C 11 then B, B Split(B, C, a) 12 Replace B by B and B in P 2

13 for b A 14 do if (B, b) S 15 then Replace (B, b) by (B, b) and (B, b) in S 16 else Insert (min(b, B ), b) to S Here a partition P will be characterized by four arrays Class, Part, Card, Place: Class[x] : the name of the class which contains state x; Part [i] : a pointer to a doubly linked list with all states in class named i; Card [i] : the size of class named i; Place[x] : a pointer to the position of state x in the linked list Part[p], where p=class[x]. The structures above enable us to move a state from one class to another in O(1) time. An integer variable Counter maintains the number of classes in P. It is originally set to 2 (or 1, if F = Q), and increases during the execution of the program. In order to calculate Inverse = a 1 C, we use an array Inv indexed by {1,..., N} {1,..., M} such that Inv[x, a] is a pointer to the linked list of state y satisfying y a = x. In this way, Inverse need not be constructed physically, but only be read in time proportional to its size when we traverse the list Part[Class[C]] and, for each state y in this list, traverse Inv[y, a]. So the expression "for all x Inverse" is carried out as : "for all y C and all x Inv[y, a]". The list S should eciently support these operations: insert, search, and delete a "random" element. Noting that the size of S is bounded by MN, we will use a linked list, together with a boolean array InList, which is indexed by {1,..., N} {1,..., M}(InList[C,a] is true if (C, a) S). In this way, insert operation is done in O(1): insert this element into linked list and update its corresponding value in InList; search operation is done in O(1) thanks to InList; delete operation is done in O(1): delete the head element from the linked list and update its corresponding value in InList; at last, to test whether S is also done in O(1) because of the linked list structure. In order to obtain T, we use the following three date structures: Involved : a linked list of the names of classes in T ; Size : an integer array, such that for each class B with name i, Size[i] is the cardinality of B a 1 C. This array should be cleared to 0 every time the while loop is executed; Twin : an integer array, such that Twin[i] is the name of the new class created while splitting the class named i. This array should also be initialized every time the while loop is executed. 3

3 Analysis of time complexity We now analyze the time complexity of Hopcroft's algorithm of the 2 nd version above. Here we are only concerned with the time complexity, while not caring about the space complexity. Lemma 3. The number of classes created during the execution is bounded by 2N 1. In addition, the number of iterations of the while loop is bounded by 2MN. Proof. Consider a binary tree which corresponds to the split operations such that each node is a subset of Q. The root of the tree is the whole set Q. It has two children, and their union equals to Q. Each non-leaf node P in this tree has two children P and P, which is the result of a split operation applied to P. Such a binary tree has at most N leaves, so at most 2N 1 nodes, hence we prove the rst half of the lemma. In order to bound the number of iterations of the while loop, it suces to prove that the number of pairs (C, a) inserted to S is bounded by 2MN, because in each iteration we delete a pair from S. Consider some pair (C, a) inserted to S. Here C is an element of current partition and corresponds to some node in the binary tree above, so the choice of C has at most 2N 1 dierent possibilities; and because the choice of a has M dierent possibilities, we conclude that the number of pairs (C, a) is bounded by 2MN. The initialization before the while loop is in time O(MN). The number of iterations of the while loop is bounded by 2MN (by Lemma 3), so the global execution time of line 6 is O(MN). And the number of classes created is bounded by 2N 1 (also by Lemma 3), so the global execution time of line 13 to line 16 is O(MN). It only rests to evaluate the global execution time of line 7 to line 12. In order to do this we take two steps: rst, evaluate the sum of Inverse in all execution of the while loop (this sum is noted as Inverse from now on), and then prove that the global execution time from line 7 to line 12 is O( Inverse ). Proposition 4. Let a A, p Q being xed. The number of times that we Delete (C, a) from list S, satisfying p C, is bounded by log 2 N. Proof. We call a pair (C, a) to be marked if p C. Before the while loop, S contains at most one marked pair (this condition holds for all instances of a and p). Let (C, a) be the rst marked pair which is deleted from S. If there exists a second marked pair (C, a), then before Delete(C, a), there must be some pair (C, a) which is generated by k Replace operations starting from pair (C, a) (k 0), such that (C, a) is a result of Split (C, a). So Card(C ) Card(C) and Card(C ) Card(C )/2. Now we take (C, a) as (C, a) and repeat in 4

the same way until no such Delete operation occurs. So the number of such Delete operations is not more than log 2 n for a xed pair of p and a. Corollary 5. The sum of Inverse in all execution of the while loop is bounded by MNlog 2 N. Proof. Let (x, a, y) be a xed triplet satisfying y = x a. The number of times that x is read in the list Inverse satisfying y = x a is exactly the same as the number of pairs (C, a) deleted from the list S, such that y C. From Proposition 4, this number is bounded by log 2 N, hence we prove the corollary. In order to show that the global execution time of line 7 to line 12 is O( Inverse ), we carry out this part of algorithm in the following way, by traversing the linked list Inverse twice: Hopcroft's-Minimization-2nd-version-Bottleneck() 1 create an empty list Involved 2 for all y C and all x Inv[y, a] 3 do i Class[x] 4 if Size[i] = 0 5 then Size[i] 1 6 insert i to Involved 7 else Size[i] Size[i] + 1 8 for all y C and all x Inv[y, a] 9 do i Class[x] 10 if Size[i] < Card[i] 11 then if Twin[i] = 0 12 then Counter Counter + 1 13 Twin[i] Counter 14 delete x from class i and insert x into class Twin[i] 15 16 for all j Involved 17 do Size[j] 0 18 Twin[j] 0 From above, it is clear that the execution time of this part in a single while loop is O( Inverse ), so the global execution time of this part is O( Inverse ). Theorem 6. Let A be an alphabet of M letters and A be a DFA on this alphabet with N states, then Hopcroft's algorithm using the data structures in the previous section is in time O(MNlog 2 N) in the worst case. Remark: This complexity is also tight, which is proved in [2] in detail. 5

References [1] D. Beauquier, J. Berstel, and P. Chrétienne. Éléments d'algorithmique. Masson, 1992. [2] J. Berstel and O. Carton. On the complexity of hopcroft's state minimization algorithm. CIAA, 2004. [3] A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974. 6