An Effective Grammar-Based Compression Algorithm for Tree Structured Data

Kazunori Yamagata¹, Tomoyuki Uchida¹, Takayoshi Shoudai², and Yasuaki Nakamura¹

¹ Faculty of Information Sciences, Hiroshima City University, Hiroshima, Japan
{k yamagata@toc.cs, uchida@cs, nakamura@cs}.hiroshima-cu.ac.jp
² Department of Informatics, Kyushu University, Kasuga, Japan
shoudai@i.kyushu-u.ac.jp

Abstract. Many semistructured data such as HTML/XML files are represented by rooted trees t such that all children of each internal vertex of t are ordered and all edges of t have labels. Such data are called tree structured data. Analyzing large tree structured data is a time-consuming process in data mining. If we can reduce the size of the input data without loss of information, we can speed up such a heavy process. In this paper, we consider the problem of effectively compressing an ordered rooted tree, which represents given tree structured data, without loss of information. First, in order to define this problem in a grammar-based compression scheme, we present a variable replacement grammar (VRG for short) over ordered rooted trees. The grammar-based compression problem for an ordered rooted tree T is defined as the problem of finding a VRG which generates only T and whose size is minimum. For this problem, we show that there is no polynomial time algorithm whose approximation ratio is less than a fixed constant (given in Theorem 1) unless P=NP. Second, based on this theoretical result, we present an effective compression algorithm for finding a VRG which generates only a given ordered rooted tree and whose size is as small as possible. Finally, in order to evaluate the performance of our grammar-based compression algorithm, we report some experimental results.

1 Introduction

Background: Due to the rapid growth of information technologies, semistructured data such as HTML/XML files have been rapidly increasing, and individual files have become larger.
Semistructured data having tree structures are called tree structured data and are represented by rooted trees t such that all children of each internal vertex of t are ordered and all edges of t have labels. In general, analyzing large tree structured data is a time-consuming process in data mining. If we can reduce the size of the input data without loss of information, including its structural features, we can speed up such a heavy process. In this paper, we consider the problem of effectively compressing an ordered rooted tree, which represents given tree structured data, without loss of information. We must compress a given ordered rooted tree T so that none of the structural features of T is lost. Hence, we cannot simply apply lossless compression algorithms for strings to tree structured data. The aim of this paper is to give a grammar-based compression scheme for ordered rooted trees and to present an effective algorithm that compresses a given ordered rooted tree without loss of information within this scheme.

Data Model: As our data model for tree structured data, we use a variant of the Object Exchange Model (OEM for short) presented by Abiteboul et al. [1], as follows. An object o consists of an identifier, a link and a value, which are denoted by &o, link(&o) and val(&o), respectively. The identifier &o uniquely identifies the object o. The link link(&o) is a list (&o_1, &o_2, ..., &o_p) of the identifiers of all subobjects o_i (i = 1, 2, ..., p), where p > 0. The value val(&o) is either a string, such as a tag in HTML/XML files, or a text, such as the text written in a PCDATA field of HTML/XML files. Tree structured data is represented by an ordered rooted tree with edge labels as follows. Each vertex represents an object identifier &o. An edge (&o, &o_i) represents a reference &o_i in link(&o) and is labeled with the value val(&o_i). For any object identifier &o with link(&o) = (&o_1, &o_2, ..., &o_p), the children &o_1, &o_2, ..., &o_p of the vertex &o are ordered in this order. For example, in Fig. 1, the ordered tree T represents the structure of Sample html.

Main Results: In this paper, we consider the problem of effectively compressing an ordered rooted tree, which represents given tree structured data, without loss of information.
First, in order to define such a compression problem for ordered rooted trees in a grammar-based compression scheme, we present a term tree, consisting of tree structures and structured variables, and a Variable Replacement Grammar (VRG for short) over ordered rooted trees, which is based on Hyperedge Replacement Grammar (HRG for short, see [6]). A graph transformation of a VRG is defined as a mechanism that replaces a variable by an ordered rooted tree. In Fig. 1, as examples of a term tree and a graph transformation of a VRG, we give a term tree t and an ordered rooted tree g such that T is obtained from t by replacing all variables labeled with x by g. The grammar-based compression problem for an ordered rooted tree T is defined as the problem of finding a VRG which generates only T and whose size is minimum; that is, it is an optimization problem of minimizing the size of a VRG which generates only T. Second, for this problem, we show that there is no polynomial time algorithm whose approximation ratio is less than the constant given in Theorem 1 unless P=NP. This result shows that approximating the size of the minimum VRG to within a small constant factor is NP-hard. Next, based on this theoretical result, we present an effective grammar-based compression algorithm for finding a VRG which generates only a given ordered rooted tree and whose size is as small as possible. This algorithm is based on a greedy approach that replaces isomorphic, non-overlapping subtrees t of a given ordered rooted tree by the same variable, in increasing order of the size of t. Next, by improving the algorithm given by Asai et al. [3], we present an efficient algorithm for finding all candidate subtrees s of a given ordered rooted tree T such that s can be replaced by a variable. This algorithm is a pre-processing step of our grammar-based compression algorithm. Finally, in order to evaluate the performance of our grammar-based compression algorithm, we report experimental results comparing it with two other algorithms. One is based on a greedy approach that processes the candidate subtrees, which can be replaced by a variable, in decreasing order of size. The other is based on the Minimum Description Length (MDL for short) heuristic used by SUBDUE [5]. The experimental results show the effectiveness of our algorithm.

[Fig. 1. An HTML document Sample html (a table with four rows; the i-th row holds the two cells <td><font>Text i-A</font></td> and <td><font>Text i-B</font></td>, i = 1, ..., 4), the ordered rooted tree T which is a data model of Sample html, a term tree t and an ordered rooted tree g. A variable is represented by a box with lines to its elements. The label of a box is the label of the variable. The number on the left side of a vertex denotes its order among its siblings.]

Related Works: For strings, several grammar-based compression algorithms have been proposed [4, 8, 9, 12, 13]. Such algorithms are based on the idea of representing a string by a context-free grammar (see [8, 12]). In particular, based on a grammar-based compression scheme, Charikar et al. [4] presented an $O(\log(n/g^*))$-approximation algorithm, and Sakamoto [13] proposed a linear-time approximation algorithm which guarantees an $O(\log^2 n)$ approximation ratio, where n is the length of an input string and $g^*$ is the size of the smallest grammar. On the other hand, there is little research on grammar-based compression for semistructured data. Hence, we need to define a new grammar-based compression scheme for ordered rooted trees, which we base on HRG (see [6]). For semistructured data which can be represented by general graphs, Cook [5] presented a practical data compression algorithm based on the MDL heuristic, which is not a grammar-based compression algorithm. For semistructured data with geometric information, we presented an effective compression algorithm in [7] by introducing the notions of a layout term graph [15] and a substitution in logic programming. The compression scheme presented in [7] can be regarded as a preliminary version of the grammar-based compression scheme presented in this paper.

In the fields of data mining and knowledge discovery, there are increasing demands for effective methods of extracting information from large semistructured data. Several effective algorithms for finding frequent substructures in large tree structured data have been proposed [3, 16]. In [11], we presented an effective algorithm for extracting common structural features among ordered rooted trees. Moreover, in [10, 14], we discussed the learnability of tree patterns having tree structures, variables and ordered children from the viewpoint of machine learning.

Organization: This paper is organized as follows. In Section 2, we introduce ordered rooted term trees and define an admissible VRG, which allows us to compress an ordered tree without loss of information.
In Section 3, we define the problem of finding an admissible VRG whose size is minimum among the admissible VRGs generating only a given ordered rooted tree, and we present an effective greedy algorithm for this problem. In Section 4, in order to evaluate the performance of our algorithm, we report experimental results of applying it to artificial large trees.

2 Preliminaries

2.1 Ordered Term Tree

Let $T = (V_T, E_T)$ be an ordered rooted tree with a vertex set $V_T$ and an edge set $E_T$. Let $l \ge 1$ be an integer. A list $h = (u_0, u_1, \ldots, u_l)$ of vertices in $V_T$ is called a variable (or a hyperedge) of T if $u_1, \ldots, u_l$ is a sequence of consecutive children of $u_0$, i.e., $u_0$ is the parent of $u_1, \ldots, u_l$ and $u_{j+1}$ is the next sibling of $u_j$ for any $j$ with $1 \le j < l$. Two variables $h = (u_0, u_1, \ldots, u_l)$ and $h' = (u'_0, u'_1, \ldots, u'_{l'})$ are said to be disjoint if $\{u_1, \ldots, u_l\} \cap \{u'_1, \ldots, u'_{l'}\} = \emptyset$.

Definition 1. Let $T = (V_T, E_T)$ be an ordered rooted tree and $H_T$ a set of pairwise disjoint variables of T. The ordered term tree obtained from T and $H_T$ is the triplet $t = (V_t, E_t, H_t)$, where $V_t = V_T$, $E_t = E_T - \bigcup_{h = (u_0, u_1, \ldots, u_l) \in H_T} \{\{u_0, u_i\} \in E_T \mid 1 \le i \le l\}$ and $H_t = H_T$. For two vertices $u, u' \in V_t$, we say that $u$ is the parent of $u'$ in t if $u$ is the parent of $u'$ in T. Similarly, we say that $u'$ is a child of $u$ in t if $u'$ is a child of $u$ in T. In particular, a vertex $u \in V_t$ with no child is called a leaf of t. The order of the children of each vertex $u$ in t is the order of the children of $u$ in T. We often omit the description of the ordered tree T and the variable set $H_T$, because they can be recovered from the triplet $t = (V_t, E_t, H_t)$.

Example 1. The ordered term tree t in Fig. 1 is obtained from the tree $T = (V_T, E_T)$ and the set $H_T$, where $V_T = \{v_0, v_1, \ldots, v_{17}\}$, $E_T = \{\{v_0,v_1\}, \{v_1,v_2\}, \{v_2,v_3\}, \{v_1,v_4\}, \{v_4,v_5\}, \{v_1,v_6\}, \{v_6,v_7\}, \{v_1,v_8\}, \{v_8,v_9\}, \{v_1,v_{10}\}, \{v_{10},v_{11}\}, \{v_1,v_{12}\}, \{v_{12},v_{13}\}, \{v_1,v_{14}\}, \{v_{14},v_{15}\}, \{v_1,v_{16}\}, \{v_{16},v_{17}\}\}$ and $H_T = \{(v_1,v_2,v_4), (v_1,v_6,v_8), (v_1,v_{10},v_{12}), (v_1,v_{14},v_{16})\}$.

For any ordered term tree t, a vertex u of t, and two children u' and u'' of u, we write $u' <_u^t u''$ if u' is smaller than u'' in the order of the children of u. For a set or a list D, the number of elements in D is denoted by $|D|$. We assume that every edge and every variable of an ordered term tree is labeled with some word from specified languages. Let $\Lambda$ and $X$ be finite alphabets such that $\Lambda \cap X = \emptyset$. An element of $\Lambda$ is called an edge label. An element of $X$ is called a variable label and has a rank, denoted by $rank(x)$, which is a nonnegative integer. A variable h has a label x such that $rank(x) = |h|$. A term tree t is called a term tree over $\langle \Lambda, X \rangle$ if every edge and every variable of t is labeled by an element of $\Lambda$ and of $X$, respectively. If $\Lambda$ and $X$ need not be specified, we often omit them.

Note. In this paper, we treat only ordered rooted term trees, so we simply call an ordered rooted term tree a term tree.
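The definitions of a variable, of disjointness and of the edge set $E_t$ in Definition 1 are mechanical enough to check in a few lines. The following is a minimal Python sketch under our own assumptions: an ordered tree is a dict from each vertex to its ordered list of children, edge labels are left out, and the helper names (`is_variable`, `disjoint`, `term_tree_edges`) are hypothetical. It replays Example 1.

```python
# A sketch of ordered trees and variables (Section 2.1). Vertices are ints and
# `children` maps each vertex to its ordered list of children (leaves may be absent).
Tree = dict[int, list[int]]


def is_variable(children: Tree, h: tuple[int, ...]) -> bool:
    """h = (u0, u1, ..., ul), l >= 1, is a variable if u1, ..., ul are
    consecutive children of u0, i.e. u_{j+1} is the next sibling of u_j."""
    if len(h) < 2:
        return False
    kids = children.get(h[0], [])
    if h[1] not in kids:
        return False
    i = kids.index(h[1])
    return list(h[1:]) == kids[i:i + len(h) - 1]


def disjoint(h1: tuple[int, ...], h2: tuple[int, ...]) -> bool:
    """Two variables are disjoint if {u1..ul} and {u1'..ul'} share no vertex."""
    return not set(h1[1:]) & set(h2[1:])


def term_tree_edges(edges: set[frozenset[int]], hs: list[tuple[int, ...]]) -> set[frozenset[int]]:
    """E_t of Definition 1: drop every edge {u0, ui} covered by a variable."""
    covered = {frozenset((h[0], u)) for h in hs for u in h[1:]}
    return edges - covered


# Example 1: v0 -> v1, v1 has children v2, v4, ..., v16, each v_{2i} has one child.
children: Tree = {0: [1], 1: list(range(2, 17, 2))}
for v in range(2, 17, 2):
    children[v] = [v + 1]
E_T = {frozenset((u, c)) for u, cs in children.items() for c in cs}
H_T = [(1, 2, 4), (1, 6, 8), (1, 10, 12), (1, 14, 16)]

assert all(is_variable(children, h) for h in H_T)
assert all(disjoint(a, b) for i, a in enumerate(H_T) for b in H_T[i + 1:])
print(len(E_T), len(term_tree_edges(E_T, H_T)))  # 17 9
```

The eight edges from $v_1$ to its children are covered by the four variables, so the term tree t keeps 9 of the 17 edges of T.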
In particular, a term tree with no variable is called a ground term tree (or simply a tree) and is considered to be a tree with ordered children. For a term tree t and its vertices $v_1$ and $v_i$, a path from $v_1$ to $v_i$ is a sequence $v_1, v_2, \ldots, v_i$ of distinct vertices of t such that $v_j$ is the parent of $v_{j+1}$ for any $j$ with $1 \le j < i$. Let $t = (V_t, E_t, H_t)$ be a term tree. For subsets $V_f \subseteq V_t$, $E_f \subseteq E_t$ and $H_f \subseteq H_t$, if $f = (V_f, E_f, H_f)$ is a term tree, then f is said to be a term subtree of t. For two term subtrees $f = (V_f, E_f, H_f)$ and $g = (V_g, E_g, H_g)$ of t, we say that f and g overlap in t if $(E_f \cap E_g) \cup (H_f \cap H_g) \neq \emptyset$, $V_f \not\subseteq V_g$ and $V_g \not\subseteq V_f$.

Let f and g be term trees over $\langle \Lambda, X \rangle$, each of which has at least two vertices. Let $h = (v_0, v_1, \ldots, v_l)$ be a variable in f and $\sigma = (u_0, u_1, \ldots, u_l)$ a list of $l+1$ distinct vertices in g such that $u_0$ is the root of g and $u_1, \ldots, u_l$ are leaves of g. The pair $[g, \sigma]$ is called an $(l+1)$-hypertree over $\langle \Lambda, X \rangle$. If $l$, $\Lambda$ and $X$ need not be specified, we often omit them. The form $h := [g, \sigma]$ is called a variable replacement for h. A new term tree $f' = f\{h := [g, \sigma]\}$ is obtained by applying the variable replacement $h := [g, \sigma]$ to f as follows. For the variable $h = (v_0, v_1, \ldots, v_l)$, we attach g to f by removing h from $H_f$ and identifying the vertices $v_0, v_1, \ldots, v_l$ with the vertices $u_0, u_1, \ldots, u_l$ of g in this order. We define a new ordering $<_v^{f'}$ on the children of every vertex v in f' in the following natural way. Suppose that v has more than one child and let v' and v'' be two children of v in f'. Note that $v_i = u_i$ for any $0 \le i \le l$.
(1) If $v, v', v'' \in V_g$ and $v' <_v^g v''$, then $v' <_v^{f'} v''$.
(2) If $v, v', v'' \in V_f$ and $v' <_v^f v''$, then $v' <_v^{f'} v''$.
(3) If $v = v_0 (= u_0)$, $v' \in V_f - \{v_1, \ldots, v_l\}$, $v'' \in V_g$, and $v' <_v^f v_1$, then $v' <_v^{f'} v''$.
(4) If $v = v_0 (= u_0)$, $v' \in V_f - \{v_1, \ldots, v_l\}$, $v'' \in V_g$, and $v_l <_v^f v'$, then $v'' <_v^{f'} v'$.
Fig. 2 gives an example of the new ordering on vertices in a term tree.

[Fig. 2. Panels f, g and f': the new ordering on vertices in the term tree $f' = f\{h := [g, (u_0, u_1, u_2, u_3)]\}$, where $h = (v_0, v_1, v_2, v_3)$.]

2.2 Admissible Variable Replacement Grammar

Next, we formally define an admissible Variable Replacement Grammar, which generates only one tree, based on HRG (see [6]). Let $\Lambda$ and $X$ be finite alphabets with $\Lambda \cap X = \emptyset$.

Definition 2. A Variable Replacement Grammar (VRG for short) $G = (S, R)$ over $\langle \Lambda, X \rangle$ is defined as follows:
(1) S is a variable label in X with $rank(S) = 0$ and is called the start variable label.
(2) R is a finite set of productions of the form $x \to [g, \sigma]$, where x is a variable label in X with $rank(x) = l$ and $[g, \sigma]$ is an l-hypertree over $\langle \Lambda, X \rangle$.

Let $G = (S, R)$ be a VRG. For a variable label $x \in X$, an l-hypertree $[g, \sigma]$ and an integer $i \ge 1$, we define the relation $x \Rightarrow_G^i [g, \sigma]$ inductively as follows.
(1) $x \Rightarrow_G^1 [g, \sigma]$ if there is a production $x \to [g, \sigma]$ in R.
(2) For $i \ge 2$, $x \Rightarrow_G^i [g, \sigma]$ if there are integers $j, m \ge 1$, an l-hypertree $[f, \sigma]$ and a variable h of rank k with label y in f such that $j + m = i$, $x \Rightarrow_G^j [f, \sigma]$, $y \Rightarrow_G^m [d, \sigma']$, and $g = f\{h := [d, \sigma']\}$.
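The variable replacement $f\{h := [g, \sigma]\}$ and the derivation relation can be illustrated by unfolding a small grammar. The sketch below is our own, not the authors' implementation: it encodes term trees as dicts (`children`, `label`, `vars`), keeps the vertices covered by a variable in the child list of $u_0$ so that rules (3) and (4) become a list splice, and assumes the grammar is deterministic and acyclic. The function names and the example grammar are hypothetical.

```python
import itertools

# Term trees as dicts (an encoding chosen for this sketch, not from the paper):
#   "children": vertex -> ordered list of children; vertices covered by a
#               variable (u0, u1, ..., ul) still appear as children of u0,
#   "label":    (parent, child) -> edge label, only for real edges,
#   "vars":     list of (variable label, (u0, u1, ..., ul)).


def replace_variable(f, h, g, sigma, fresh):
    """Return f{h := [g, sigma]}: glue a copy of g into f, identifying sigma[i]
    with h[i]; `fresh` yields vertex ids not used in f."""
    m = dict(zip(sigma, h))                       # u_i -> v_i
    for u, cs in g["children"].items():
        for w in (u, *cs):
            if w not in m:
                m[w] = next(fresh)                # other vertices of g are copied
    children = {v: list(cs) for v, cs in f["children"].items()}
    kids = children[h[0]]
    i, l = kids.index(h[1]), len(h) - 1
    # Rules (3) and (4): the root children of g take the place of v_1..v_l.
    children[h[0]] = kids[:i] + [m[w] for w in g["children"][sigma[0]]] + kids[i + l:]
    for u, cs in g["children"].items():
        if u != sigma[0]:
            children[m[u]] = [m[w] for w in cs]   # rule (1): keep g's order
    label = dict(f["label"])
    label.update({(m[a], m[b]): x for (a, b), x in g["label"].items()})
    new_vars = [(x, hv) for x, hv in f["vars"] if hv != h]
    new_vars += [(x, tuple(m[u] for u in hv)) for x, hv in g["vars"]]
    return {"children": children, "label": label, "vars": new_vars}


def expand(start, productions):
    """Unfold a deterministic VRG from the right-hand side of S: keep replacing
    the first remaining variable by its (unique) production x -> [g, sigma].
    Assumes the grammar is acyclic; otherwise this loop never ends."""
    ids = set(start["children"]) | {w for cs in start["children"].values() for w in cs}
    fresh = itertools.count(max(ids) + 1)
    t = start
    while t["vars"]:
        x, h = t["vars"][0]
        g, sigma = productions[x]
        t = replace_variable(t, h, g, sigma, fresh)
    return t


# Hypothetical grammar: S -> [t1, ()] with two x-variables under the root,
# x -> [g, (u0, u1)] where g is the path u0 -a-> w -b-> u1.
t1 = {"children": {0: [1, 2]}, "label": {}, "vars": [("x", (0, 1)), ("x", (0, 2))]}
g = {"children": {0: [1], 1: [2]}, "label": {(0, 1): "a", (1, 2): "b"}, "vars": []}
T = expand(t1, {"x": (g, (0, 2))})
print(T["children"])  # {0: [3, 4], 3: [1], 4: [2]}
```

Each replacement splices the copy of g exactly where $v_1, \ldots, v_l$ stood among the children of $v_0$, which is what rules (2) to (4) require for the remaining children of $v_0$.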

We write $x \Rightarrow_G^+ [g, \sigma]$ if $x \Rightarrow_G^i [g, \sigma]$ for some $i \ge 1$. The graph language generated by a VRG $G = (S, R)$ is the set $L(G) = \{T \mid T \text{ is a tree and } S \Rightarrow_G^+ [T, ()]\}$. Let $G = (S, R)$ be a VRG and T a tree. Then G is said to be admissible if $L(G) = \{T\}$. For a given tree T, an admissible VRG G generating only T allows us to compress T without loss of information if the size of G is less than the size of T.

Example 2. Let $G = (S, R)$ be the VRG where $R = \{S \to [t_1, ()], x \to [t_2, (u_1, u_2)], y \to [t_3, (v_1, v_2)]\}$, and let $t_1$, $t_2$ and $t_3$ be the term trees in Fig. 3. Then G is admissible and $L(G) = \{T\}$, where T is the tree in Fig. 3.

[Fig. 3. A tree T and term trees $t_1$, $t_2$, $t_3$.]

3 Grammar-Based Compression for an Ordered Rooted Tree

In this section, we consider the problem of finding an admissible VRG which generates only a given tree and whose size is minimum. First, we formally define this problem and show the hardness of solving it. Second, for a given tree T, we present an algorithm Find Freq Trees for finding all candidate ground term subtrees of T which can be replaced by variables. Finally, by using Find Freq Trees, we give an effective algorithm for finding an admissible VRG G which generates only a given tree and whose size is as small as possible.

3.1 Hardness of Grammar-Based Compression Problem for an Ordered Rooted Tree

For a term tree $t = (V_t, E_t, H_t)$, we define the size of t as $|t| = |V_t| + 2|E_t| + \sum_{h \in H_t} |h|$. For a VRG $G = (S, R)$, we define the size of G as $|G| = \sum_{x \to [g, \sigma] \in R} (|g| + |\sigma|)$. For a tree T and an admissible VRG G such that $L(G) = \{T\}$, we define the compression ratio $\rho$ of T w.r.t. G as $\rho = \frac{|G|}{|T|} \times 100$.

Example 3. The size of the tree T in Fig. 3 is $|T| = 64$. The sizes of the term trees $t_1$, $t_2$ and $t_3$ in Fig. 3 are $|t_1| = 10$, $|t_2| = 3 + (2 + 2) = 7$ and $|t_3| = 16$, respectively. Then the size of the admissible VRG $G = (S, R)$ with $R = \{S \to [t_1, ()], x \to [t_2, (u_1, u_2)], y \to [t_3, (v_1, v_2)]\}$ is $|G| = (10 + 0) + (7 + 2) + (16 + 2) = 37$. Therefore, the compression ratio of T w.r.t. G is $\rho = \frac{37}{64} \times 100 \approx 57.8$.

A grammar-based compression problem for a tree is defined as the following problem Find Min AVRG.

Find Min AVRG
Instance: A tree T.
Problem: Find an admissible VRG G such that $L(G) = \{T\}$ and $|G| \le |G'|$ for any admissible VRG G' with $L(G') = \{T\}$.

This problem is an optimization problem of minimizing the size of an admissible VRG which generates only a given tree. We can prove the following theorem by a reduction from a restricted form of VERTEX COVER, in a similar way to the proof of Theorem 3.1 in [9].

Theorem 1. There is no polynomial time algorithm for solving Find Min AVRG with approximation ratio less than 8593 unless P=NP.

This theorem shows the hardness of solving Find Min AVRG; that is, approximating the size of the minimum VRG to within a small constant factor is NP-hard.
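The size measures of Section 3.1 are simple sums, so the arithmetic of Examples 1 and 3 can be checked directly. In this sketch the helpers and their argument conventions (counts instead of actual term trees, and one $(|g|, |\sigma|)$ pair per production) are our own simplifications.

```python
def term_tree_size(n_vertices: int, n_edges: int, variable_sizes: list[int]) -> int:
    """|t| = |V_t| + 2|E_t| + sum_{h in H_t} |h|."""
    return n_vertices + 2 * n_edges + sum(variable_sizes)


def grammar_size(rules: list[tuple[int, int]]) -> int:
    """|G| = sum over productions x -> [g, sigma] of (|g| + |sigma|);
    each production is passed as the pair (|g|, |sigma|)."""
    return sum(g + s for g, s in rules)


def compression_ratio(grammar: int, tree: int) -> float:
    """rho = |G| / |T| * 100; below 100 means the grammar is smaller than T."""
    return grammar / tree * 100


# Example 1: T has 18 vertices and 17 edges; t keeps 9 edges and has 4 variables of size 3.
print(term_tree_size(18, 17, []), term_tree_size(18, 9, [3, 3, 3, 3]))  # 52 48
# Example 3: (|g|, |sigma|) for the productions of S, x and y, and |T| = 64.
size_g = grammar_size([(10, 0), (7, 2), (16, 2)])
print(size_g, round(compression_ratio(size_g, 64), 1))  # 37 57.8
```

For a ground tree with n vertices the formula reduces to $|T| = 3n - 2$, since it has $n - 1$ edges and no variables.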
Based on this theoretical result, in Sections 3.2 and 3.3 we present an effective compression algorithm for finding an admissible VRG which generates only a given tree and whose size is as small as possible.

3.2 Algorithm for Finding All Frequent Ground Term Subtrees

Let $T = (V, E, \emptyset)$ be a tree and $t = (V_t, E_t, \emptyset)$ a ground term subtree of T. From the definitions of a variable and a variable replacement, t cannot be replaced by a variable, even if t is frequent in T, in either of the following cases: (a) there exists a path p in T from a vertex $v \in V_t$ to a vertex $u \in V - V_t$ such that v is neither the root nor a leaf of t and p contains no leaf of t; or (b) for two children $w_1$ and $w_2$ of the root r of t, there is a vertex $w \in V - V_t$ such that $w_1 <_r^T w$ and $w <_r^T w_2$. Under this constraint, by improving the algorithm given by Asai et al. [3], which finds all frequent ground term subtrees of a given tree T, we present an algorithm Find Freq Trees for finding all candidate ground term subtrees of T which can be replaced by variables. The grammar-based compression algorithm for a tree, given later, uses Find Freq Trees as a pre-processing step.

Let T be a tree and v a vertex in T. The number of vertices on the path from the root of T to v is denoted by $depth_T(v)$. We assume that $next_T(v)$ returns the nearest right sibling of v in T, if any. We define $pa_T^0(v) = v$ and $pa_T^i(v)$ as the parent of $pa_T^{i-1}(v)$ for $i \ge 1$. A tree T is said to be in normal form if T satisfies the following conditions.
(1) The set of vertices of T is $V_T = \{1, \ldots, k\}$.
(2) All elements of $V_T$ are numbered by a preorder traversal [2] of T.
It is easy to see that if T is a tree in normal form with k vertices, then the root of T is 1 and the rightmost leaf of T is k. For a tree T in normal form with k vertices, we denote the rightmost leaf of T by $rml(T)$, that is, $rml(T) = k$, and denote the vertex $k - 1$ by $prevrml(T)$. The path from the root of T to $rml(T)$ is called the rightmost branch. For an integer $k \ge 1$, a k-pattern is a tree in normal form with k vertices. For every $k \ge 1$, we denote the set of all k-patterns by $\mathcal{T}_k$ and the set of all patterns by $\mathcal{T} = \bigcup_k \mathcal{T}_k$. Let $T = (V_T, E_T, \emptyset)$ and $U = (V_U, E_U, \emptyset)$ be trees.
Then a matching function from T to U is any function $\pi : V_T \to V_U$ that satisfies the following conditions (1) to (4) for any vertex $v \in V_T$ which is neither the root nor a leaf of T and any $v_1, v_2 \in V_T$.
(1) $\pi$ is a one-to-one mapping. That is, if $v_1 \neq v_2$ then $\pi(v_1) \neq \pi(v_2)$.
(2) $\pi$ preserves the parent-child relation. That is, $\{v_1, v_2\} \in E_T$ if and only if $\{\pi(v_1), \pi(v_2)\} \in E_U$. Moreover, $\{v_1, v_2\}$ in $E_T$ and $\{\pi(v_1), \pi(v_2)\}$ in $E_U$ have the same edge label.
(3) $\pi$ preserves the sibling relation. That is, $next_T(v_1) = v_2$ if and only if $next_U(\pi(v_1)) = \pi(v_2)$.
(4) All children of $\pi(v)$ in U are included in the set $\{\pi(u) \mid u \in V_T\} \subseteq V_U$.
If $|V_T| = |V_U|$, a matching function from T to U can be regarded as an isomorphism between T and U. Two trees T and U are said to be isomorphic if $|V_T| = |V_U|$ and there exists a matching function from T to U. Next, a pseudo-matching function from T to U is any function $\pi' : V_T \to V_U$ that satisfies the above conditions (1) to (3) of a matching function and the following condition (4').
(4') For any internal vertex $v \in V_T$ which does not appear on the rightmost branch of T, all children of $\pi'(v)$ in U are included in the set $\{\pi'(u) \mid u \in V_T\} \subseteq V_U$.
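Normal form, the rightmost branch and $pa_T^p$ are what the rightmost-expansion enumeration in Find Freq Trees works with. The following Python sketch illustrates them under our own assumptions: trees are child-list dicts, edge labels are left out, and `normal_form`, `rightmost_branch`, `pa` and `rightmost_expansion` are hypothetical names. The last helper builds the shape of the (p, l)-expansion defined later in this section, without the label l.

```python
def normal_form(children: dict, root) -> tuple[dict[int, list[int]], dict]:
    """Renumber the vertices 1..k in preorder. Returns the renumbered child lists
    (internal vertices only) and the map from old to new vertex names."""
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children.get(v, [])))  # visit the leftmost child first
    new = {v: i + 1 for i, v in enumerate(order)}
    kids = {new[v]: [new[c] for c in cs] for v, cs in children.items() if cs}
    return kids, new


def rightmost_branch(kids: dict[int, list[int]]) -> list[int]:
    """Path from the root 1 to rml(T) = k: always descend to the last child."""
    path = [1]
    while kids.get(path[-1]):
        path.append(kids[path[-1]][-1])
    return path


def pa(kids: dict[int, list[int]], v: int, p: int) -> int:
    """pa^p_T(v): the p-th parent of v, with pa^0_T(v) = v."""
    parent = {c: u for u, cs in kids.items() for c in cs}
    for _ in range(p):
        v = parent[v]
    return v


def rightmost_expansion(kids: dict[int, list[int]], p: int) -> dict[int, list[int]]:
    """Shape of the (p, l)-expansion: attach k + 1 as the rightmost child of
    pa^p(rml(T)), for 0 <= p < depth(rml(T)). Preorder numbering is preserved."""
    k = rightmost_branch(kids)[-1]
    v = pa(kids, k, p)
    out = {u: list(cs) for u, cs in kids.items()}
    out.setdefault(v, []).append(k + 1)
    return out


kids, new = normal_form({"a": ["b", "c"], "b": ["d"]}, "a")
print(kids, rightmost_branch(kids))  # {1: [2, 4], 2: [3]} [1, 4]
```

Because the new vertex always hangs off the rightmost branch, it is the last vertex in preorder, so every expansion of a k-pattern is again a pattern in normal form with $k + 1$ vertices.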

Let U be a tree. Given a k-pattern $T \in \mathcal{T}_k$ and a matching function $\pi$ from T to U, we define the rightmost occurrence (rml-occurrence for short) of T w.r.t. $\pi$ as $\pi(k)$, and the rightmost occurrence list of T as $Roc_U(T) = \{\pi(k) \mid \pi \text{ is a matching function from } T \text{ to } U\}$. Similarly, given a k-pattern $T \in \mathcal{T}_k$ and a pseudo-matching function $\pi'$ from T to U, we define the pseudo rightmost occurrence (pseudo-rml-occurrence for short) of T w.r.t. $\pi'$ as $\pi'(k)$, and the candidate rightmost occurrence list of T w.r.t. U as $Roc'_U(T) = \{\pi'(k) \mid \pi' \text{ is a pseudo-matching function from } T \text{ to } U\}$. Let $r \ge 2$ be an integer, called an occurrence count. T is said to be r-occurred for U if $|Roc_U(T)| \ge r$, and r-pseudo-occurred for U if $|Roc'_U(T)| \ge r$. We define the set of all r-occurred k-patterns for U as $F_{U,k,r} = \{T \in \mathcal{T}_k \mid |Roc_U(T)| \ge r\}$ and let $F_{U,r} = \bigcup_k F_{U,k,r}$. Likewise, we define the set of all r-pseudo-occurred k-patterns for U as $F'_{U,k,r} = \{T \in \mathcal{T}_k \mid |Roc'_U(T)| \ge r\}$ and let $F'_{U,r} = \bigcup_k F'_{U,k,r}$. Let $Roc_{U,k,r} = \bigcup_{T \in F_{U,k,r}} \{\pi(k) \mid \pi \text{ is a matching function from } T \text{ to } U\}$ and $Roc'_{U,k,r} = \bigcup_{T \in F'_{U,k,r}} \{\pi'(k) \mid \pi' \text{ is a pseudo-matching function from } T \text{ to } U\}$.

Let U be a tree, T a tree in normal form, and $Roc_U(T) = \{\pi(rml(T)) \mid \pi \text{ is a matching function from } T \text{ to } U\}$. From the definitions of a matching function and a pseudo-matching function, for a vertex v in $Roc_U(T)$ we can identify the unique matching function $\pi$ from T to U such that $\pi(rml(T)) = v$, and for a vertex v in $Roc'_U(T)$ the unique pseudo-matching function $\pi'$ from T to U such that $\pi'(rml(T)) = v$. For a tree T in normal form and a vertex v of U, a ground term subtree $G = (V_G, E_G, \emptyset)$ of U is said to be identified by T and v if there exists an isomorphism $\pi$ between T and G such that $\pi(rml(T)) = v$. Let $T \in \mathcal{T}_{k-1}$, let p be any integer with $0 \le p < depth_T(rml(T))$, and let $l \in \Lambda$ be any edge label.
Then the (p, l)-expansion of T is the tree S obtained from T by attaching a new vertex k to the vertex $v = pa_T^p(rml(T))$, that is, the p-th parent of the rightmost leaf of T, so that k is the rightmost child of v and the edge between v and k has the label l.

Fig. 4 presents an efficient algorithm Find Freq Trees which, given a tree U and an occurrence count r as inputs, outputs the set $F_{U,r}$ of all r-occurred patterns for U and the set of their rml-occurrences indexed by trees in $F_{U,r}$ w.r.t. U. Fig. 5 presents the procedure Expand Trees used in Find Freq Trees. Given the set R(T) calculated in line 4 of Expand Trees and the integer p as inputs, the procedure Scanning Sibling in line 5 of Expand Trees returns, for every edge label $l \in \Lambda$ and every (p, l)-expansion S of T, the candidate rightmost occurrence list NewRoc (that is, the set of all pseudo-rml-occurrences) of S w.r.t. the tree U, as follows. First, Scanning Sibling creates an empty set NewRoc. Next, for each $v \in R(T)$ and each $l \in \Lambda$, it adds the pair (l, u) to NewRoc if there exists a (p, l)-expansion of T in U, where u is the leftmost child of v in U if p = 0, and $u = next_U(pa_U^{p-1}(v))$ otherwise. Then the following theorem holds.

Theorem 2. When a tree U and an occurrence count $r \ge 2$ are given as inputs, the algorithm Find Freq Trees correctly constructs the set $F_{U,r}$ of all r-occurred patterns for U and the set $\bigcup_{T \in F_{U,r}} Roc_U(T)$ of rml-occurrences indexed by trees in $F_{U,r}$ w.r.t. U in $O(|V_U| + A^2 N + A |F'_{U,r}| |\Lambda|)$ time, where $V_U$ is the set of vertices of U, A is the maximum number of vertices of trees in the set $F'_{U,r}$ of all r-pseudo-occurred patterns for U, and $N = \sum_{T \in F'_{U,r}} |Roc'_U(T)|$.

Algorithm Find Freq Trees
Input: A tree U and an occurrence count r ≥ 2.
Output: The set F_{U,r} of all r-occurred patterns for U and their rml-occurrence lists Roc = ∪_{T ∈ F_{U,r}} Roc_U(T).
1. Compute F'_{U,1,r}, Roc'_{U,1,r}, F'_{U,2,r} and Roc'_{U,2,r} from U by level-order traversal;
2. k := 3;
3. while F'_{U,k−1,r} ≠ ∅ do
4.   ⟨F_{U,k−1,r}, Roc_{U,k−1,r}, F'_{U,k,r}, Roc'_{U,k,r}⟩ := Expand Trees(F'_{U,k−1,r}, Roc'_{U,k−1,r}, r);
5.   k := k + 1;
6. end;
7. F_{U,r} := F_{U,1,r} ∪ ... ∪ F_{U,k−2,r};   /* F_{U,1,r} = F'_{U,1,r} */
8. Roc := Roc_{U,1,r} ∪ ... ∪ Roc_{U,k−2,r};   /* Roc_{U,1,r} = Roc'_{U,1,r} */
9. return ⟨F_{U,r}, Roc⟩;
Fig. 4. Algorithm Find Freq Trees

3.3 Grammar-Based Compression Algorithm for an Ordered Rooted Tree

Let U be a tree, T a tree in normal form, and $Roc_U(T) = \{\pi(rml(T)) \mid \pi \text{ is a matching function from } T \text{ to } U\}$. A subset $R_T \subseteq Roc_U(T)$ is a valid subset of $Roc_U(T)$ if, for any two vertices $u, v \in R_T$, $t_u$ and $t_v$ do not overlap in U, where $t_u$ and $t_v$ are the ground term subtrees identified by T and u and by T and v, respectively. Moreover, a valid subset $R_T$ of $Roc_U(T)$ is maximal if no subset R of $Roc_U(T)$ with $R_T \subsetneq R$ is a valid subset of $Roc_U(T)$. We can compute a maximal valid subset $R_T$ of $Roc_U(T)$ by a level-order traversal of U as follows. Let $Roc_U(T) = \{v_1, \ldots, v_n\}$, where $v_i$ is found before $v_j$ in the level-order traversal of U for $1 \le i < j \le n$, and initialize $R_T = \{v_1\}$. For each $i = 2, \ldots, n$, we add $v_i$ to $R_T$ if there is no vertex u in $R_T$ such that $t_i$ and $t$ overlap in U, where t is the ground term subtree of U identified by T and u, and $t_i$ is the ground term subtree of U identified by T and $v_i$.
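The greedy level-order construction of a maximal valid subset can be sketched as follows. This is our own simplification: each occurrence is given together with the edge set of the ground term subtree it identifies, and two occurrences of the same pattern are treated as overlapping exactly when they share an edge. That equivalence is our reading of the overlap definition in Section 2.1 (ground subtrees have no variables, and two distinct occurrences of one pattern have equal size, so neither vertex set can contain the other). The function name is hypothetical.

```python
def maximal_valid_subset(occurrences):
    """Greedy maximal valid subset (Section 3.3).

    `occurrences` lists (v, edges) in level order of U, where v is an
    rml-occurrence of T and `edges` is the edge set of the ground term
    subtree identified by T and v. Two occurrences of the same pattern are
    treated as overlapping exactly when they share an edge (an assumption
    of this sketch)."""
    chosen, used = [], set()
    for v, edges in occurrences:
        if used.isdisjoint(edges):      # no overlap with anything kept so far
            chosen.append(v)
            used |= edges
    return chosen


# U is the path 1 -> 2 -> 3 -> 4 -> 5 and T is a path with two edges, so
# T is identified by the rml-occurrences 3, 4 and 5.
occ = [(3, {(1, 2), (2, 3)}), (4, {(2, 3), (3, 4)}), (5, {(3, 4), (4, 5)})]
print(maximal_valid_subset(occ))  # [3, 5]
```

The occurrences ending at 3 and 5 share only vertex 3, which becomes a common endpoint of two variables, so both can be kept; the occurrence ending at 4 shares an edge with the first one and is skipped.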
We remark that the above maximal valid subset $R_T$ of $Roc_U(T)$ is not always the best choice for compressing a given tree. Fig. 6 presents a greedy algorithm Compress Tree which, given a tree U and an occurrence count r as inputs, finds an admissible VRG that generates only U and whose size is as small as possible. The algorithm Compress Tree is based on a greedy approach of replacing isomorphic, non-overlapping term subtrees of a given tree by the same variable, in increasing order of the size of the replaced term subtree.

Procedure Expand Trees
Input: A set F_old of patterns, a set Roc_old of pseudo-rml-occurrences indexed by trees in F_old, and an occurrence count r ≥ 2.
Output: A set Fixf of r-occurred patterns for U, a set Fixroc of their rml-occurrences indexed by trees in Fixf, a set F_new of the rightmost expansions of trees in F_old, and a set Roc_new of pseudo-rml-occurrences indexed by trees in F_new w.r.t. U.
1. F_new := ∅; Fixroc := Roc_old; Fixf := F_old; Roc_new := ∅;
2. foreach tree T ∈ F_old do
3.   foreach 0 ≤ p < depth(rml(T)) do
4.     R(T) := {π'(rml(T)) | π'(rml(T)) ∈ Fixroc, π' is a pseudo-matching function from T to U};
5.     NewRoc := Scanning Sibling(R(T), p);
6.     foreach l ∈ Λ do
7.       compute the (p, l)-expansion S of T;
8.       NewRoc(l) := {v | (l, v) ∈ NewRoc};
9.       if |NewRoc(l)| ≥ r then
10.        F_new := F_new ∪ {S};
11.        Roc_new := Roc_new ∪ {(S, v) | v ∈ NewRoc(l)};   /* end of if */
12.      if p ≠ 0 and p ≠ depth(rml(T)) − 1 then
13.        while NewRoc(l) ≠ ∅ do
14.          choose a vertex v in NewRoc(l);
15.          Fixroc := Fixroc − {π'(prevrml(S)) | π' is a pseudo-matching function from S to U and π'(rml(S)) = v};
16.          NewRoc(l) := NewRoc(l) − {v};
17.        end;   /* end of if */
18.    end;
19.    R(T) := {π'(rml(T)) | π'(rml(T)) ∈ Fixroc, π' is a pseudo-matching function from T to U};
20.    if |R(T)| < r then
21.      Fixf := Fixf − {T};
22.      Fixroc := Fixroc − {v | v ∈ R(T)};
23.      break;   /* end of if */
24.  end;   /* end of foreach-loop */
25. end;   /* end of foreach-loop */
26. return ⟨Fixf, Fixroc, F_new, Roc_new⟩;
Fig. 5. Procedure Expand Trees

In line 1 of Compress Tree, we find the set F of all r-occurred patterns for U and the set of their rml-occurrences indexed by trees in F w.r.t. U by using the algorithm Find Freq Trees. In the while-loop from line 4 to line 25, Compress Tree fixes all ground term subtrees which are actually replaced by variables in the procedure Make Grammar called in line 26. In line 14, we revise the set Roc as follows: for each $G \in F_{org}$, we remove from Roc every vertex u in $\{\pi(rml(G)) \in Roc \mid \pi \text{ is a matching function from } G \text{ to } U\}$ such that the ground term subtree $g_u$ of U identified by G and u satisfies the following condition: there exists a vertex v in vroc(T) such that $t_v$ and $g_u$ overlap in U, or there exists a vertex v in $vroc(T) - \{w\}$ such that $g_u$ is a ground term subtree of $t_v$, where w is the first rml-occurrence of T in the level-order traversal of U and $t_v$ is the ground term subtree of U identified by T and v.

The procedure Make Grammar in line 26 constructs an admissible VRG G by applying the following operations to U for each $(T, VList(T)) \in tmprules$, in increasing order of the size of T. Let $Q = (V_Q, E_Q, H_Q)$ be a copy of U. We initialize $R_Q := \emptyset$ and $H_Q := \emptyset$. For each $(T, VList(T)) \in tmprules$, we set $H_Q := H_Q \cup \{h_\pi \mid \pi(rml(T)) \in Roc(T), (\pi, h_\pi) \in VList(T)\}$ and $R_Q := R_Q \cup \{x \to [t_T, \sigma]\}$, where x is a new variable label, $t_T$ is the term subtree of Q corresponding to the ground term subtree identified by T and its first rml-occurrence in the level-order traversal, and $\sigma$ is the first list of VList(T). Then, for each element $(\pi, h_\pi) \in VList(T)$ such that $\pi(rml(T)) \in Roc(T)$, we revise Q by deleting the term subtree of Q corresponding to the ground term subtree identified by T and $\pi(rml(T))$. Finally, the rule $S \to [Q, ()]$ is added to $R_Q$, and Make Grammar outputs the admissible VRG $G = (S, R_Q)$. Then the following theorem holds.

Theorem 3. When a tree U and an occurrence count r are given, the algorithm Compress Tree in Fig. 6 correctly produces an admissible VRG $G = (S, R)$ over $\langle \Lambda, X \rangle$ with $L(G) = \{U\}$ in $O(|V_U| + A^2 N + A |F'_{U,r}| |\Lambda| + BMC)$ time, where $V_U$ is the vertex set of U, A is the maximum number of vertices of trees in the set $F'_{U,r}$ of all r-pseudo-occurred patterns for U, $N = \sum_{T \in F'_{U,r}} |Roc'_U(T)|$, B is the maximum number of vertices of trees in $F_{U,r}$, $M = \sum_{T \in F_{U,r}} |Roc_U(T)|$, and C is the number of variable labels appearing in G.

Proof (Sketch). The correctness follows from the following facts (1) and (2). (1) The admissible VRG $G = (S, R)$ constructed by Compress Tree is deterministic: for any variable label x appearing in G, R contains exactly one production p whose left-hand side is x. Therefore $|L(G)| = 0$ or $|L(G)| = 1$. (2) $U \in L(G)$, since no two term subtrees replaced by variables in Make Grammar overlap in U. From (1) and (2), G is an admissible VRG with $L(G) = \{U\}$. By Theorem 2, line 1 can be executed in $O(|V_U| + A^2 N + A |F'_{U,r}| |\Lambda|)$ time. Moreover, lines 4 to 25 can be executed in $O(BMC)$ time. This gives the time complexity of Compress Tree.
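Fact (1) of the proof is a property of the production set alone, so it can be checked without building any tree. The sketch below is our own illustration: it abstracts each production $x \to [g, \sigma]$ to the pair (x, set of variable labels occurring in g) and uses Python's standard `graphlib` module. Besides the determinism of fact (1), it additionally checks that every used label has a production and that no label can derive itself, so that the forced derivation also terminates; these two extra checks are assumptions of the sketch, not part of the proof above. The function name is hypothetical.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter


def has_unique_derivation(productions):
    """productions: list of (x, ys), one pair per rule x -> [g, sigma], where ys
    is the set of variable labels occurring in g.

    Fact (1): if every label has exactly one production, each derivation step
    is forced, so |L(G)| <= 1. This sketch also checks that every used label
    has a production and that no label derives itself, so the forced
    derivation terminates."""
    count = Counter(x for x, _ in productions)
    if any(n != 1 for n in count.values()):
        return False                       # some label has two productions
    used = {y for _, ys in productions for y in ys}
    if not used.issubset(count):
        return False                       # a variable could never be erased
    try:
        # The graph maps each label to the labels its right-hand side uses.
        list(TopologicalSorter({x: ys for x, ys in productions}).static_order())
    except CycleError:
        return False                       # x =>+ ... x: derivation never ends
    return True


print(has_unique_derivation([("S", {"x"}), ("x", {"y"}), ("y", set())]))  # True
print(has_unique_derivation([("S", {"x"}), ("x", {"y"}), ("y", {"x"})]))  # False
```

Grammars built by Make Grammar should pass this check, since each new variable label receives exactly one production and, under the smallest-first order, refers only to labels introduced earlier.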

14 14 K. Yamagata et al.

Algorithm Compress Tree
Input: A tree $U$ and an integer $r \geq 2$
Output: An admissible VRG $G = (S, R)$ such that $L(G) = \{U\}$, and a compression ratio $\rho$
1. $F, Roc :=$ Find Freq Trees$(U)$;
2. remove all trees consisting of one vertex or two vertices from $F$;
3. $tmprules := \emptyset$, $F_{org} := F$, and for each $T \in F$, $tmpsize(T) := |T|$;
4. while $F \neq \emptyset$ do
5.     let $T$ be a smallest tree in $F$;
6.     $F := F - \{T\}$;
7.     $Roc(T) := \{\pi(rml(T)) \in Roc \mid \pi$ is a matching function from $T$ to $U\}$;
8.     compute a maximal valid subset $vroc(T)$ of $Roc(T)$;
9.     $m := |vroc(T)|$;
10.    fix on an integer $k > 0$ and let $VList(T) := \{(\pi, h_v) \mid v \in vroc(T)$, $\pi$ is a matching function from $T$ to $U$ such that $\pi(rml(T)) = v$, and $h_v$ is a variable which consists of $k$ vertices of $U$ and by which the term subtree identified by $T$ and $v$ can be replaced$\}$;
11.    fix on a hypertree $[T, \sigma]$ such that $|\sigma| = k$, by using $VList(T)$;
12.    $Size := (m - 1) \cdot tmpsize(T) - (2m + 1)k$;
13.    if $Size \geq 1$ then
14.        revise $Roc$ by removing all useless vertices in $Roc$, using $F_{org}$;
15.        $tmprules := tmprules \cup \{(T, VList(T))\}$;
16.        foreach $G \in F$ do
17.            $R(G) := \{\pi(rml(G)) \in Roc \mid \pi$ is a matching function from $G$ to $U\}$;
18.            if $|R(G)| \leq 1$ then $F := F - \{G\}$; $F_{org} := F_{org} - \{G\}$;
19.            else
20.                let $w$ be a vertex in $R(G)$;
21.                let $g_w$ be the ground term subtree of $U$ identified by $G$ and $w$;
22.                $n := |\{u \in vroc(T) \mid g_w$ has the ground term subtree identified by $T$ and $u$ as a term subtree$\}|$;
23.                $tmpsize(G) := tmpsize(G) - n \cdot (tmpsize(T) - 2k)$ /* end of if */
24.        end; /* end of if */
25. end;
26. $G :=$ Make Grammar$(U, tmprules, Roc)$;
27. return $G$, $|G| / |U| \times 100$;
Fig. 6. Algorithm Compress Tree
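Lines 12, 13, and 23 of Compress Tree amount to simple bookkeeping on estimated sizes. The sketch below assumes that line 12 reads $Size := (m - 1) \cdot tmpsize(T) - (2m + 1)k$, that the test in line 13 is $Size \geq 1$, and that line 23 subtracts $n \cdot (tmpsize(T) - 2k)$; the function names are hypothetical.

```python
def replacement_gain(m: int, tmpsize_t: int, k: int) -> int:
    """Line 12 (assumed reading): estimated saving from replacing the m valid
    occurrences of T by variables consisting of k vertices each."""
    return (m - 1) * tmpsize_t - (2 * m + 1) * k


def worth_replacing(m: int, tmpsize_t: int, k: int) -> bool:
    """Line 13 (assumed reading): keep T only if the estimated saving is at least 1."""
    return replacement_gain(m, tmpsize_t, k) >= 1


def updated_tmpsize(tmpsize_g: int, n: int, tmpsize_t: int, k: int) -> int:
    """Line 23 (assumed reading): shrink the estimated size of a larger candidate G
    that contains n of the replaced occurrences of T."""
    return tmpsize_g - n * (tmpsize_t - 2 * k)
```

For example, three valid occurrences of a pattern with $tmpsize(T) = 10$ and $k = 2$ give an estimated saving of $2 \cdot 10 - 7 \cdot 2 = 6$, so the pattern is kept, whereas two occurrences of a 5-vertex pattern with $k = 2$ give $5 - 10 = -5$ and are skipped.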

4 Implementation and Experimental Results

To evaluate the grammar-based compression algorithm Compress Tree presented in the previous section, we implemented Compress Tree and two other algorithms, Algorithm 1 and Algorithm 2. Algorithm 1 follows a greedy approach: it replaces isomorphic term subtrees that do not overlap in a given tree by the same variable, in decreasing order of the size of the replaced term subtree. That is, Algorithm 1 is obtained from Compress Tree by replacing line 5 with the instruction "let T be a largest tree in F". Algorithm 2 repeatedly replaces by a variable the isomorphic, non-overlapping term subtrees of a given tree T that give the best compression ratio. It is obtained from Compress Tree by adding the instruction "else break;" after line 24 and replacing line 5 with the following instruction INSTRUCTION: let T be a best tree in F with respect to the compression ratio obtained by replacing the term subtrees that are isomorphic to T and do not overlap by a variable. Algorithm 2 can therefore be regarded as the algorithm SUBDUE [5], which is based on a Minimum Description Length heuristic.

We evaluated Compress Tree by comparing it with Algorithm 1 and Algorithm 2 with respect to execution time and compression ratio on large artificial trees. The experiments were run on a PC with two 2.4 GHz CPUs and 1.00 GB of main memory. We implemented a data generator that randomly produces a large artificial tree satisfying the following conditions:
(1) The number of vertices is 20,000, 40,000, 60,000, 80,000, or 100,000.
(2) The degree of each vertex is less than 3.
(3) The number of edge labels is less than 2.
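The data generator is not specified beyond the three conditions above. One plausible sketch, assuming that "degree" means the number of children of a vertex and that edge labels are drawn uniformly from a given alphabet, grows a random ordered tree by attaching each new vertex as the rightmost child of a uniformly chosen vertex that still has room. The function name and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# children[v] lists the (edge_label, child) pairs of vertex v from left to right.
Children = List[List[Tuple[str, int]]]


def random_ordered_tree(n: int, max_children: int = 2,
                        labels: Sequence[str] = ("a",),
                        seed: Optional[int] = None) -> Children:
    """Return a random ordered rooted tree with n vertices; vertex 0 is the root."""
    if n < 1 or max_children < 1:
        raise ValueError("need n >= 1 and max_children >= 1")
    rng = random.Random(seed)
    children: Children = [[]]
    open_vertices = [0]  # vertices that still have fewer than max_children children
    for v in range(1, n):
        i = rng.randrange(len(open_vertices))
        parent = open_vertices[i]
        children.append([])
        children[parent].append((rng.choice(labels), v))  # new rightmost child
        if len(children[parent]) == max_children:
            # parent is full: drop it in O(1) by moving the last entry into its slot
            open_vertices[i] = open_vertices[-1]
            open_vertices.pop()
        open_vertices.append(v)
    return children
```

Under our reading of conditions (2) and (3), a call such as `random_ordered_tree(20_000, max_children=2, labels=("a",), seed=0)` would produce one tree of the smallest dataset; the swap-and-pop step keeps each insertion constant-time, so trees with 100,000 vertices are generated quickly.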
For $N \in \{20{,}000, 40{,}000, 60{,}000, 80{,}000, 100{,}000\}$, let $D(N)$ be the set of 10 trees with $N$ vertices produced by the data generator. We measured the execution times and compression ratios of Compress Tree, Algorithm 1, and Algorithm 2 on these datasets with occurrence count 2. Fig. 7 (a) shows the relationship between the number of vertices and the execution times. Note that each execution time excludes the time for reading the input data and is the average execution time over the trees in a dataset. For example, Fig. 7 (a) indicates that the average execution time of Algorithm 1 for trees in D(60,000) is about 300 seconds. Fig. 7 (a) also shows that Compress Tree is the fastest of the three algorithms. Fig. 7 (b) shows the relationship between the number of vertices and the compression ratios; each compression ratio in Fig. 7 (b) is the average compression ratio over the trees in a dataset. For example, Fig. 7 (b) shows that the average compression ratio of Compress Tree for trees in D(60,000) is about 60%. From Fig. 7 (a) and (b), Compress Tree and Algorithm 2 perform much better than Algorithm 1. Fig. 7 (c) shows the relationship between the number of vertices in the input data and the number of variables appearing in the admissible VRG output by each algorithm.

[Fig. 7. Experiment 1: comparison of Compress Tree with Algorithm 1 and Algorithm 2 with respect to execution time and compression ratio on the different datasets, with the occurrence count fixed at 2. Panels: (a) Execution Time vs Number of Vertices; (b) Compression Ratio vs Number of Vertices; (c) Number of Variables vs Number of Vertices; (d) Number of Variable Labels vs Number of Vertices.]

Moreover, Fig. 7 (d) shows the relationship between the number of vertices in the input data and the average number of variable labels used in the admissible VRG output by each algorithm. Fig. 7 (c) and (d) show that, although the admissible VRGs produced by the three algorithms contain almost the same number of variables, on every dataset Algorithm 1 produced admissible VRGs with far more variable labels than the other two algorithms. Moreover, Fig. 7 (b), (c), and (d) show that Compress Tree and Algorithm 2 perform almost identically. This indicates that the order in which trees are chosen at INSTRUCTION in Algorithm 2 almost coincides with the order in which trees are chosen at line 5 of Compress Tree. For these reasons, Compress Tree and Algorithm 2 are suitable for lossless compression of a large tree, whereas Algorithm 1 is not.

We also measured the execution times and compression ratios of the three algorithms on the dataset D(80,000), varying the occurrence count from 2 to 5. Fig. 8 shows the performance of the three algorithms for the different occurrence counts, and it yields results similar to those of the previous experiment. From these experimental results, we conclude that Compress Tree is suitable for lossless compression of a large tree and has an advantage in execution time.
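Raising the occurrence count r shrinks the pool of candidate patterns. As a rough illustration only, the sketch below replaces Find Freq Trees by a much coarser stand-in of our own: it counts complete (bottom-up) subtrees, encoded as nested tuples of (edge label, child) pairs, and keeps the shapes that have at least three vertices (mirroring line 2 of Compress Tree) and occur at least r times. Unlike the r-occurred patterns used by Compress Tree, these counts may include nested occurrences, which Compress Tree would prune when computing vroc(T). The encoding and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict

# Hypothetical encoding: a tree is a tuple of (edge_label, child_tree) pairs;
# the empty tuple () is a single vertex. Equal shapes are equal tuples.
Tree = tuple


def size(t: Tree) -> int:
    """Number of vertices of t."""
    return 1 + sum(size(child) for _, child in t)


def complete_subtree_counts(t: Tree) -> Counter:
    """Count how often each complete subtree shape occurs in t."""
    counts: Counter = Counter()
    stack = [t]
    while stack:  # visit every vertex once, in no particular order
        node = stack.pop()
        counts[node] += 1
        stack.extend(child for _, child in node)
    return counts


def candidates(t: Tree, r: int) -> Dict[Tree, int]:
    """Shapes with at least 3 vertices that occur at least r times."""
    return {s: c for s, c in complete_subtree_counts(t).items()
            if c >= r and size(s) >= 3}


pair = (("a", ()), ("b", ()))
toy = (("x", pair), ("y", pair), ("z", pair))
```

On this toy tree the 3-vertex shape `pair` occurs three times, so it survives for r = 2 and r = 3 but disappears for r = 4.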

[Fig. 8. Experiment 2: comparison of Compress Tree with Algorithm 1 and Algorithm 2 with respect to execution time and compression ratio on the dataset D(80,000) for different occurrence counts. Panels: (a) Execution Time vs Occurrence Count; (b) Compression Ratio vs Occurrence Count; (c) Number of Variables vs Occurrence Count; (d) Number of Variable Labels vs Occurrence Count.]

5 Conclusions

We have considered the problem of effectively compressing an ordered rooted tree without loss of information. We have presented an admissible VRG that generates only a given ordered rooted tree. Then, for an ordered rooted tree T, we have defined the grammar-based compression problem of finding an admissible VRG which generates only T and whose size is minimum. Moreover, we have shown the hardness of this problem by proving that there is no polynomial time algorithm with approximation ratio less than unless P=NP. Next, we have presented an effective algorithm for finding an admissible VRG G that generates only a given ordered rooted tree and is as small as possible. To evaluate its performance, we have implemented our algorithm and two other algorithms, and we have shown the effectiveness of our algorithm by comparing them with respect to execution time and compression ratio on large artificial trees.

From the viewpoint of computational complexity, we plan to analyze the approximation ratio of our algorithm, that is, the maximum, over all inputs, of the ratio between the size of the admissible VRG it generates and the size of the smallest possible admissible VRG. We also plan to construct efficient data mining tools for losslessly compressed data and apply them to real-world data, and to apply our grammar-based compression scheme to other graph structured data.
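Written out, the approximation ratio described above is the following quantity. The notation is ours rather than the paper's: $\mathcal{A}(U)$ denotes the admissible VRG that an algorithm $\mathcal{A}$, such as Compress Tree, outputs for an ordered rooted tree $U$, and $|G|$ denotes the size of a grammar $G$. The hardness result stated above gives a lower bound on this quantity for every polynomial time algorithm, unless P=NP.

$$
\rho_{\mathcal{A}} \;=\; \sup_{U} \; \frac{|\mathcal{A}(U)|}{\min\{\, |G| \;:\; G \text{ is an admissible VRG with } L(G) = \{U\} \,\}}
$$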

This work is partly supported by Grant-in-Aid for Young Scientists (B) No from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and Hiroshima City University Grant for Special Academic Research (General Studies) No

References
1. S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann.
2. A.V. Aho, J.E. Hopcroft, and J.D. Ullman. Data Structures and Algorithms. Addison-Wesley.
3. T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. Proc. 2nd SIAM Int. Conf. Data Mining (SDM-2002).
4. M. Charikar, E. Lehman, D. Liu, and R. Panigrahy. Approximating the smallest grammar: Kolmogorov complexity in natural models. Proc. 34th ACM STOC '02.
5. D. J. Cook and L. B. Holder. Graph-based data mining. IEEE Intelligent Systems, 15:32-41.
6. G. Rozenberg (Ed.). Handbook of Graph Grammars and Computing by Graph Transformation, volume 1. World Scientific Publishing.
7. Y. Itokawa, T. Uchida, T. Shoudai, T. Miyahara, and Y. Nakamura. Finding frequent subgraphs from graph structured data with geometric information and its application to lossless. Proc. PAKDD-2003, Springer-Verlag, LNAI 2637.
8. J. C. Kieffer and E.-h. Yang. Grammar based codes: A new class of universal lossless source codes. IEEE Transactions on Information Theory, 46.
9. E. Lehman and A. Shelat. Approximation algorithms for grammar-based compression. Proc. SODA '02.
10. S. Matsumoto, T. Shoudai, T. Miyahara, and T. Uchida. Learning of finite unions of tree patterns with internal structured variables from queries. Proc. AI-2002, Springer-Verlag, LNAI 2557.
11. T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Discovery of frequent tag tree patterns in semistructured web documents. Proc. PAKDD-2002, Springer-Verlag, LNAI 2336.
12. C. Nevill-Manning and I. Witten. Compression and explanation using hierarchical grammars. Computer Journal, 40(2/3).
13. H. Sakamoto. A fully linear-time approximation algorithm for grammar-based compression. DOI Technical Report 214, Department of Informatics, Kyushu University.
14. Y. Suzuki, R. Akanuma, T. Shoudai, T. Miyahara, and T. Uchida. Polynomial time inductive inference of ordered tree patterns with internal structured variables from positive data. Proc. COLT-2002, Springer-Verlag, LNAI 2375.
15. T. Uchida, Y. Itokawa, T. Shoudai, T. Miyahara, and Y. Nakamura. A new framework for discovering knowledge from two-dimensional structured data using layout formal graph system. Proc. ALT-2000, Springer-Verlag, LNAI 1968.
16. K. Wang and H. Liu. Discovering structural association of semistructured data. IEEE Trans. Knowledge and Data Engineering, 12, 2000.

Learning Characteristic Structured Patterns in Rooted Planar Maps

Learning Characteristic Structured Patterns in Rooted Planar Maps Learning Characteristic Structured Patterns in Rooted Planar Maps Satoshi Kawamoto Yusuke Suzuki Takayoshi Shoudai Abstract Exting the concept of ordered graphs, we propose a new data structure to express

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

A generalization of Mader s theorem

A generalization of Mader s theorem A generalization of Mader s theorem Ajit A. Diwan Department of Computer Science and Engineering Indian Institute of Technology, Bombay Mumbai, 4000076, India. email: aad@cse.iitb.ac.in 18 June 2007 Abstract

More information

Topological Invariance under Line Graph Transformations

Topological Invariance under Line Graph Transformations Symmetry 2012, 4, 329-335; doi:103390/sym4020329 Article OPEN ACCESS symmetry ISSN 2073-8994 wwwmdpicom/journal/symmetry Topological Invariance under Line Graph Transformations Allen D Parks Electromagnetic

More information

A Commit Scheduler for XML Databases

A Commit Scheduler for XML Databases A Commit Scheduler for XML Databases Stijn Dekeyser and Jan Hidders University of Antwerp Abstract. The hierarchical and semistructured nature of XML data may cause complicated update-behavior. Updates

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees

CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees MTreeMiner: Mining oth losed and Maximal Frequent Subtrees Yun hi, Yirong Yang, Yi Xia, and Richard R. Muntz University of alifornia, Los ngeles, 90095, US {ychi,yyr,xiayi,muntz}@cs.ucla.edu bstract. Tree

More information

A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components

A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components A Simplified Correctness Proof for a Well-Known Algorithm Computing Strongly Connected Components Ingo Wegener FB Informatik, LS2, Univ. Dortmund, 44221 Dortmund, Germany wegener@ls2.cs.uni-dortmund.de

More information

A 4-Approximation Algorithm for k-prize Collecting Steiner Tree Problems

A 4-Approximation Algorithm for k-prize Collecting Steiner Tree Problems arxiv:1802.06564v1 [cs.cc] 19 Feb 2018 A 4-Approximation Algorithm for k-prize Collecting Steiner Tree Problems Yusa Matsuda and Satoshi Takahashi The University of Electro-Communications, Japan February

More information

Efficient Subtree Inclusion Testing in Subtree Discovering Applications

Efficient Subtree Inclusion Testing in Subtree Discovering Applications Efficient Subtree Inclusion Testing in Subtree Discovering Applications RENATA IVANCSY, ISTVAN VAJK Department of Automation and Applied Informatics and HAS-BUTE Control Research Group Budapest University

More information

The Number of Connected Components in Graphs and Its. Applications. Ryuhei Uehara. Natural Science Faculty, Komazawa University.

The Number of Connected Components in Graphs and Its. Applications. Ryuhei Uehara. Natural Science Faculty, Komazawa University. The Number of Connected Components in Graphs and Its Applications Ryuhei Uehara uehara@komazawa-u.ac.jp Natural Science Faculty, Komazawa University Abstract For any given graph and an integer k, the number

More information

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures. Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,

More information

Computing the Longest Common Substring with One Mismatch 1

Computing the Longest Common Substring with One Mismatch 1 ISSN 0032-9460, Problems of Information Transmission, 2011, Vol. 47, No. 1, pp. 1??. c Pleiades Publishing, Inc., 2011. Original Russian Text c M.A. Babenko, T.A. Starikovskaya, 2011, published in Problemy

More information

An Edge-Swap Heuristic for Finding Dense Spanning Trees

An Edge-Swap Heuristic for Finding Dense Spanning Trees Theory and Applications of Graphs Volume 3 Issue 1 Article 1 2016 An Edge-Swap Heuristic for Finding Dense Spanning Trees Mustafa Ozen Bogazici University, mustafa.ozen@boun.edu.tr Hua Wang Georgia Southern

More information

Greedy Algorithms CHAPTER 16

Greedy Algorithms CHAPTER 16 CHAPTER 16 Greedy Algorithms In dynamic programming, the optimal solution is described in a recursive manner, and then is computed ``bottom up''. Dynamic programming is a powerful technique, but it often

More information

Algorithm Design Techniques (III)

Algorithm Design Techniques (III) Algorithm Design Techniques (III) Minimax. Alpha-Beta Pruning. Search Tree Strategies (backtracking revisited, branch and bound). Local Search. DSA - lecture 10 - T.U.Cluj-Napoca - M. Joldos 1 Tic-Tac-Toe

More information

Verifying a Border Array in Linear Time

Verifying a Border Array in Linear Time Verifying a Border Array in Linear Time František Franěk Weilin Lu P. J. Ryan W. F. Smyth Yu Sun Lu Yang Algorithms Research Group Department of Computing & Software McMaster University Hamilton, Ontario

More information

Closed Pattern Mining from n-ary Relations

Closed Pattern Mining from n-ary Relations Closed Pattern Mining from n-ary Relations R V Nataraj Department of Information Technology PSG College of Technology Coimbatore, India S Selvan Department of Computer Science Francis Xavier Engineering

More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees Applied Mathematics Letters 24 (2011) 719 723 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Graph triangulations and the compatibility

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models. Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected

More information

The 3-Steiner Root Problem

The 3-Steiner Root Problem The 3-Steiner Root Problem Maw-Shang Chang 1 and Ming-Tat Ko 2 1 Department of Computer Science and Information Engineering National Chung Cheng University, Chiayi 621, Taiwan, R.O.C. mschang@cs.ccu.edu.tw

More information

Efficient homomorphism-free enumeration of conjunctive queries

Efficient homomorphism-free enumeration of conjunctive queries Efficient homomorphism-free enumeration of conjunctive queries Jan Ramon 1, Samrat Roy 1, and Jonny Daenen 2 1 K.U.Leuven, Belgium, Jan.Ramon@cs.kuleuven.be, Samrat.Roy@cs.kuleuven.be 2 University of Hasselt,

More information

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur Lecture : Graphs Rajat Mittal IIT Kanpur Combinatorial graphs provide a natural way to model connections between different objects. They are very useful in depicting communication networks, social networks

More information

arxiv: v2 [cs.ds] 30 Sep 2016

arxiv: v2 [cs.ds] 30 Sep 2016 Synergistic Sorting, MultiSelection and Deferred Data Structures on MultiSets Jérémy Barbay 1, Carlos Ochoa 1, and Srinivasa Rao Satti 2 1 Departamento de Ciencias de la Computación, Universidad de Chile,

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Priority Queues and Binary Heaps

Priority Queues and Binary Heaps Yufei Tao ITEE University of Queensland In this lecture, we will learn our first tree data structure called the binary heap which serves as an implementation of the priority queue. Priority Queue A priority

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Abstract We present two parameterized algorithms for the Minimum Fill-In problem, also known as Chordal

More information

Efficient Incremental Mining of Top-K Frequent Closed Itemsets

Efficient Incremental Mining of Top-K Frequent Closed Itemsets Efficient Incremental Mining of Top- Frequent Closed Itemsets Andrea Pietracaprina and Fabio Vandin Dipartimento di Ingegneria dell Informazione, Università di Padova, Via Gradenigo 6/B, 35131, Padova,

More information

Graph Theory. Probabilistic Graphical Models. L. Enrique Sucar, INAOE. Definitions. Types of Graphs. Trajectories and Circuits.

Graph Theory. Probabilistic Graphical Models. L. Enrique Sucar, INAOE. Definitions. Types of Graphs. Trajectories and Circuits. Theory Probabilistic ical Models L. Enrique Sucar, INAOE and (INAOE) 1 / 32 Outline and 1 2 3 4 5 6 7 8 and 9 (INAOE) 2 / 32 A graph provides a compact way to represent binary relations between a set of

More information

Distinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing. Shin-ichi Minato and Takeaki Uno

Distinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing. Shin-ichi Minato and Takeaki Uno TCS Technical Report TCS -TR-A-09-37 Distinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing by Shin-ichi Minato and Takeaki Uno Division of Computer Science

More information

XML Clustering by Bit Vector

XML Clustering by Bit Vector XML Clustering by Bit Vector WOOSAENG KIM Department of Computer Science Kwangwoon University 26 Kwangwoon St. Nowongu, Seoul KOREA kwsrain@kw.ac.kr Abstract: - XML is increasingly important in data exchange

More information

Semistructured Data Store Mapping with XML and Its Reconstruction

Semistructured Data Store Mapping with XML and Its Reconstruction Semistructured Data Store Mapping with XML and Its Reconstruction Enhong CHEN 1 Gongqing WU 1 Gabriela Lindemann 2 Mirjam Minor 2 1 Department of Computer Science University of Science and Technology of

More information

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

Slides for Faculty Oxford University Press All rights reserved.

Slides for Faculty Oxford University Press All rights reserved. Oxford University Press 2013 Slides for Faculty Assistance Preliminaries Author: Vivek Kulkarni vivek_kulkarni@yahoo.com Outline Following topics are covered in the slides: Basic concepts, namely, symbols,

More information

6. Finding Efficient Compressions; Huffman and Hu-Tucker

6. Finding Efficient Compressions; Huffman and Hu-Tucker 6. Finding Efficient Compressions; Huffman and Hu-Tucker We now address the question: how do we find a code that uses the frequency information about k length patterns efficiently to shorten our message?

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman, M.S. Ramanujan, and Saket Saurabh Presenting: Hen Sender 1 Introduction 2 Abstract. It is well known that in a bipartite (and more generally in a Konig)

More information

A Well-Behaved Algorithm for Simulating Dependence Structures of Bayesian Networks

A Well-Behaved Algorithm for Simulating Dependence Structures of Bayesian Networks A Well-Behaved Algorithm for Simulating Dependence Structures of Bayesian Networks Yang Xiang and Tristan Miller Department of Computer Science University of Regina Regina, Saskatchewan, Canada S4S 0A2

More information

Fast algorithm for generating ascending compositions

Fast algorithm for generating ascending compositions manuscript No. (will be inserted by the editor) Fast algorithm for generating ascending compositions Mircea Merca Received: date / Accepted: date Abstract In this paper we give a fast algorithm to generate

More information

Efficient subset and superset queries

Efficient subset and superset queries Efficient subset and superset queries Iztok SAVNIK Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 5000 Koper, Slovenia Abstract. The paper

More information

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time B + -TREES MOTIVATION An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations finish within O(log N) time The theoretical conclusion

More information

Fixed-Parameter Algorithms, IA166

Fixed-Parameter Algorithms, IA166 Fixed-Parameter Algorithms, IA166 Sebastian Ordyniak Faculty of Informatics Masaryk University Brno Spring Semester 2013 Introduction Outline 1 Introduction Algorithms on Locally Bounded Treewidth Layer

More information

Discrete mathematics

Discrete mathematics Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many

More information

Constructions of hamiltonian graphs with bounded degree and diameter O(log n)

Constructions of hamiltonian graphs with bounded degree and diameter O(log n) Constructions of hamiltonian graphs with bounded degree and diameter O(log n) Aleksandar Ilić Faculty of Sciences and Mathematics, University of Niš, Serbia e-mail: aleksandari@gmail.com Dragan Stevanović

More information

The temporal explorer who returns to the base 1

The temporal explorer who returns to the base 1 The temporal explorer who returns to the base 1 Eleni C. Akrida, George B. Mertzios, and Paul G. Spirakis, Department of Computer Science, University of Liverpool, UK Department of Computer Science, Durham

More information

An approximation algorithm for a bottleneck k-steiner tree problem in the Euclidean plane

An approximation algorithm for a bottleneck k-steiner tree problem in the Euclidean plane Information Processing Letters 81 (2002) 151 156 An approximation algorithm for a bottleneck k-steiner tree problem in the Euclidean plane Lusheng Wang,ZimaoLi Department of Computer Science, City University

More information

Lecture 7 February 26, 2010

Lecture 7 February 26, 2010 6.85: Advanced Data Structures Spring Prof. Andre Schulz Lecture 7 February 6, Scribe: Mark Chen Overview In this lecture, we consider the string matching problem - finding all places in a text where some

More information

Multicut in trees viewed through the eyes of vertex cover

Multicut in trees viewed through the eyes of vertex cover Multicut in trees viewed through the eyes of vertex cover Jianer Chen 1 Jia-Hao Fan 1 Iyad A. Kanj 2 Yang Liu 3 Fenghui Zhang 4 1 Department of Computer Science and Engineering, Texas A&M University, College

More information

Improved algorithms for constructing fault-tolerant spanners

Improved algorithms for constructing fault-tolerant spanners Improved algorithms for constructing fault-tolerant spanners Christos Levcopoulos Giri Narasimhan Michiel Smid December 8, 2000 Abstract Let S be a set of n points in a metric space, and k a positive integer.

More information

st-orientations September 29, 2005

st-orientations September 29, 2005 st-orientations September 29, 2005 Introduction Let G = (V, E) be an undirected biconnected graph of n nodes and m edges. The main problem this chapter deals with is different algorithms for orienting

More information

AUSMS: An environment for frequent sub-structures extraction in a semi-structured object collection

AUSMS: An environment for frequent sub-structures extraction in a semi-structured object collection AUSMS: An environment for frequent sub-structures extraction in a semi-structured object collection P.A Laur 1 M. Teisseire 1 P. Poncelet 2 1 LIRMM, 161 rue Ada, 34392 Montpellier cedex 5, France {laur,teisseire}@lirmm.fr

More information

Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model

Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model 356 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16 Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model Hirofumi Suzuki, Sun Hao, and Shin-ichi Minato Graduate

More information

Multicasting in the Hypercube, Chord and Binomial Graphs

Multicasting in the Hypercube, Chord and Binomial Graphs Multicasting in the Hypercube, Chord and Binomial Graphs Christopher C. Cipriano and Teofilo F. Gonzalez Department of Computer Science University of California, Santa Barbara, CA, 93106 E-mail: {ccc,teo}@cs.ucsb.edu

More information

Notes on Binary Dumbbell Trees

Notes on Binary Dumbbell Trees Notes on Binary Dumbbell Trees Michiel Smid March 23, 2012 Abstract Dumbbell trees were introduced in [1]. A detailed description of non-binary dumbbell trees appears in Chapter 11 of [3]. These notes

More information

Huffman Coding. Version of October 13, Version of October 13, 2014 Huffman Coding 1 / 27

Huffman Coding. Version of October 13, Version of October 13, 2014 Huffman Coding 1 / 27 Huffman Coding Version of October 13, 2014 Version of October 13, 2014 Huffman Coding 1 / 27 Outline Outline Coding and Decoding The optimal source coding problem Huffman coding: A greedy algorithm Correctness

More information

General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs

General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs Hongbing Fan Dept. of omputer Science University of Victoria Victoria B anada V8W P6 Jiping Liu Dept. of Math. & omp. Sci. University

Optimum Alphabetic Binary Trees

T. C. Hu and J. D. Morgenthaler, Department of Computer Science and Engineering, School of Engineering, University of California, San Diego, CA 92093-0114, USA. Abstract. We

The Encoding Complexity of Network Coding

Michael Langberg, Alexander Sprintson, Jehoshua Bruck, California Institute of Technology. Email: {mikel,spalex,bruck}@caltech.edu. Abstract: In the multicast network

Λέων-Χαράλαμπος Σταματάρης

INTRODUCTION. Two classical problems of information dissemination in computer networks: The broadcasting problem: distributing a particular message from a distinguished source

Unifying and extending hybrid tractable classes of CSPs

Journal of Experimental & Theoretical Artificial Intelligence, Vol. 00, No. 00, Month-Month 200x, 1-16. Wady Naanaa, Faculty of Sciences, University

Analysis of Algorithms - Greedy algorithms -

Andreas Ermedahl, MRTC (Mälardalens Real-Time Research Center), andreas.ermedahl@mdh.se, Autumn 2003. Greedy Algorithms: another paradigm for designing algorithms

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search

Marc Tedder, University of Toronto. arXiv:1503.02773v1 [cs.DS] 10 Mar 2015. Abstract: Comparability graphs are the undirected

Online Algorithms for Mining Semi-structured Data Stream

DOI-TR-211, June 2002, Department of Informatics, Kyushu University, ftp://ftp.i.kyushu-u.ac.jp/pub/tr/trcs211.ps.gz. Online Algorithms for Mining Semi-structured Data Stream (To appear in Proc. 2002 IEEE

Greedy Algorithms 1. For large values of d, brute-force search is not feasible because there are 2^d

Greedy Algorithms. 1 Simple Knapsack Problem. Greedy algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

On Covering a Graph Optimally with Induced Subgraphs

Shripad Thite, April 1, 2006. Abstract: We consider the problem of covering a graph with a given number of induced subgraphs so that the maximum number

Representations of Weighted Graphs (as Matrices) Algorithms and Data Structures: Minimum Spanning Trees. Weighted Graphs

Representations of Weighted Graphs (as Matrices). Algorithms and Data Structures: Minimum Spanning Trees. [Figure: example weighted graph on vertices A to I] 28th Oct, 1st & 4th Nov, 2011. ADS: lects

Laboratory Module Trees

Laboratory Module 7: 2-3 Trees. Purpose: understand the notion of 2-3 trees; build, in C, a 2-3 tree. 1 2-3 Trees. 1.1 General Presentation. 2-3 trees represent the simplest type of multiway trees.

A Method for Construction of Orthogonal Arrays 1

Eighth International Workshop on Optimal Codes and Related Topics, July 10-14, 2017, Sofia, Bulgaria, pp. 49-54. Iliya Bouyukliev, iliyab@math.bas.bg, Institute

Remoteness, proximity and few other distance invariants in graphs

arXiv:1204.1184v1 [math.CO] 5 Apr 2012. Jelena Sedlar, University of Split, Faculty of Civil Engineering, Architecture and Geodesy, Matice hrvatske

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

Fundamenta Informaticae 56 (2003) 105-120, IOS Press. Jesper Jansson, Department of Computer Science, Lund University, Box 118, SE-221

An Efficient Algorithm for Solving Pseudo Clique Enumeration Problem

Takeaki Uno, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan, uno@nii.jp. Abstract. The problem

V Advanced Data Structures

B-Trees; Fibonacci Heaps. 18 B-Trees. B-trees are similar to red-black trees (RBTs), but they are better at minimizing disk I/O operations. Many database systems use B-trees, or variants of them,

fied by a regular expression [4,7,9,11,23,16]. However, this kind of navigational queries is not completely satisfactory since in many cases we would

Electronic Notes in Theoretical Computer Science 50 No. 3 (2001), Proc. GT-VMT 2001, URL: http://www.elsevier.nl/locate/entcs/volume50.html, 10 pages. Graph Grammars for Querying Graph-like Data. S. Flesca,

The Structure of Bull-Free Perfect Graphs

Maria Chudnovsky and Irena Penev, Columbia University, New York, NY 10027, USA. May 18, 2012. Abstract: The bull is a graph consisting of a triangle and two vertex-disjoint

Optimal Region for Binary Search Tree, Rotation and Polytope

Kensuke Onishi, Mamoru Hoshi (2). Department of Mathematical Sciences, School of Science, Tokai University, 7 Kitakaname, Hiratsuka, Kanagawa, 259-292,

Data Structures and Algorithms

Trees. Sidra Malik, sidra.malik@ciitlahore.edu.pk. Tree? In computer science, a tree is an abstract model of a hierarchical structure. A tree is a finite set of one or more nodes

CS 6783 (Applied Algorithms) Lecture 5

Antonina Kolokolova, January 19, 2012. 1 Minimum Spanning Trees. An undirected graph G is a pair (V, E); V is a set (of vertices or nodes); E is a set of (undirected)

Algorithms Dr. Haim Levkowitz

91.503 Algorithms, Dr. Haim Levkowitz, Fall 2007, Lecture 4, Tuesday, 25 Sep 2007. Design Patterns for Optimization Problems: Greedy Algorithms. What is a Greedy Algorithm? Similar to dynamic

CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li

Lecturer: Shi Li, Department of Computer Science and Engineering, University at Buffalo. Main Goal of Algorithm Design: Design fast

An Improved Algorithm for Matching Large Graphs

L. P. Cordella, P. Foggia, C. Sansone, M. Vento, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Via Claudio, 2 8025

Graph Matching: Fast Candidate Elimination Using Machine Learning Techniques

M. Lazarescu (1,2), H. Bunke (1), and S. Venkatesh (2). (1) Computer Science Department, University of Bern, Switzerland; (2) School of

A tight bound on the worst-case number of comparisons for Floyd's heap construction algorithm

arXiv v3 [cs.DS] 18 Apr 2011. Ioannis K. Paparrizos, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

CS 124 Quiz 2 Review, 3/25/18. 1 Format: You will have 83 minutes to complete the exam. The exam may have true/false questions, multiple choice, example/counterexample problems, run-this-algorithm problems,

Spanners of Complete k-partite Geometric Graphs

Prosenjit Bose, Paz Carmi, Mathieu Couture, Anil Maheshwari, Pat Morin, Michiel Smid. May 30, 2008. Abstract: We address the following problem: Given a complete k-partite

GRAPH THEORETICAL ALGORITHMS FOR CONTROL FLOW GRAPH COMPARISON

Proceedings of the IASTED International Conference Software Engineering (SE 21), February 17-19, 21, Innsbruck, Austria. Sergej Alekseev, Fachhochschule

Heap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the

Boris V. Cherkassky, Central Economics and Mathematics Institute, Krasikova St. 32, 117418, Moscow, Russia, cher@cemi.msk.su; Andrew V. Goldberg, NEC Research Institute, 4 Independence

Branch and Bound Algorithm for Vertex Bisection Minimization Problem

Pallavi Jain, Gur Saran and Kamal Srivastava. Abstract: The Vertex Bisection Minimization Problem (VBMP) consists of partitioning the vertex

Graph Algorithms Using Depth First Search

Analysis of Algorithms, Week 8, Lecture 1. Prepared by John Reif, Ph.D., Distinguished Professor of Computer Science, Duke University.

Theorem 2.9: nearest addition algorithm

There are severe limits on our ability to compute near-optimal tours. It is NP-complete to decide whether a given undirected graph G = (V, E) has a Hamiltonian cycle. An approximation algorithm for the TSP can be used

CHAPTER 3 LITERATURE REVIEW

This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations

Algorithm and Complexity of Disjointed Connected Dominating Set Problem on Trees

Wei Wang, joint with Zishen Yang and Xianliang Liu, School of Mathematics and Statistics, Xi'an Jiaotong University, Dec 20, 2016

A CSP Search Algorithm with Reduced Branching Factor

Igor Razgon and Amnon Meisels, Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel, {irazgon,am}@cs.bgu.ac.il

The self-minor conjecture for infinite trees

Julian Pott. Abstract: We prove Seymour's self-minor conjecture for infinite trees. 1. Introduction. P. D. Seymour conjectured that every infinite graph is a proper

Keywords: Data Mining, TAR, XML.

Volume 6, Issue 6, June 2016, ISSN: 2277-128X. International Journal of Advanced Research in Computer Science and Software Engineering, Research Paper, available online at: www.ijarcsse.com. TAR: Algorithm

Analysis of Algorithms

Concept Exam Code: 16. All questions are weighted equally. Assume worst-case behavior and sufficiently large input sizes unless otherwise specified. Strong induction: Consider this