Efficiently Enumerating Results of Keyword Search

Benny Kimelfeld and Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science
The Hebrew University of Jerusalem, Edmond J. Safra Campus
Jerusalem 91904, Israel

Abstract. Various approaches for keyword search have been explored in different settings, including databases, XML and the Web. It is shown that in many cases, systems that incorporate keyword search actually solve similar problems. This paper describes, for this type of problem, the first algorithms that are provably efficient, that is, that run with polynomial delay. Specifically, algorithms for enumerating K-fragments are given, where a K-fragment is a subtree T of the given data graph such that T contains all the keywords of K and no proper subtree of T has this property. Three types of K-fragments are considered: rooted, undirected and strong. For all three types, there are algorithms that enumerate all K-fragments with polynomial delay. For rooted K-fragments and acyclic data graphs, there is an algorithm that enumerates with polynomial delay in the order of increasing weight, assuming that K is of a fixed size.

1 Introduction

The advent of the World-Wide Web and the proliferation of search engines have transformed keyword search from a niche role into a major player in the information-technology field. Modern database languages should have both querying and searching capabilities. In recent years, different approaches for developing such capabilities have been investigated. An early example is keyword search in databases [8]. More recently, several papers [1, 3, 9, 10] proposed systems that support keyword search in relational databases.

Naturally, keyword search is highly relevant to XML. There are, however, two facets of XML. Data-centric XML is essentially semistructured data, and query languages (e.g., XQuery) are usually used for retrieving information.
Document-centric XML consists of large chunks of text, with XML tags that are used mostly for indicating the structure of documents rather than relationships among data items. Consequently, there are different approaches for handling keyword search in XML: some are aimed at data-centric XML, while others are tailored for document-centric XML. INEX [6] is an initiative that focuses on document-centric XML and evaluates retrieval techniques for different types of queries. One type is just a list of keywords, while another type consists of both keywords and structural conditions that are written in a style similar to XPath. For both types of queries, a result is an element of a document (with all its descendant elements) rather than a whole document.

XKeyword [11] is a tool that generates, from a given XML document, descriptive portions containing all the specified keywords. Query evaluation in XKeyword is based on the method that was developed in DISCOVER [10] for keyword search in relational databases. The approach of [4] is aimed at data-centric XML, and its goal is to find semantic relationships among nodes of XML documents. Efficient solutions for tree documents are given in [4]. XSEarch [5] combines the approach of [4] with information-retrieval techniques. The notion of semantic relationships is generalized in [14] to graph documents (i.e., XML documents that may have ID references).

The above approaches consider different settings and use a variety of techniques. At the core, however, many of these approaches deal with similar graph problems. The goal of this paper is to clearly identify these graph problems and provide provably efficient algorithms (rather than heuristics) for solving them.

Essentially, in all of the above approaches, data are (or can be) represented as a graph that has two types of nodes: structural nodes and keyword nodes. For example, in the systems that implement keyword search in relational databases [1, 3, 9, 10], structural nodes represent tuples. Two tuples are connected by an edge if they can be joined on a foreign key. A tuple t and a keyword k are connected if t contains k.

A formal framework for keyword search in data graphs is presented in [15]. A key concept in this framework is reduced subtrees. Given a data graph G and a set of keywords K, a subtree T of G is reduced with respect to (abbr. w.r.t.) K if T contains the keywords of K, but no proper subtree of T contains all of these keywords. A K-fragment is a subtree of G that is reduced w.r.t. K. The results of a keyword search are K-fragments. Actually, there are three types of K-fragments: rooted (i.e., directed), undirected and strong. A strong K-fragment is an undirected K-fragment such that all its keyword nodes are leaves (and since it is reduced w.r.t. K, all its leaves are keywords of K). Note that in a directed data graph, keyword nodes do not have outgoing edges. Hence, in a rooted K-fragment, all keyword nodes must be leaves. In an undirected K-fragment, on the other hand, keyword nodes are not necessarily leaves.

In many of the approaches mentioned earlier, processing a keyword search is simply an enumeration of all K-fragments. This is also true for the information-unit approach [16] to searching the Web. Typically, results of a keyword search are either strong K-fragments [1, 9–11, 14, 16] or rooted K-fragments [3, 14]; however, they could also be undirected K-fragments [14]. Thus far, heuristics have been employed to solve the different variants of this enumeration problem. These heuristics may perform well in practice, but they either lack a clear upper bound or have an exponential upper bound, even if the number of results is small.
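
As a concrete illustration of the relational case described above (tuples as structural nodes, foreign-key joins as edges, and an edge from a tuple to each keyword it contains), here is a minimal Python sketch. The schema, the rows, the helper name build_data_graph, the tokenization of attribute values into keywords, and the choice to direct a foreign-key edge from the referencing tuple to the referenced one are all our own assumptions; the cited systems may differ in these details.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

def build_data_graph(
    tables: Dict[str, List[dict]],
    foreign_keys: Dict[Tuple[str, str], str],
) -> Tuple[Set[str], Set[str], Set[Edge]]:
    """Hypothetical helper: returns (structural nodes, keyword nodes, edges).

    tables maps a table name to its rows; every row has a unique "id" column.
    foreign_keys maps (table, column) to the table that the column references.
    """
    structural: Set[str] = set()
    keywords: Set[str] = set()
    edges: Set[Edge] = set()
    for table, rows in tables.items():
        for row in rows:
            t = f"{table}:{row['id']}"  # one structural node per tuple
            structural.add(t)
            for column, value in row.items():
                if column == "id":
                    continue
                target = foreign_keys.get((table, column))
                if target is not None:
                    # Assumed direction: referencing tuple -> referenced tuple.
                    edges.add((t, f"{target}:{value}"))
                else:
                    # Tuple t contains keyword k: edge t -> k (keywords are sinks).
                    for word in str(value).split():
                        keywords.add(word)
                        edges.add((t, word))
    return structural, keywords, edges

# Hypothetical rows, loosely modeled on the Mondial example used later in the paper.
tables = {
    "country": [{"id": 1, "name": "Belgium", "gov": "Monarchy"},
                {"id": 2, "name": "Netherlands", "gov": "Monarchy"}],
    "city": [{"id": 10, "name": "Brussels", "country": 1}],
}
foreign_keys = {("city", "country"): "country"}
S, K, E = build_data_graph(tables, foreign_keys)
```

Note that Monarchy becomes a single keyword node with two incoming edges, and that no edge leaves a keyword node, in line with the restrictions on data graphs given in Section 2.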

In this paper, we give efficient algorithms for enumerating K-fragments. Since the output of an enumeration algorithm can be exponential in the size of the input, we use the yardstick of enumeration with polynomial delay as an indication of efficiency. We show that all rooted, undirected or strong K-fragments can be enumerated with polynomial delay. We also consider the problem of enumerating by increasing weight. Specifically, we show that if the size of K is fixed, then all rooted K-fragments of an acyclic data graph can be enumerated by increasing weight with polynomial delay. Note that a known NP-complete problem [7] implies that this result can hold only if the size of K is assumed to be fixed (unless P = NP). Making this assumption is realistic and in line with the notion of data complexity [17], which is commonly used for measuring the complexity of query evaluation.

In summary, the main contribution of this paper is in giving, for the first time, provably efficient algorithms for enumeration problems that need to be solved in many different settings of keyword search. These settings include relational databases, data-centric as well as document-centric XML, and the Web.

This paper is organized as follows. Section 2 defines basic concepts and notations. The notion of enumeration algorithms, their complexity measures, and threaded enumerators are discussed in Section 3. The algorithms are described in Sections 4, 5 and 6. We present a heuristic for sorted enumerations in Section 7. We conclude and discuss future work in Section 8. In Appendix A, we describe two algorithms that cannot be given in the paper itself due to a lack of space. In Appendices B and C, we give proofs of correctness for two of our algorithms. In Appendix D, we give a detailed complexity analysis for the first algorithm.

2 Preliminaries

2.1 Data Graphs

A data graph G consists of a set V(G) of nodes and a set E(G) of edges.
There are two types of nodes: structural nodes and keyword nodes (or keywords for short). $S(G)$ denotes the set of structural nodes and $K(G)$ denotes the set of keyword nodes. Unless explicitly stated otherwise, edges are directed, i.e., an edge is a pair $(n_1, n_2)$ of nodes. Keywords have only incoming edges, while structural nodes may have both incoming and outgoing edges. Hence, no edge can connect two keywords. These restrictions mean that $E(G) \subseteq S(G) \times V(G)$. The edges of a data graph G may have weights. The weight function $w_G$ assigns a positive weight $w_G(e)$ to every edge $e \in E(G)$. The weight of the data graph G, denoted $w(G)$, is the sum of the weights of all the edges of G, i.e., $w(G) = \sum_{e \in E(G)} w_G(e)$.

A data graph is rooted if it contains some node r such that every node of G is reachable from r through a directed path. The node r is called a root of G. (Note that a rooted data graph may have several roots.) A data graph is connected if its underlying undirected graph is connected.

As an example, consider the data graph $G_1$ depicted in Figure 1. (This data graph is a subgraph of the Mondial XML database.) In this graph, filled circles represent structural nodes and keywords are written in italic font. Note that the keyword Monarchy appears twice in this figure; however, in the actual data graph, the keyword Monarchy is represented by a single node that has two incoming edges. Also note that the structural nodes of $G_1$ have labels, but these are ignored in this paper. The data graph $G_1$ is rooted, and the node labeled with continent is the only root.

[Fig. 1. A data graph $G_1$ (structural nodes labeled continent, country, organization, gov, name, hq, city; keywords Belgium, Netherlands, Brussels, Monarchy).]

We use two types of data trees. A rooted tree is a rooted data graph such that there is only one root and, for every node u, there is a unique path from the root to u. An undirected tree is a connected data graph that contains no cycles, even when ignoring the directions of the edges.

We say that G' is a subgraph of the data graph G, denoted $G' \subseteq G$, if $V(G') \subseteq V(G)$ and $E(G') \subseteq E(G)$. The weights of edges in G' are the same as those in G. Rooted and undirected subtrees are special cases of subgraphs. For a data graph G and a subset $U \subseteq V(G)$, we denote by $G - U$ the induced subgraph of G that consists of the nodes of $V(G) \setminus U$ and all the edges of G between these nodes. If $u \in V(G)$, then we may write $G - u$ instead of $G - \{u\}$. If $G_1$ and $G_2$ are subgraphs of G, we use $G_1 \cup G_2$ to denote the subgraph that consists of all the nodes and edges of both $G_1$ and $G_2$; that is, the graph G' that satisfies $V(G') = V(G_1) \cup V(G_2)$ and $E(G') = E(G_1) \cup E(G_2)$. Given a data graph G, a subset $U \subseteq V(G)$ and an edge $e = (v, u) \in E(G)$, we use $U^{\pm e}$ to denote the set $(U \setminus \{u\}) \cup \{v\}$. Given two nodes u and v in a data graph G, we use $u \leadsto_G v$ to denote the fact that v is reachable from u through a directed path in G.

A rooted (respectively, undirected) subtree T of a data graph G is reduced w.r.t. a subset U of the nodes of G if T contains U, but no proper rooted (respectively, undirected) subtree of T contains U.
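
The notation of this section maps onto a few small helpers. The following minimal Python sketch (the class and function names are hypothetical, not from the paper) implements $w(G)$, the roots of G, $G - U$, $U^{\pm e}$ and reachability, together with two tests for reducedness. The tests rely on characterizations that follow from the definitions above: a rooted tree is reduced w.r.t. U iff it contains U, every leaf is in U, and its root is in U or has at least two children; an undirected tree is reduced w.r.t. U iff it contains U and every node of degree at most 1 is in U. Both tests assume that their input already is a tree of the respective kind.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Node = str
Edge = Tuple[Node, Node]

class DataGraph:
    """Directed data graph with positive edge weights (illustrative sketch)."""

    def __init__(self, nodes: Iterable[Node], weights: Dict[Edge, float]) -> None:
        self.nodes: FrozenSet[Node] = frozenset(nodes)
        self.weights = dict(weights)  # w_G(e) for every edge e in E(G)
        self.succ: Dict[Node, Set[Node]] = defaultdict(set)
        for (u, v) in self.weights:
            self.succ[u].add(v)

    def weight(self) -> float:
        """w(G): the sum of w_G(e) over all edges e of G."""
        return sum(self.weights.values())

    def minus(self, removed: Iterable[Node]) -> "DataGraph":
        """G - U: the subgraph induced by the nodes of G that are not in U."""
        keep = self.nodes - set(removed)
        return DataGraph(keep, {e: c for e, c in self.weights.items()
                                if e[0] in keep and e[1] in keep})

    def reachable_from(self, u: Node) -> Set[Node]:
        """All v with u ~>_G v (u itself is included via the empty path)."""
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in self.succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    def roots(self) -> Set[Node]:
        """Nodes from which every node of G is reachable."""
        return {r for r in self.nodes if self.reachable_from(r) >= self.nodes}

def shift(U: Iterable[Node], e: Edge) -> FrozenSet[Node]:
    """U^{±e} for e = (v, u): replace u by v."""
    v, u = e
    return (frozenset(U) - {u}) | {v}

def is_reduced_rooted(nodes: Set[Node], edges: Set[Edge], U: Set[Node]) -> bool:
    """Assumes (nodes, edges) is a rooted tree."""
    children: Dict[Node, int] = defaultdict(int)
    for (p, _) in edges:
        children[p] += 1
    root = next(iter(nodes - {c for (_, c) in edges}))
    leaves = {x for x in nodes if children[x] == 0}
    return U <= nodes and leaves <= U and (root in U or children[root] >= 2)

def is_reduced_undirected(nodes: Set[Node], edges: Set[Edge], U: Set[Node]) -> bool:
    """Assumes (nodes, edges) is an undirected tree."""
    degree: Dict[Node, int] = defaultdict(int)
    for (a, b) in edges:
        degree[a] += 1
        degree[b] += 1
    return U <= nodes and {x for x in nodes if degree[x] <= 1} <= U

# Hypothetical toy graph: r -> x -> {k1, k2}, plus a shortcut r -> k2.
g = DataGraph(["r", "x", "k1", "k2"],
              {("r", "x"): 1, ("x", "k1"): 2, ("x", "k2"): 3, ("r", "k2"): 1})
```

In this toy graph, the rooted tree with edges r→x, x→k1, x→k2 is not reduced w.r.t. {k1, k2}: its root r has a single child and is not in U, so dropping r leaves a smaller rooted subtree that still contains U.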

[Fig. 2. Fragments $F_1$, $F_2$ and $F_3$ of $G_1$ for the query {Belgium, Netherlands}.]

2.2 Keyword Search

A query is simply a finite set K of keywords. Given a data graph G, a rooted K-fragment (abbr. RKF) is a rooted subtree of G that is reduced w.r.t. K. Similarly, an undirected K-fragment (abbr. UKF) is an undirected subtree of G that is reduced w.r.t. K. A strong K-fragment (abbr. SKF) is a UKF such that all the keywords are leaves. Note that an RKF is also an SKF and an SKF is also a UKF. Figure 2 shows three K-fragments of $G_1$, where K is the query {Belgium, Netherlands}. $F_3$ is a UKF, $F_2$ is an SKF and $F_1$ is an RKF.

In some approaches to keyword search (e.g., [1, 9–11, 16]), the goal is to solve the SKF problem, that is, to enumerate all SKFs for a given K. In other approaches (e.g., [3]), the goal is to solve the RKF problem. The work of [14] also considers the UKF problem.

3 Enumeration Algorithms

3.1 Threaded Enumerators

In order to construct efficient enumeration algorithms, we employ threaded enumerators that enable one algorithm to use the elements enumerated by another algorithm (or even by itself, recursively) as soon as these elements are generated, rather than waiting for termination.

Formally, an enumeration algorithm E generates, for a given input x, a sequence $E_1(x), \ldots, E_{n(x)}(x)$. Each element $E_i(x)$ is enumerated by the operation print(·). We say that E(x) enumerates a set S if $\{E_1(x), \ldots, E_{n(x)}(x)\} = S$ and $E_i(x) \neq E_j(x)$ for all $1 \le i < j \le n(x)$.

Sometimes one enumeration algorithm E uses another enumeration algorithm E', or may even use itself recursively. An important property of an enumeration algorithm is the ability to start generating elements as soon as possible. This property is realized by enabling E to use each element generated by E' when that element is created, rather than having to wait until E' finishes its enumeration. In Java [2], for example, each enumeration algorithm can be implemented as a distinct thread. By using the wait and notify mechanisms, E can stop E' after every output and later resume E' in order to generate the next output. Java threads are rather complex, since they can be executed concurrently. We need a simpler notion of threads, since concurrency is not essential: threads are needed for writing enumeration algorithms that realize the desired time complexity, even if the execution is serial. Next, we describe our notion of threads.

We write algorithms in pseudocode using threaded enumerators. A specific threaded enumerator TE is constructed by the command TE := new [E](x), where E is some enumeration algorithm and x is an input for E. The elements $E_1(x), \ldots, E_{n(x)}(x)$ are enumerated by repeatedly executing the command next[TE]. The ith execution of next[TE] generates the element $E_i(x)$ if $1 \le i \le n(x)$; otherwise, if $i > n(x)$, the null element, denoted $\bot$, is generated. We assume that $\bot$ is not an element in the output of E(x). An enumeration algorithm E may use a threaded enumerator recursively, i.e., a threaded enumerator for E(x'), where x' is usually different from x. As an example, consider the pseudocode of the algorithm ReducedSubtrees, presented in Figure 4. In Line 21, a threaded enumerator is constructed for the algorithm RSExtensions (shown in Figure 5(a)). Line 18 is an example of a recursive construction of a threaded enumerator.

3.2 Measuring the Complexity of Enumeration Algorithms

Polynomial time complexity is not a suitable yardstick of efficiency when analyzing an enumeration algorithm, since the output size could be exponential in the input size. In [13], several definitions of efficiency for enumeration algorithms are discussed. The weakest definition is polynomial total time, that is, the running time is polynomial in the combined size of the input and the output.
Two stronger definitions consider the time that is needed for generating the ith element, after the first $i - 1$ elements have already been created. Incremental polynomial time means that the ith element is generated in time that is polynomial in the combined size of the input and the first $i - 1$ elements. The strongest definition is polynomial delay, that is, the ith element is generated in time that is polynomial only in the input size.

For characterizing space efficiency, we use two definitions. Note that the amount of space needed for writing the output is ignored: only the space used for storing intermediate results is measured. The usual definition is polynomial space, that is, the amount of space used by the algorithm is polynomial in the input size. Linearly incremental polynomial space means that the space needed for generating the first i elements is bounded by i times a polynomial in the input size. Note that an enumeration algorithm that runs with polynomial delay uses (at most) linearly incremental polynomial space.

All the algorithms in this paper, except for one version of the heuristic of Section 7, run with polynomial delay. The algorithms of the next two sections use polynomial space.
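
Threaded enumerators need no concurrency, so generators are a natural serial analogue. The minimal Python sketch below is our own illustration, not the paper's code: ThreadedEnumerator wraps a generator so that next() mirrors next[TE] and returns None in place of $\bot$, the toy algorithm binary_strings uses a threaded enumerator of itself recursively, and next_calls_per_interval counts the next commands executed in each interval, which is the quantity bounded in Section 4.2. All names and the global counter are assumptions made for illustration.

```python
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")
BOTTOM = None    # plays the role of the null element (assumed never a real output)
NEXT_CALLS = 0   # global count of executed next commands, used to measure intervals

class ThreadedEnumerator(Generic[T]):
    """TE := new [E](x): wraps a generator; nothing runs until the first next()."""

    def __init__(self, algorithm: Callable[..., Iterator[T]], *args) -> None:
        self._gen = algorithm(*args)

    def next(self) -> Optional[T]:
        """next[TE]: the ith call returns E_i(x), or BOTTOM once E(x) is exhausted."""
        global NEXT_CALLS
        NEXT_CALLS += 1
        return next(self._gen, BOTTOM)

def binary_strings(n: int) -> Iterator[str]:
    """Toy enumerator that uses a threaded enumerator of itself recursively."""
    if n == 0:
        yield ""                      # print("")
        return
    for bit in "01":
        te = ThreadedEnumerator(binary_strings, n - 1)
        s = te.next()
        while s is not BOTTOM:
            yield bit + s             # print(bit + s) as soon as s exists
            s = te.next()

def next_calls_per_interval(enumeration: Iterator[T]) -> List[int]:
    """Next commands executed in each interval, from the 1st to the (n+1)st."""
    counts, last = [], NEXT_CALLS
    for _ in enumeration:
        counts.append(NEXT_CALLS - last)
        last = NEXT_CALLS
    counts.append(NEXT_CALLS - last)  # after the last print, until termination
    return counts
```

For binary_strings(2), the strings come out as 00, 01, 10, 11, and the per-interval counts are [2, 3, 4, 3, 2]: each element is produced through a chain of nested enumerators, yet every interval executes only a few next commands.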

4 Enumerating Rooted K-Fragments

4.1 The Algorithm

In this section, we describe an algorithm for enumerating RKFs. Our algorithm solves the more general problem of enumerating reduced subtrees. That is, given a data graph G and a subset $U \subseteq V(G)$, the algorithm enumerates, with polynomial delay, the set RS(G, U) of all rooted subtrees of G that are reduced w.r.t. U. Hence, to solve the RKF problem, we execute the algorithm with U = K, where K is the given set of keywords.

If U has only two nodes, the enumeration is done by a rather straightforward algorithm, PairRS(G, u, v), that is described in Appendix A. The problem is more difficult for larger sets of nodes, because for some subsets $U' \subseteq U$, the set RS(G, U') might be much larger than the set RS(G, U). For example, for the graph $G_2$ of Figure 3(a), $RS(G_2, \{a, b, c\})$ has only one subtree, whereas the size of $RS(G_2, \{a, b\})$ is exponential in the size of $G_2$.

[Fig. 3. (a) A data graph $G_2$ (nodes include a, b and c). (b) Extensions of a tree with root r toward a node v: (1) by a directed path, and (2) by a reduced subtree.]

In the algorithm ReducedSubtrees(G, U) of Figure 4, every intermediate result, obtained from the recursive calls in Lines 11 and 18, can be extended into at least one distinct element of RS(G, U). Thus, the complexity is not worse than polynomial total time.

    ReducedSubtrees(G, U)
     1: if |U| = 1 then
     2:   print((U, ∅))
     3:   exit
     4: if RS(G, U) = ∅ then
     5:   exit
     6: choose an arbitrary node u ∈ U
     7: if u ⇝_G v holds for no v ∈ U \ {u} then
     8:   W := {w | (w, u) is an edge of G and RS(G − u, U^{±(w,u)}) ≠ ∅}
     9:   for all w ∈ W do
    10:     U_w := U^{±(w,u)}
    11:     TE := new [ReducedSubtrees](G − u, U_w)
    12:     T := next[TE]
    13:     while T ≠ ⊥ do
    14:       print(T ∪ (w, u))
    15:       T := next[TE]
    16: else
    17:   let v ∈ U be a node s.t. u ≠ v and u ⇝_G v
    18:   TE_1 := new [ReducedSubtrees](G, U \ {v})
    19:   T := next[TE_1]
    20:   while T ≠ ⊥ do
    21:     TE_2 := new [RSExtensions](G, T, v)
    22:     T' := next[TE_2]
    23:     while T' ≠ ⊥ do
    24:       print(T')
    25:       T' := next[TE_2]
    26:     T := next[TE_1]
Fig. 4. Enumerating RS(G, U)

Next, we describe this algorithm in detail. In Lines 1–3, the algorithm ReducedSubtrees(G, U) terminates after printing a single tree that has one node and no edges, if |U| = 1. In Lines 4–5, the algorithm terminates if RS(G, U) is empty. Note that $RS(G, U) = \emptyset$ if and only if there is no node w of G such that all the nodes of U are reachable from w. An arbitrary node $u \in U$ is chosen in Line 6, and if the test of Line 7 is true (i.e., no other node of U is reachable from u), then u is a leaf in every tree of RS(G, U). If so, Line 9 iterates over all nodes w such that (w, u) is an edge of G and $RS(G - u, U^{\pm(w,u)}) \neq \emptyset$. All the trees of $RS(G - u, U^{\pm(w,u)})$ are enumerated in Lines 11–15. The edge (w, u) is added to each of these trees and the result is printed in Line 14. If the test of Line 7 is false, then Line 17 arbitrarily chooses a node $v \in U$ ($v \neq u$) that is reachable from u. All the trees of $RS(G, U \setminus \{v\})$ are enumerated starting at Line 18. Each of these trees can be extended to a tree of RS(G, U) in two different ways, as illustrated in Figure 3(b). For each $T \in RS(G, U \setminus \{v\})$, all extensions T' of T are enumerated starting at Line 21 by calling RSExtensions(G, T, v). These extensions are printed in Line 24.

Next, we explain how RSExtensions(G, T, v) works. Given a node $v \in U$ and a subtree $T \in RS(G, U \setminus \{v\})$ having a root r, the algorithm RSExtensions(G, T, v) of Figure 5(a) enumerates all subtrees T' such that T' contains T and $T' \in RS(G, U)$. In Lines 5–12, T is extended by directed simple paths. Each path P is from a node u ($u \neq r$) of T to v, and u is the only node in both P and T. These paths are enumerated by the algorithm Paths(G, u, v) of Figure 5(b). The extensions of T by these paths are printed in Line 11. In Lines 13–19, T is extended by reduced subtrees T' of G. Each T' is reduced w.r.t. {r, v}, and r is the only node in both T and T'. Note that the root of the new tree is the root of T'. The trees T' are enumerated by PairRS, which is described in Appendix A. The extensions of T by these trees are printed in Line 18.

The following theorem shows the correctness of ReducedSubtrees; its proof is given in Appendix B.

Theorem 1. Let G be a data graph and U be a subset of the nodes of G. The algorithm ReducedSubtrees(G, U) enumerates RS(G, U).

Interestingly, the algorithm remains correct even if Line 6 and the test of Line 7 are ignored, and only the else part (i.e., Lines 17–26) is executed in all cases, where in Line 17 v can be any node in U. However, the complexity is no longer polynomial total time, since the enumerator $TE_1$ may generate trees T that cannot be extended by RSExtensions(G, T, v). For example, consider the graph $G_2$ of Figure 3(a) and let U = {a, b, c}. If we choose v = c, then all directed paths from a to b will be generated by $TE_1$. However, none of those paths can be extended to a subtree of $RS(G_2, U)$. If, on the other hand, only the then part (i.e., Lines 8–15) is executed, then the algorithm will not be correct.

4.2 Complexity Analysis

In this section, we show that the algorithm ReducedSubtrees enumerates with polynomial delay. We first discuss the complexity of enumeration algorithms in general. To prove that an enumeration algorithm enumerates with polynomial delay, we have to calculate the computation cost between successive print commands. Formally, let E(x) enumerate the sequence $E_1(x), \ldots, E_n(x)$. For $1 < i \le n$, the ith interval starts immediately after the printing of $E_{i-1}(x)$ and ends with the printing of $E_i(x)$. The first interval starts at the beginning of the execution of E(x) and ends with the printing of $E_1(x)$.
The (n + 1)st interval starts immediately after the printing of $E_n(x)$ and ends when the execution of E(x) terminates. The ith delay of E(x) is the execution cost of the ith interval. The cost of each command, other than a next command, is defined as usual. For a threaded enumerator TE of E'(x'), the cost of the ith execution of next[TE] is 1 + C, where C is the ith delay of E'(x'). (Note that this is a recursive definition.) The ith space usage of E(x) is the amount of space used for printing the first i elements $E_1(x), \ldots, E_i(x)$. Note that the (n + 1)st space usage is equal to the total space used by E(x) from start to finish.

It is not always easy to evaluate the ith delay directly, since an enumeration algorithm may recursively use threads of other enumeration algorithms, leading to complex recursive equations. So, we take a different approach. First, we evaluate the basic ith delay, which is defined as the cost of the ith interval assuming that each next command has a unit cost. Second, for each interval, we count the total number of next commands that are executed during that interval, including next commands of threaded enumerators that are created recursively. The ith delay is then bounded by the product of (upper bounds on) the basic ith delay and the number of next commands in the ith interval.

    RSExtensions(G, T, v)
     1: let r be the root of T
     2: if v ∈ V(T) then
     3:   print(T)
     4:   exit
     5: for all u ∈ V(T) \ {r} do
     6:   Ḡ := G − (V(T) \ {u})
     7:   if u ⇝_Ḡ v then
     8:     TE := new [Paths](Ḡ, u, v)
     9:     P := next[TE]
    10:     while P ≠ ⊥ do
    11:       print(T ∪ P)
    12:       P := next[TE]
    13: G_r := G − (V(T) \ {r})
    14: if RS(G_r, {r, v}) ≠ ∅ then
    15:   TE := new [PairRS](G_r, r, v)
    16:   T' := next[TE]
    17:   while T' ≠ ⊥ do
    18:     print(T ∪ T')
    19:     T' := next[TE]
                                (a)

    Paths(G, u, v)
     1: if u = v then
     2:   let P be the path containing u only
     3:   print(P)
     4:   exit
     5: W := {w | (u, w) is an edge of G and w ⇝_{G−u} v}
     6: for all w ∈ W do
     7:   TE := new [Paths](G − u, w, v)
     8:   P := next[TE]
     9:   while P ≠ ⊥ do
    10:     print(P ∪ (u, w))
    11:     P := next[TE]
                                (b)
Fig. 5. (a) Enumerating subtree extensions. (b) Enumerating simple directed paths

In the algorithm ReducedSubtrees, for example, it is rather easy to see that for all threaded enumerators used during that algorithm, the basic ith delay is polynomial in the input size. It is more difficult to show that only a polynomial number of next commands are executed during each interval. Note that this would not be true if the algorithm created threaded enumerators that return empty results (e.g., by ignoring the test of either Line 4 of Figure 4 or Line 14 of Figure 5(a)). The complexity of ReducedSubtrees is summarized in the following theorem, and the detailed analysis is given in Appendix D.

Theorem 2. Let K be a query of size k and G be a data graph with n nodes and m edges. Consider the execution of ReducedSubtrees(G, K). Let $F_i$ denote the ith rooted K-fragment printed and $|F_i|$ denote its number of nodes. Then, the first delay is $O(mk|F_1|)$; for i > 1, the ith delay is $O(mk(|F_i| + |F_{i-1}|))$; and the ith space usage is $O(mn)$.
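
The following minimal Python sketch transcribes Figures 4 and 5 under assumptions of our own. Generators stand in for threaded enumerators, so a recursive for-loop replaces each new/next pair; a graph and a tree are both pairs (node set, edge set). Since Appendix A is not reproduced here, pair_rs is a hypothetical stand-in for PairRS: it relies on the observation that in a rooted tree reduced w.r.t. {a, b} with a ≠ b, either a is a leaf or a is the root of a path to b. The reachability tests are naive, and we have not checked that the sketch meets the bounds of Theorem 2; it only illustrates the control flow, including the emptiness tests that the analysis above relies on.

```python
from typing import FrozenSet, Iterable, Iterator, Set, Tuple

Node = str
Edge = Tuple[Node, Node]
Graph = Tuple[FrozenSet[Node], FrozenSet[Edge]]  # (V(G), E(G))
Tree = Tuple[FrozenSet[Node], FrozenSet[Edge]]   # (nodes, edges) of a rooted subtree

def minus(g: Graph, removed: Iterable[Node]) -> Graph:
    """G - U: the subgraph of G induced by the nodes not in U."""
    keep = g[0] - frozenset(removed)
    return keep, frozenset(e for e in g[1] if e[0] in keep and e[1] in keep)

def reach(g: Graph, src: Node, backward: bool = False) -> Set[Node]:
    """Nodes reachable from src by a directed path (nodes reaching src if backward)."""
    if src not in g[0]:
        return set()
    seen, stack = {src}, [src]
    while stack:
        x = stack.pop()
        for (a, b) in g[1]:
            frm, to = (b, a) if backward else (a, b)
            if frm == x and to not in seen:
                seen.add(to)
                stack.append(to)
    return seen

def rs_nonempty(g: Graph, U: Iterable[Node]) -> bool:
    """RS(G, U) is nonempty iff some node of G reaches every node of U."""
    common = set(g[0])
    for u in U:
        common &= reach(g, u, backward=True)
    return bool(common)

def paths(g: Graph, u: Node, v: Node) -> Iterator[Tree]:
    """Figure 5(b): all simple directed paths from u to v."""
    if u == v:
        yield frozenset([u]), frozenset()
        return
    g_u = minus(g, [u])
    for (a, w) in g[1]:
        if a == u and v in reach(g_u, w):  # Line 5: keep w only if w ~> v in G - u
            for ns, es in paths(g_u, w, v):
                yield ns | {u}, es | {(u, w)}

def pair_rs(g: Graph, a: Node, b: Node) -> Iterator[Tree]:
    """Hypothetical stand-in for PairRS (Appendix A): enumerates RS(G, {a, b})."""
    if a == b:
        yield frozenset([a]), frozenset()
        return
    g_a = minus(g, [a])
    for (w, x) in g[1]:  # trees in which a is a leaf whose parent is w
        if x == a and rs_nonempty(g_a, {w, b}):
            for ns, es in pair_rs(g_a, w, b):
                yield ns | {a}, es | {(w, a)}
    yield from paths(g, a, b)  # trees rooted at a, i.e., paths from a to b

def rs_extensions(g: Graph, t: Tree, v: Node) -> Iterator[Tree]:
    """Figure 5(a): every tree of RS(G, U) containing t, where t is in RS(G, U - {v})."""
    t_nodes, t_edges = t
    r = next(iter(t_nodes - {c for (_, c) in t_edges}))  # Line 1: the root of t
    if v in t_nodes:
        yield t
        return
    for u in t_nodes - {r}:  # Lines 5-12: attach a path from u to v
        g_bar = minus(g, t_nodes - {u})
        if v in reach(g_bar, u):
            for ns, es in paths(g_bar, u, v):
                yield t_nodes | ns, t_edges | es
    g_r = minus(g, t_nodes - {r})  # Lines 13-19: attach a tree reduced w.r.t. {r, v}
    if rs_nonempty(g_r, {r, v}):
        for ns, es in pair_rs(g_r, r, v):
            yield t_nodes | ns, t_edges | es

def reduced_subtrees(g: Graph, U: Iterable[Node]) -> Iterator[Tree]:
    """Figure 4: enumerates RS(G, U)."""
    U = frozenset(U)
    if len(U) == 1:  # Lines 1-3
        yield U, frozenset()
        return
    if not rs_nonempty(g, U):  # Lines 4-5
        return
    u = min(U)  # Line 6: any node works; min() keeps runs deterministic
    from_u = reach(g, u)
    if not any(x in from_u for x in U - {u}):  # Line 7: u is a leaf of every tree
        g_u = minus(g, [u])
        for (w, x) in sorted(g[1]):
            U_w = (U - {u}) | {w}  # U^{±(w,u)}
            if x == u and rs_nonempty(g_u, U_w):
                for ns, es in reduced_subtrees(g_u, U_w):
                    yield ns | {u}, es | {(w, u)}
    else:  # Lines 17-26
        v = min(x for x in U - {u} if x in from_u)
        for t in reduced_subtrees(g, U - {v}):
            yield from rs_extensions(g, t, v)
```

For example, on a graph with edges r→x, r→y, x→k1, x→k2 and y→k2, reduced_subtrees(G, {"k1", "k2"}) yields exactly two trees, {x→k1, x→k2} and {r→x, x→k1, r→y, y→k2}; the tree {r→x, x→k1, x→k2} is not produced, because its root has a single child and is not in U.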

Corollary 1. The RKF problem can be solved with polynomial delay and polynomial space.

A simple optimization that can be applied to the algorithm is to first remove irrelevant nodes. A node v is considered irrelevant if either no keyword of K can be reached from v, or v cannot be reached from any node u such that all the keywords of K are reachable from u. We implemented the algorithm ReducedSubtrees and tested it on the data graph of the Mondial XML document (ID references were replaced with edges). We found that this optimization usually improved the running time by an order of magnitude. Also note that the space usage can be reduced to O(m) by implementing the algorithm so that different threaded enumerators share data structures.

5 Enumerating Strong and Undirected K-Fragments

Enumerating SKFs is simpler than enumerating RKFs. It suffices to choose an arbitrary keyword $k \in K$ and recursively enumerate all the strong $(K \setminus \{k\})$-fragments. Each strong $(K \setminus \{k\})$-fragment T is extended to a strong K-fragment by adding all simple undirected paths P such that P starts at some structural node u of T, ends at k, and passes only through structural nodes that are not in T. These paths are enumerated by U-Paths(G, u, k), which is similar to Paths(G, u, v); its description is omitted. The complete algorithm StrongFragments for enumerating SKFs is described in Appendix A. In order to enumerate UKFs, the algorithm StrongFragments should be modified so that the generated paths may include, between u and k, both structural and keyword nodes that are not in T (note that u itself may also be a keyword).

Theorem 3. The SKF and UKF problems can be solved with polynomial delay and polynomial space.

The algorithms of this and the previous sections can be easily parallelized by assigning a processor to each threaded enumerator that executes a recursive call for a smaller set of nodes (in Line 18 of ReducedSubtrees and in Line 11 of StrongFragments).
The processor that does the recursive call sends the results to the processor that generated that call. The latter extends those results to fragments with one more keyword of K. Note that there is no point in assigning a processor to each threaded enumerator that is created in Line 11 of ReducedSubtrees, since the extension process in this case is very simple (i.e., adding just one edge).

6 Enumerating Rooted K-Fragments in Sorted Order

In this section, we present an efficient algorithm for enumerating RKFs by increasing weight, assuming that the query is of a fixed size and the data graph is acyclic. As in the unordered case, we solve this problem by solving the more general problem of enumerating reduced subtrees by increasing weight. Thus, the input is an acyclic data graph and a subset of nodes. Note that a related, but simpler, problem is that of enumerating the k shortest paths (e.g., [12]).

    SortedRS(G, U)
     1: Initialize(|U|)
     2: i := 1
     3: while T[U, i] ≠ ⊥ do
     4:   print(T[U, i])
     5:   i := i + 1
     6:   Generate(U, i)

    Initialize(K)
     1: for all subsets W ⊆ V(G) such that 1 ≤ |W| ≤ K, in the ≺_s order do
     2:   I[W] := 0
     3:   u := max W
     4:   for all edges e = (v, u) in G do
     5:     N[W^{±e}, e] := 1
     6:   Generate(W, 1)

    NextSubtree(W, e)
     1: l := N[W^{±e}, e]
     2: if T[W^{±e}, l] ≠ ⊥ then
     3:   return T[W^{±e}, l] ∪ e
     4: else
     5:   return ⊥

    Generate(W, i)
     1: if I[W] ≥ i then
     2:   return
     3: if |W| = 1 then
     4:   T[W, 1] := (W, ∅), T[W, 2] := ⊥, I[W] := 2
     5:   return
     6: u := max W
     7: if u has no incoming edges in G then
     8:   T[W, 1] := ⊥, I[W] := 1, return
     9: let e be an incoming edge of u such that w(NextSubtree(W, e)) is minimal
    10: T[W, i] := NextSubtree(W, e)
    11: if NextSubtree(W, e) ≠ ⊥ then
    12:   Generate(W^{±e}, N[W^{±e}, e] + 1)
    13:   N[W^{±e}, e] := N[W^{±e}, e] + 1
    14: I[W] := i
Fig. 6. Enumerating RS(G, U) by increasing weight

We use $\prec$ to denote a topological order on the nodes of G. The maximal element of a nonempty set W is denoted $\max W$. Given the input G and U, the algorithm generates the reduced subtrees w.r.t. every set of nodes W such that $|W| \le |U|$, and stores them in the array T[W, i], where T[W, 1] is the smallest, T[W, 2] the second smallest, and so on. Values are assigned to T[W, i] in sorted order, and the array I[W] stores the largest i such that the subtree T[W, i] has already been created. If $T[W, i] = \bot$ ($i \ge 1$), it means that the graph G has exactly $i - 1$ subtrees that are reduced w.r.t. W.

Consider an edge e entering $\max W$. A sorted sequence of reduced subtrees w.r.t. W can be obtained by adding e to each subtree $T[W^{\pm e}, i]$. Let $\{T[W^{\pm e}, i] \cup e\}$ denote this sequence. The complete sequence $\{T[W, i]\}$ is generated by merging all the sequences $\{T[W^{\pm e}, i] \cup e\}$ of edges e that enter $\max W$.
We use $N[W^{\pm e}, e]$ to denote the smallest j such that the subtree $T[W^{\pm e}, j] \cup e$ has not yet been merged into the sequence $T[W, i]$.
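
The merging step just described can be sketched compactly in Python. This is not Figure 6 itself but a swapped-in formulation under our own assumptions: instead of the explicit arrays T, I and N, each set W gets a cached lazy sequence (LazySeq, playing the role of T[W, ·]), and the merge over the edges e that enter max W is delegated to heapq.merge from the Python standard library. Sequences are created on demand rather than precomputed for every W with |W| ≤ |U| as Initialize does, the graph must be acyclic, and we have not verified that the sketch attains the bounds of Theorem 5.

```python
import heapq
from typing import Dict, FrozenSet, Iterable, Iterator, List, Optional, Tuple

Node = str
Edge = Tuple[Node, Node]
Tree = Tuple[float, FrozenSet[Edge], FrozenSet[Node]]  # (weight, edges, nodes)

class LazySeq:
    """Role of T[W, .]: trees are produced on demand and cached, so every
    consumer of the same W shares one sorted sequence."""

    def __init__(self, source: Iterator[Tree]) -> None:
        self._source = source
        self._items: List[Tree] = []

    def get(self, i: int) -> Optional[Tree]:
        while len(self._items) <= i:
            item = next(self._source, None)
            if item is None:
                return None  # like T[W, i] = bottom
            self._items.append(item)
        return self._items[i]

    def __iter__(self) -> Iterator[Tree]:
        i = 0  # a private cursor, loosely like N[W^{±e}, e]
        while True:
            tree = self.get(i)
            if tree is None:
                return
            yield tree
            i += 1

def topological_order(nodes: Iterable[Node], edges: Iterable[Edge]) -> List[Node]:
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    edges = list(edges)
    indeg = {x: 0 for x in nodes}
    for (_, b) in edges:
        indeg[b] += 1
    ready = sorted(x for x, d in indeg.items() if d == 0)
    order: List[Node] = []
    while ready:
        x = ready.pop()
        order.append(x)
        for (a, b) in edges:
            if a == x:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    if len(order) != len(indeg):
        raise ValueError("data graph is not acyclic")
    return order

def sorted_rs(nodes: Iterable[Node], weights: Dict[Edge, float],
              U: Iterable[Node]) -> Iterator[Tree]:
    """Enumerates RS(G, U) by increasing weight, for an acyclic G."""
    nodes = list(nodes)
    pos = {x: i for i, x in enumerate(topological_order(nodes, weights))}
    into: Dict[Node, List[Edge]] = {x: [] for x in nodes}
    for e in weights:
        into[e[1]].append(e)
    memo: Dict[FrozenSet[Node], LazySeq] = {}

    def seq(W: FrozenSet[Node]) -> LazySeq:
        if W not in memo:
            memo[W] = LazySeq(generate(W))
        return memo[W]

    def extend(stream: Iterator[Tree], e: Edge) -> Iterator[Tree]:
        for (w, es, ns) in stream:  # T[W^{±e}, j] plus e, still sorted by weight
            yield w + weights[e], es | {e}, ns | {e[1]}

    def generate(W: FrozenSet[Node]) -> Iterator[Tree]:
        if len(W) == 1:
            yield 0.0, frozenset(), W
            return
        u = max(W, key=pos.__getitem__)  # max W under the topological order
        streams = [extend(iter(seq((W - {u}) | {e[0]})), e) for e in into[u]]
        yield from heapq.merge(*streams, key=lambda t: t[0])

    return iter(seq(frozenset(U)))
```

Each tree is returned as (weight, edges, nodes). For example, with edge weights r→x: 1, r→y: 1, x→k1: 1, x→k2: 5 and y→k2: 1, the two reduced subtrees w.r.t. {k1, k2} come out with weights 4 and then 6.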

The algorithm is shown in Figure 6. Subtrees are assigned to T[W, i] in Line 10 of Generate. It can be shown that i = I[W] + 1 whenever this line is reached. Let $e_1, \ldots, e_m$ be all the edges entering $\max W$. The reduced subtree w.r.t. W that is assigned to T[W, i] is chosen in Line 9 and is a minimal subtree among $T[W^{\pm e_1}, N[W^{\pm e_1}, e_1]] \cup e_1, \ldots, T[W^{\pm e_m}, N[W^{\pm e_m}, e_m]] \cup e_m$, which are obtained by calling NextSubtree. Clearly, all the subtrees $T[W^{\pm e_j}, N[W^{\pm e_j}, e_j]]$ ($1 \le j \le m$) should have been generated before T[W, i]. For that reason, if $T[W^{\pm e_k}, N[W^{\pm e_k}, e_k]] \cup e_k$ is the subtree that has just been assigned to T[W, i], then in Line 12 the subtree $T[W^{\pm e_k}, N[W^{\pm e_k}, e_k] + 1]$ is generated. Note that $T[W^{\pm e_k}, N[W^{\pm e_k}, e_k] + 1] = \bot$ may hold after executing Line 12; this happens if $|RS(G, W^{\pm e_k})| < N[W^{\pm e_k}, e_k] + 1$. (Note that $w(\bot) = \infty$.) It is also possible that $T[W^{\pm e_k}, N[W^{\pm e_k}, e_k] + 1]$ has already been created before executing Line 12; hence the test in Line 1 of Generate.

The enumeration algorithm SortedRS(G, U) starts by calling the algorithm Initialize(|U|) in order to compute T[W, 1] for every nonempty subset W such that $|W| \le |U|$. The loop in Line 1 of Initialize traverses the sets W in the $\prec_s$ order, where $W_1 \prec_s W_2$ if $\max W_1 \prec \max W_2$. After initialization, the subtrees T[U, i] are generated in sorted order. The algorithm terminates when $T[U, i] = \bot$.

The following theorem states the correctness of SortedRS. The crux of the proof (given in Appendix C) is in showing that each of the arrays T, I and N holds the correct information described above.

Theorem 4. Let G be an acyclic data graph and U be a subset of the nodes of G. SortedRS(G, U) enumerates RS(G, U) by increasing weight.

Theorem 5. Let K be a query of size k and G be an acyclic data graph with n nodes and m edges. In the execution of SortedRS(G, K), the first delay is $O(mn^k)$; for i > 1, the ith delay is $O(m)$; and the ith space usage is $O(n^{k+2} + in^2)$.

Corollary 2.
If queries are of fixed size and data graphs are acyclic, then the sorted RKF problem can be solved with polynomial delay.

Note that in practice, for each set $W$, the array $T[W, i]$ should be implemented as a linked list and the array $N[W, e]$ should store pointers into that list. This does not change the running time, and it limits the space used for $W$ to the size of the subtrees that are actually explored for $W$.

7 A Heuristic for Sorted Enumerations

Usually, the goal is enumeration by increasing weight. There are two approaches for achieving this goal. In [1, 10], the enumeration is by increasing weight, but the worst-case upper bound on the running time is (at best) exponential. In [3, 16], a heuristic approach is used to enumerate in an order that is likely to be close to the sorted order. Note that there is no guarantee on how much the actual order may deviate from the sorted order. The upper bound on the running time is exponential [16] or not stated [3]. In comparison, the algorithms of Sections 4 and 5 imply that enumeration by increasing weight can be done in polynomial total time (even if the size of the query is unbounded) simply by first generating all the fragments and then sorting them. None of the current systems achieves this worst-case upper bound. Generating and then sorting works well when there are not too many results.

Next, we outline a heuristic that runs with polynomial delay (even if the query is of unbounded size) and enumerates in an order that is likely to be close to the sorted order. The general idea is to apply the algorithms of Sections 4 and 5 to a neighborhood of the data graph around the keywords of $K$, starting with the neighborhood comprising just the keywords of $K$ and then enlarging this neighborhood in stages. The heuristic for building the successive neighborhoods is based on assigning a cost $C(n)$ to each node $n$ and then adding the nodes, one at a time, in order of increasing cost. $C(n)$ could be, for example, the sum of (or the maximum among) the distances between $n$ and the keywords of $K$. Alternatively, $C(n)$ could be a number that is at most twice the weight of a minimal undirected subtree that contains all the keywords of $K$ and $n$. Note that in either case, $C(n)$ can be computed efficiently.

For a given neighborhood, we should generate all $K$-fragments that contain $v$, where $v$ is the most recently added node. One way of doing that is to apply the algorithms of Sections 4 and 5 directly and print only those $K$-fragments that contain $v$. This would result in an enumeration that runs in incremental polynomial time. To realize enumeration with polynomial delay, we need algorithms that can enumerate, with polynomial delay, the $K$-fragments that contain a given node $v \notin K$ (note that $v$ must be an interior node).
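As a rough illustration of the first choice of cost above, the following Python sketch computes $C(n)$ as the sum of shortest-path distances from $n$ to the keywords of $K$ (Dijkstra's algorithm, run once per keyword on an undirected graph with nonnegative weights) and lists the non-keyword nodes in the order in which they would be added to the neighborhood. The function names, the adjacency-map representation, the tie-breaking by node name, and the decision to omit nodes that cannot reach every keyword are assumptions made for this example; the alternative cost based on minimal undirected subtrees is not shown.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]   # undirected: G[a][b] == G[b][a] == weight of edge {a, b}


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (a, b, weight) triples."""
    G: Graph = {}
    for a, b, w in edges:
        G.setdefault(a, {})[b] = w
        G.setdefault(b, {})[a] = w
    return G


def distances_from(G: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue                                  # stale heap entry
        for y, w in G[x].items():
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist


def neighborhood_order(G: Graph, keywords: List[str]) -> List[str]:
    """Non-keyword nodes by increasing C(n) = sum of distances to the keywords.

    Nodes that cannot reach every keyword get no finite cost and are omitted;
    ties are broken by node name so that the order is deterministic.
    """
    per_keyword = [distances_from(G, k) for k in keywords]
    cost = {n: sum(d[n] for d in per_keyword)
            for n in G
            if n not in keywords and all(n in d for d in per_keyword)}
    return sorted(cost, key=lambda n: (cost[n], n))
```

For keywords k1 and k2 with edges k1–a (1), a–k2 (1), k1–b (2), b–k2 (3) and a–c (5), the costs are 2 for a, 5 for b and 12 for c, so the neighborhood grows by a, then b, then c. Replacing `sum` with `max` gives the other variant mentioned in the text.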
We can show that such algorithms exist for enumerating SKFs and UKFs. For RKFs, we can show that such an algorithm exists if the data graph is acyclic, and that for cyclic data graphs no such algorithm exists unless P = NP. The proofs of these results are beyond the scope of this paper.

8 Conclusion and Future Work

We have given provably efficient algorithms for problems that occur in different settings of keyword search, including databases, data-centric XML as well as document-centric XML, and the Web. Ours are the first algorithms for these types of problems that run with polynomial delay (or even in polynomial total time). We have also shown how our algorithms can lead to heuristics, and we believe that this heuristic will outperform existing ones [1, 3, 10, 11, 16]. Experimentation with this heuristic, however, is beyond the scope of this paper.

The results of this paper can be extended in two ways. First, for queries of fixed size, all K-fragments can be enumerated with polynomial delay and in the order of increasing weight. This result holds for all three types of K-fragments (i.e., RKFs, SKFs and UKFs), but for RKFs the polynomial delay is not as good as the polynomial delay of the algorithm SortedRS of Section 6, which works under the additional assumption that the data graph is acyclic. The second extension is a formal definition of enumeration in an approximate order, as well as algorithms for enumerating all three types of K-fragments in an approximate order and with polynomial delay, even if queries are of unbounded size. Note that the heuristic of Section 7 does not satisfy the notion of an approximate order, but it has a better polynomial delay. These extensions are summarized in [15] and will be described in detail in a future paper. Additional future work includes the development of indices and other optimizations that would enhance the efficiency of our algorithms.

References

1. S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: enabling keyword search over relational databases. In SIGMOD Conference, page 627.
2. K. Arnold, J. Gosling, and D. Holmes. The Java Programming Language. Addison-Wesley Longman Publishing Co., Inc.
3. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In ICDE.
4. S. Cohen, Y. Kanza, and Y. Sagiv. Generating relations from XML documents. In ICDT.
5. S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. In VLDB, pages 45–56.
6. N. Fuhr, M. Lalmas, and S. Malik, editors. INitiative for the Evaluation of XML Retrieval (INEX). Proceedings of the Second INEX Workshop.
7. M. R. Garey, R. L. Graham, and D. S. Johnson. The complexity of computing Steiner minimal trees. SIAM Journal on Applied Mathematics, 32.
8. R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In VLDB, pages 26–37.
9. V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-style keyword search over relational databases. In HDMS.
10. V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB.
11. V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE.
12. V. M. Jiménez and A. Marzal. Computing the K shortest paths: A new algorithm and an experimental comparison. In Algorithm Engineering, pages 15–29.
13. D. S. Johnson, M. Yannakakis, and C. H. Papadimitriou. On generating all maximal independent sets. Information Processing Letters, 27, March.
14. B. Kimelfeld. Interconnection semantics for XML. Master's thesis, The Hebrew University of Jerusalem.
15. B. Kimelfeld and Y. Sagiv. Efficient engines for keyword proximity search. In WebDB.
16. W.-S. Li, K. S. Candan, Q. Vu, and D. Agrawal. Retrieving and organizing web pages by information unit. In WWW.
17. M. Y. Vardi. The complexity of relational query languages (extended abstract). In STOC.

A Additional Algorithms

In this appendix, we describe two algorithms. The algorithm PairRS(G, u, v) of Figure 7 enumerates all rooted subtrees of a data graph $G$ that are reduced w.r.t. $\{u, v\}$, where $u$ and $v$ are two given nodes of $G$. The algorithm StrongFragments(G, K) of Figure 8 enumerates all SKFs for a data graph $G$ and a set of keywords $K$.

PairRS(G, u, v)
 1: if u = v then
 2:   print(({u}, ∅))
 3:   exit
 4: if v is reachable from u in G then
 5:   TE := new[Paths](G, u, v)
 6:   T := next[TE]
 7:   while T ≠ ⊥ do
 8:     print(T)
 9:     T := next[TE]
10: let W be the set of nodes w s.t. (w, u) is an edge of G and w and v are reachable from a common node in G − u
11: for all w ∈ W do
12:   TE := new[PairRS](G − u, w, v)
13:   T := next[TE]
14:   while T ≠ ⊥ do
15:     print(T ∪ (w, u))
16:     T := next[TE]
Fig. 7. An algorithm for enumerating $RS(G, \{u, v\})$.

B Proof of Theorem 1 (Correctness of ReducedSubtrees)

Lemma 1 (correctness of RSExtensions). Suppose that $T$ is a subtree of $G$ that is reduced w.r.t. $W$, and let $v$ be a node of $G$. RSExtensions(G, T, v) enumerates all subtrees $\hat{T}$ such that $T \subseteq \hat{T} \subseteq G$ and $\hat{T}$ is reduced w.r.t. $U = W \cup \{v\}$.

Proof. Let $RS_T(G, U)$ be the set of trees in $RS(G, U)$ that have $T$ as a subtree. Let $T_1, \ldots, T_k$ be the subtrees printed by RSExtensions(G, T, v). The lemma follows from the following claims.

Claim 1. $T_i \ne T_j$ for all $1 \le i < j \le k$.

Claim 2. $\{T_1, \ldots, T_k\} \subseteq RS_T(G, U)$.
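To make Figure 7 concrete, the following Python sketch transcribes PairRS, assuming a graph given as a dictionary from each node to its set of successors (with every node present as a key). Python generators stand in for the threaded enumerators new[...] and next[...], and a tree is a pair (nodes, edges) of frozensets. The Paths subroutine is replaced by a plain depth-first enumeration of simple paths, so this sketch makes no claim of polynomial delay; all helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Graph = Dict[str, Set[str]]            # node -> set of successors (every node is a key)
Edge = Tuple[str, str]
Tree = Tuple[FrozenSet[str], FrozenSet[Edge]]


def remove_node(G: Graph, u: str) -> Graph:
    """Return G - u, i.e., u and all edges incident to u removed."""
    return {a: succ - {u} for a, succ in G.items() if a != u}


def ancestors(G: Graph, x: str) -> Set[str]:
    """All nodes from which x is reachable in G, including x itself."""
    preds: Dict[str, Set[str]] = {}
    for a, succ in G.items():
        for b in succ:
            preds.setdefault(b, set()).add(a)
    seen, stack = {x}, [x]
    while stack:
        for p in preds.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def simple_paths(G: Graph, u: str, v: str, visited: Set[str]) -> Iterator[List[Edge]]:
    """Plain DFS over simple directed u->v paths (stand-in for Paths)."""
    if u == v:
        yield []
        return
    for x in sorted(G[u]):
        if x not in visited:
            for rest in simple_paths(G, x, v, visited | {x}):
                yield [(u, x)] + rest


def pair_rs(G: Graph, u: str, v: str) -> Iterator[Tree]:
    """Enumerate RS(G, {u, v}) following the structure of Figure 7."""
    if u == v:                                        # Lines 1-3
        yield frozenset({u}), frozenset()
        return
    if u in ancestors(G, v):                          # Line 4: v reachable from u
        for path in simple_paths(G, u, v, {u}):       # Lines 5-9: u is the root
            yield frozenset([u] + [b for _, b in path]), frozenset(path)
    H = remove_node(G, u)                             # G - u
    anc_v = ancestors(H, v)
    # Line 10: parents w of u such that w and v have a common ancestor in G - u
    W = [w for w, succ in G.items() if w != u and u in succ and ancestors(H, w) & anc_v]
    for w in sorted(W):                               # Lines 11-16: u is a leaf below w
        for nodes, edges in pair_rs(H, w, v):
            yield nodes | {u}, edges | {(w, u)}
```

For example, with edges r→x, r→b, x→a and x→b, `pair_rs(G, "a", "b")` yields exactly two trees: the one rooted at x (x→a, x→b) and the one rooted at r (r→x→a, r→b). The tree rooted at r that reaches both a and b through x is not produced, since it is not reduced.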

StrongFragments(G, K)
 1: if G does not contain a strong K-fragment then
 2:   exit
 3: if |K| = 1 then
 4:   print(({k}, ∅))   (where K = {k})
 5:   exit
 6: if |K| = 2 then
 7:   let G_K be obtained from G by removing all keywords not in K
 8:   U-Paths(G_K, K)
 9:   exit
10: choose an arbitrary keyword k ∈ K
11: TE_t := new[StrongFragments](G, K \ {k})
12: T := next[TE_t]
13: while T ≠ ⊥ do
14:   for all u ∈ S(T) do
15:     Ḡ := (G − (S(T) \ {u})) − (K(G) \ {k})
16:     TE_p := new[U-Paths](Ḡ, u, k)
17:     P := next[TE_p]
18:     while P ≠ ⊥ do
19:       print(T ∪ P)
20:       P := next[TE_p]
21:   T := next[TE_t]
Fig. 8. Enumerating strong K-fragments.

Claim 3. $RS_T(G, U) \subseteq \{T_1, \ldots, T_k\}$.

Claims 1 and 2 are rather straightforward. We will only prove Claim 3. Suppose that $\hat{T} \in RS_T(G, U)$. We will show that $\hat{T}$ is generated by the algorithm. Let $\bar{T}$ be the undirected tree obtained by ignoring the directions of the edges of $\hat{T}$. Let $r$ be the root of $T$, and let $u$ be a node of $T$ that is closest to $v$ in $\bar{T}$. We consider three cases.

Case 1: $u = v$, i.e., $v$ is a node of $T$, and so $T$ is reduced w.r.t. $U$. In this case, $T$ itself is the only tree in $RS_T(G, U)$. This tree is printed in Line 3 of RSExtensions(G, T, v).

Case 2: $r \ne u \ne v$ (i.e., $u$ differs from both $r$ and $v$). By the choice of $u$, there is an undirected path $\bar{P}$ between $u$ and $v$ in $\bar{T}$, and $u$ is the only node of $T$ on $\bar{P}$. Thus, $v$ must be a leaf of $\hat{T}$ that is reachable from $u$ by the path $P$, which is obtained from $\bar{P}$ by directing the edges as in $\hat{T}$; otherwise, some node of $P$ would have in-degree two in $\hat{T}$. Therefore, $\hat{T}$ can be obtained by concatenating $T$ and $P$. Node $u$ is chosen in some iteration of the loop of Line 5. By the correctness of Paths, $P$ is returned by the threaded enumerator $TE$ that is created in Line 8. Hence, $\hat{T} = T \cup P$ is eventually printed in Line 11.

Case 3: $r = u \ne v$. Let $T'$ be the subtree of $\hat{T}$ that is reduced w.r.t. $\{r, v\}$. The concatenation $T \cup T'$ is a subtree of $\hat{T}$ that includes all the nodes of $U$, and hence must be $\hat{T}$ itself. Note that $r$ is the only node that $T$ and $T'$ have in common. By the correctness of PairRS, $T'$ is returned by the threaded enumerator $TE$ that is created in Line 15. Hence, $\hat{T}$ is eventually printed in Line 18.

We conclude that $\hat{T}$ is generated by the algorithm, as claimed.

Proof of Theorem 1. The proof follows from the next three claims, which are proved by induction on $|V(G)| + |U|$.

Claim 1. $R(G, U) \subseteq RS(G, U)$, where $R(G, U)$ is the set of subtrees generated by ReducedSubtrees(G, U).

Claim 2. $RS(G, U) \subseteq R(G, U)$, where $R(G, U)$ is the same as above.

Claim 3. No tree is generated twice by the algorithm.

For the basis of the induction, suppose that $|U| = 1$; this includes the case where $|V(G)| = 1$. Thus, $RS(G, U)$ has a single tree, consisting of the only node in $U$. ReducedSubtrees(G, U) prints this tree in Line 2 and terminates. Hence, all three claims hold when $|U| = 1$.

For the inductive step of Claim 1, suppose that $|U| \ge 2$. Note that all the trees generated by ReducedSubtrees(G, U) are printed either in Line 14 or in Line 24, depending on the test of Line 7. If this test is true, then the threaded enumerator $TE$ is initialized in Line 11 and, by the inductive hypothesis, it generates reduced subtrees $T$ of $G - u$ w.r.t. $U^{\pm(w,u)}$. Thus, each $T$ is a reduced subtree of $G$ w.r.t. $U^{\pm(w,u)}$ such that $T$ contains $w$ and does not contain $u$. Therefore, adding the edge $(w, u)$ to $T$ results in a reduced subtree of $G$ w.r.t. $U$. If the test of Line 7 is false, then by the inductive hypothesis, the threaded enumerator $TE_1$ (initialized in Line 18) generates reduced subtrees $T$ of $G$ w.r.t. $U \setminus \{v\}$. By Lemma 1, the threaded enumerator $TE_2$ (initialized in Line 21) extends $T$ to reduced subtrees of $G$ w.r.t. $U$.

For the inductive step of Claim 2, suppose that $|U| \ge 2$ and let $T \in RS(G, U)$. Thus, neither the test of Line 1 nor the test of Line 4 is true, and the algorithm proceeds by choosing a node $u$ in Line 6. To show that $T$ is printed by the algorithm, we consider two cases, depending on the test in Line 7.
If this test is true, then $u$ must be a leaf of $T$. Let $w$ be the parent of $u$ in $T$, and let $T_u$ be obtained by removing the edge $(w, u)$ from $T$. The tree $T_u$ is a reduced subtree of $G - u$ w.r.t. $U^{\pm(w,u)}$. By the inductive hypothesis, $T_u$ is generated in either Line 12 or Line 15, and hence the subtree $T$ is printed in Line 14. If the test in Line 7 is false, then let $v$ be the node chosen in Line 17. Let $T'$ be the subtree of $T$ that is reduced w.r.t. $U \setminus \{v\}$ (note that $T'$ may contain $v$). By the inductive hypothesis, the threaded enumerator $TE_1$ (which is initialized in Line 18) generates $T'$. By Lemma 1, the subtree $T$ is generated by the threaded enumerator $TE_2$ and printed in Line 24.

To prove the inductive step of Claim 3, recall that the algorithm prints all subtrees in either Line 14 or Line 24. Consider two subtrees $T_1$ and $T_2$ that are printed by the algorithm. If they are printed in Line 14, then either each one has a different node as the parent of $u$, or they are different by the inductive hypothesis. If they are printed in Line 24, let $v$ be the node chosen in Line 17. Suppose that $T_1$ and $T_2$ are generated as extensions of the trees $\hat{T}_1$ and $\hat{T}_2$, respectively, where $\hat{T}_1$ and $\hat{T}_2$ are created by the threaded enumerator $TE_1$. By Lemma 1, each $\hat{T}_i$ is the subtree of $T_i$ that is reduced w.r.t. $U \setminus \{v\}$. Hence, if $\hat{T}_1$ and $\hat{T}_2$ are not identical, then neither are $T_1$ and $T_2$. So suppose that $\hat{T}_1$ and $\hat{T}_2$ are identical. By the inductive hypothesis, the subtree $\hat{T}_1$ is generated only once by the enumerator $TE_1$. Thus, $T_1$ and $T_2$ are printed in the same iteration of the loop of Line 20. Hence, the claim follows from Lemma 1.

C Proof of Theorem 4

In this section, we prove Theorem 4, that is, the correctness of the algorithm SortedRS. We assume that $G$ is an acyclic data graph and that $U$ is a set of nodes of $G$ with $|U| > 1$. We first prove the following lemma.

Lemma 2. Let $W$ be a nonempty set of $|U|$ or fewer nodes in $G$. During the execution of SortedRS(G, U), whenever Generate(W, i) is called, $i \le I[W] + 1$ holds.

Proof. For $i = 1$, the lemma is trivial, since $I[W]$ is a nonnegative integer. So suppose that $i > 1$. We first prove the following claim.

Claim 1. If Generate(W, i) is called during the execution of SortedRS, then Generate(W, i − 1) is called in some prior step.

To prove this claim, we consider several cases.

Case 1: Generate(W, i) is called in Line 6 of SortedRS(G, U). In this case, Generate(W, i − 1) was called in the previous iteration of Line 3 of SortedRS.

Case 2: $i > 2$ and Generate(W, i) is called in Line 12 of Generate($\hat{W}$, $\hat{\imath}$), for some $\hat{W}$ and $\hat{\imath}$. In that call, $W = \hat{W}^{\pm e}$ for some edge $e$, and $i$ is the value $N[W, e] + 1$. Now, consider the step in which $N[W, e]$ was set to its current value $i - 1$. Since $i - 1 > 1$, this step is an execution of Line 13 of Generate, in some previous execution of Generate with $\hat{W}$ as an argument. In that execution, Generate(W, i − 1) was called in Line 12.
Case 3: $i = 2$ and Generate(W, i) is called in Line 12 of Generate($\hat{W}$, $\hat{\imath}$), for some $\hat{W}$ and $\hat{\imath}$. Since $W$ precedes $\hat{W}$ in the $\le_s$ order, Generate(W, 1) is called in Line 6 of Initialize before the set $\hat{W}$ is even considered.

This completes the proof of Claim 1. The following observation follows from the tests of Line 3 of SortedRS and Line 11 of Generate.

Observation 1. If, during the execution of SortedRS(G, U), $T[W, i]$ is set to $\bot$, then Generate(W, j) is never called for $j > i$.

Now, consider a specific call to Generate(W, i). Let $k$ be the value of $I[W]$ at that call. We will show that $k \ge i - 1$. From Claim 1 it follows that Generate(W, i − 1) was previously called. Let $j$ be the value of $I[W]$ at that call. Obviously, $j \le k$. From Observation 1 it follows that after the call to Generate(W, i − 1), the value of $T[W, i - 1]$ is not $\bot$. We now consider that execution of Generate(W, i − 1). If the test of Line 1 is true, then $j \ge i - 1$ holds, and hence $k \ge i - 1$. Otherwise, if the test of Line 3 is true, then both $i$ and $k$ must be 2. From Observation 1, the test of Line 7 cannot hold. Finally, if none of these tests is true, then Line 14 is reached, and hence $I[W] = i - 1$ at the end of that execution. It follows that $k \ge i - 1$, as claimed.

For simplification, we assume that for every nonempty set $W$ of $|U|$ or fewer nodes in $G$, no two subtrees in $RS(G, W)$ have the same weight. Note that this assumption is not required for the correctness of the algorithm SortedRS.

The notion of safety relates to the values that are stored about a subset of the nodes of $G$ during the execution of SortedRS(G, U). Let $W$ be a nonempty set of $|U|$ or fewer nodes in $G$. We say that $W$ is safe if the following conditions hold:
1. For every $i$ such that $1 \le i \le I[W]$, the value $T[W, i]$ is defined, and it is the $i$th lightest tree in $RS(G, W)$, or $\bot$ if $i > |RS(G, W)|$; and
2. If $|W| > 1$, then for every incoming edge $e$ of $\max W$, $T[W^{\pm e}, N[W^{\pm e}, e]]$ is defined, and it is the lightest tree $T$ such that $T \cup e \in RS(G, W) \setminus \{T[W, i] \mid 1 \le i \le I[W]\}$. If no such $T$ exists, then $T[W^{\pm e}, N[W^{\pm e}, e]] = \bot$.

Lemma 3. Let $W$ be a nonempty set of $|U|$ or fewer nodes in $G$. During the execution of SortedRS(G, U), whenever Generate(W, i) is called, the set $W$ is safe.

Proof. To prove this lemma, we need the following observation.

Observation 1. The first time Generate is called with the argument $W$ is in Line 6 of Initialize.

This observation follows from the order in which the sets are traversed in Initialize. Another observation we make use of is the following.

Observation 2. If $W$ is safe at the first call to Generate(W, 1), then the safety of $W$ can only be impaired during some execution of Generate with $W$ as an argument.
We prove this lemma by induction on the position of $W$ in the $\le_s$ order. For the base case, we assume that $W$ consists of only one node. Observation 2 implies that it suffices to show that $W$ is safe at the first call to Generate(W, 1), and that safety is not impaired during any execution of Generate with $W$ as an argument. From Observation 1 it follows that, when Generate is first called with $W$ as an argument, the value of $I[W]$ is 0. Hence, $W$ is trivially safe. Furthermore, Lines 3–5 of Generate imply that $W$ remains safe at the end of each execution of Generate(W, i). We conclude that the lemma holds for $W$, as required.

For the inductive step, assume that $|W| > 1$. Let $u = \max W$. We first show that $W$ is safe on the first call to Generate(W, 1) in Initialize. Since $I[W] = 0$ at that time, Condition 1 of safety is satisfied vacuously. For Condition 2, consider an incoming edge $e$ of $u$. Then, $N[W^{\pm e}, e] = 1$. Since $W^{\pm e}$ precedes $W$ in the $\le_s$ order, Generate($W^{\pm e}$, 1) was called in a previous iteration of Line 1 of Initialize. Hence, $I[W^{\pm e}] \ge 1$. From the induction hypothesis it follows that $W^{\pm e}$ is safe. In particular, $T[W^{\pm e}, N[W^{\pm e}, e]]$ is defined, and it is the Steiner tree of $W^{\pm e}$, or $\bot$ if $RS(G, W^{\pm e}) = \emptyset$. It follows that $T[W^{\pm e}, 1]$ is a smallest tree $T$ such that $T \cup e \in RS(G, W)$ if $RS(G, W^{\pm e}) \ne \emptyset$, and $\bot$ otherwise. We conclude that $W$ is safe at the first call to Generate with $W$ as an argument, as claimed.

By Observation 2, it now suffices to prove that if $W$ is safe at the beginning of some execution of Generate with $W$ as an argument, then $W$ is also safe at the end of that execution. Consider an execution of Generate(W, i), for some $i$. If the test of Line 1 of Generate is true, then the values of the arrays that relate to the set $W$ remain unchanged, and hence $W$ remains safe. So assume that this test is false (i.e., $i > I[W]$). From Lemma 2 it follows that $i = I[W] + 1$. Let $u = \max W$. If the test of Line 7 is true, then $RS(G, W) = \emptyset$. In that case, Line 8 implies that $W$ remains safe. Otherwise, let $e$ be the edge that is chosen in Line 9. Since $W$ is safe at the beginning of this execution, the tree $T[W, i]$ that is defined in Line 10 is the smallest tree in $RS(G, W) \setminus \{T[W, j] \mid 1 \le j \le I[W]\}$, or $\bot$ if no such tree exists. Since $I[W]$ is set to $i$ in Line 14 (that is, $I[W]$ is increased by 1), Condition 1 of safety is satisfied at the end of this execution. To show that Condition 2 of safety is also satisfied, it is enough to show that it is satisfied w.r.t. the edge $e$. If the test of Line 11 is false, then the value $N[W^{\pm e}, e]$ does not change, as required. Otherwise, Generate($W^{\pm e}$, $N[W^{\pm e}, e] + 1$) is executed in Line 12.
Hence, in Line 13, $I[W^{\pm e}] \ge N[W^{\pm e}, e] + 1$, and by the induction hypothesis, the set $W^{\pm e}$ is safe. Thus, $T[W^{\pm e}, N[W^{\pm e}, e] + 1]$ is the smallest tree $T$ such that $T \cup e \in RS(G, W)$ and the weight of $T$ is greater than the weight of $T[W^{\pm e}, N[W^{\pm e}, e]]$, or $\bot$ if no such tree exists. Since the value of $N[W^{\pm e}, e]$ is incremented in Line 13, Condition 2 is satisfied. We conclude that $W$ remains safe at the end of that execution of Generate, as claimed.

Theorem 4 follows directly from Lemma 3 by taking $W$ to be the set $U$.

D Complexity of ReducedSubtrees

In this section, we analyze the time complexity of the algorithm ReducedSubtrees. Let $E$ be an enumeration algorithm and $x$ be an input for $E$. We use $N_i[E(x)]$ to denote the number of next commands that are executed during the $i$th interval of $E(x)$. The following two lemmas give upper bounds on $N_i[E(x)]$ for ReducedSubtrees and the other three enumeration algorithms used by ReducedSubtrees. Note that these upper bounds hold under the assumption that each algorithm generates a nonempty result. During the execution of ReducedSubtrees, tests are always made to guarantee that when a threaded enumerator is initialized, it will return a nonempty result.
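The quantity $N_i[E(x)]$ can be made concrete with a small Python sketch, assuming that Python generators stand in for threaded enumerators and that `None` plays the role of $\bot$. A wrapper counts every next command, and we assume here that the $i$th interval runs from the $(i-1)$st output to the $i$th output, with one final interval ending at termination. The names `Counter`, `ThreadedEnumerator`, `pairs`, and `next_counts_per_interval` are hypothetical; the toy enumerator `pairs` only imitates the nested `while T ≠ ⊥` loops of the figures and is not one of the paper's algorithms.

```python
from typing import Iterator, List, Optional, Tuple


class Counter:
    """Shared tally of executed next commands."""

    def __init__(self) -> None:
        self.calls = 0


class ThreadedEnumerator:
    """Stand-in for TE := new[A](x); next() returns None in place of ⊥."""

    def __init__(self, gen: Iterator, counter: Counter) -> None:
        self._gen = gen
        self._counter = counter

    def next(self) -> Optional[object]:
        self._counter.calls += 1
        return next(self._gen, None)   # the built-in next, with None as the default


def pairs(xs: List[int], ys: List[str], counter: Counter) -> Iterator[Tuple[int, str]]:
    """Toy enumerator with the nested 'while T != ⊥' shape of Figures 7 and 8."""
    te_x = ThreadedEnumerator(iter(xs), counter)
    x = te_x.next()
    while x is not None:
        te_y = ThreadedEnumerator(iter(ys), counter)
        y = te_y.next()
        while y is not None:
            yield (x, y)
            y = te_y.next()
        x = te_x.next()


def next_counts_per_interval(results: Iterator, counter: Counter) -> List[int]:
    """N_i for every interval: one entry per output plus one for the final interval."""
    counts, last = [], 0
    for _ in results:
        counts.append(counter.calls - last)
        last = counter.calls
    counts.append(counter.calls - last)
    return counts
```

With `xs = [1, 2]` and `ys = ["a", "b"]`, the per-interval counts are `[2, 1, 3, 1, 2]`. With `ys = []`, the enumerator prints nothing, yet its single interval still executes five next commands; this wasted work is what the nonemptiness tests mentioned above are meant to rule out.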


More information

Lecture 8: The Traveling Salesman Problem

Lecture 8: The Traveling Salesman Problem Lecture 8: The Traveling Salesman Problem Let G = (V, E) be an undirected graph. A Hamiltonian cycle of G is a cycle that visits every vertex v V exactly once. Instead of Hamiltonian cycle, we sometimes

More information

Monotone Paths in Geometric Triangulations

Monotone Paths in Geometric Triangulations Monotone Paths in Geometric Triangulations Adrian Dumitrescu Ritankar Mandal Csaba D. Tóth November 19, 2017 Abstract (I) We prove that the (maximum) number of monotone paths in a geometric triangulation

More information

Chordal deletion is fixed-parameter tractable

Chordal deletion is fixed-parameter tractable Chordal deletion is fixed-parameter tractable Dániel Marx Institut für Informatik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. dmarx@informatik.hu-berlin.de Abstract. It

More information

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

More information

Definition For vertices u, v V (G), the distance from u to v, denoted d(u, v), in G is the length of a shortest u, v-path. 1

Definition For vertices u, v V (G), the distance from u to v, denoted d(u, v), in G is the length of a shortest u, v-path. 1 Graph fundamentals Bipartite graph characterization Lemma. If a graph contains an odd closed walk, then it contains an odd cycle. Proof strategy: Consider a shortest closed odd walk W. If W is not a cycle,

More information

A more efficient algorithm for perfect sorting by reversals

A more efficient algorithm for perfect sorting by reversals A more efficient algorithm for perfect sorting by reversals Sèverine Bérard 1,2, Cedric Chauve 3,4, and Christophe Paul 5 1 Département de Mathématiques et d Informatique Appliquée, INRA, Toulouse, France.

More information

V1.0: Seth Gilbert, V1.1: Steven Halim August 30, Abstract. d(e), and we assume that the distance function is non-negative (i.e., d(x, y) 0).

V1.0: Seth Gilbert, V1.1: Steven Halim August 30, Abstract. d(e), and we assume that the distance function is non-negative (i.e., d(x, y) 0). CS4234: Optimisation Algorithms Lecture 4 TRAVELLING-SALESMAN-PROBLEM (4 variants) V1.0: Seth Gilbert, V1.1: Steven Halim August 30, 2016 Abstract The goal of the TRAVELLING-SALESMAN-PROBLEM is to find

More information

FOUR EDGE-INDEPENDENT SPANNING TREES 1

FOUR EDGE-INDEPENDENT SPANNING TREES 1 FOUR EDGE-INDEPENDENT SPANNING TREES 1 Alexander Hoyer and Robin Thomas School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332-0160, USA ABSTRACT We prove an ear-decomposition theorem

More information

Discharging and reducible configurations

Discharging and reducible configurations Discharging and reducible configurations Zdeněk Dvořák March 24, 2018 Suppose we want to show that graphs from some hereditary class G are k- colorable. Clearly, we can restrict our attention to graphs

More information

Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search

Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search Marc Tedder University of Toronto arxiv:1503.02773v1 [cs.ds] 10 Mar 2015 Abstract Comparability graphs are the undirected

More information

The External Network Problem

The External Network Problem The External Network Problem Jan van den Heuvel and Matthew Johnson CDAM Research Report LSE-CDAM-2004-15 December 2004 Abstract The connectivity of a communications network can often be enhanced if the

More information

In this lecture, we ll look at applications of duality to three problems:

In this lecture, we ll look at applications of duality to three problems: Lecture 7 Duality Applications (Part II) In this lecture, we ll look at applications of duality to three problems: 1. Finding maximum spanning trees (MST). We know that Kruskal s algorithm finds this,

More information

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

MOST attention in the literature of network codes has

MOST attention in the literature of network codes has 3862 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 8, AUGUST 2010 Efficient Network Code Design for Cyclic Networks Elona Erez, Member, IEEE, and Meir Feder, Fellow, IEEE Abstract This paper introduces

More information

A Reduction of Conway s Thrackle Conjecture

A Reduction of Conway s Thrackle Conjecture A Reduction of Conway s Thrackle Conjecture Wei Li, Karen Daniels, and Konstantin Rybnikov Department of Computer Science and Department of Mathematical Sciences University of Massachusetts, Lowell 01854

More information

COMP 182: Algorithmic Thinking Prim and Dijkstra: Efficiency and Correctness

COMP 182: Algorithmic Thinking Prim and Dijkstra: Efficiency and Correctness Prim and Dijkstra: Efficiency and Correctness Luay Nakhleh 1 Prim s Algorithm In class we saw Prim s algorithm for computing a minimum spanning tree (MST) of a weighted, undirected graph g. The pseudo-code

More information

Implementation of Skyline Sweeping Algorithm

Implementation of Skyline Sweeping Algorithm Implementation of Skyline Sweeping Algorithm BETHINEEDI VEERENDRA M.TECH (CSE) K.I.T.S. DIVILI Mail id:veeru506@gmail.com B.VENKATESWARA REDDY Assistant Professor K.I.T.S. DIVILI Mail id: bvr001@gmail.com

More information

A CSP Search Algorithm with Reduced Branching Factor

A CSP Search Algorithm with Reduced Branching Factor A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il

More information

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes

On the Complexity of the Policy Improvement Algorithm. for Markov Decision Processes On the Complexity of the Policy Improvement Algorithm for Markov Decision Processes Mary Melekopoglou Anne Condon Computer Sciences Department University of Wisconsin - Madison 0 West Dayton Street Madison,

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information

DOUBLE DOMINATION CRITICAL AND STABLE GRAPHS UPON VERTEX REMOVAL 1

DOUBLE DOMINATION CRITICAL AND STABLE GRAPHS UPON VERTEX REMOVAL 1 Discussiones Mathematicae Graph Theory 32 (2012) 643 657 doi:10.7151/dmgt.1633 DOUBLE DOMINATION CRITICAL AND STABLE GRAPHS UPON VERTEX REMOVAL 1 Soufiane Khelifi Laboratory LMP2M, Bloc of laboratories

More information

Chapter 11: Graphs and Trees. March 23, 2008

Chapter 11: Graphs and Trees. March 23, 2008 Chapter 11: Graphs and Trees March 23, 2008 Outline 1 11.1 Graphs: An Introduction 2 11.2 Paths and Circuits 3 11.3 Matrix Representations of Graphs 4 11.5 Trees Graphs: Basic Definitions Informally, a

More information

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition. 18.433 Combinatorial Optimization Matching Algorithms September 9,14,16 Lecturer: Santosh Vempala Given a graph G = (V, E), a matching M is a set of edges with the property that no two of the edges have

More information

Adjacent: Two distinct vertices u, v are adjacent if there is an edge with ends u, v. In this case we let uv denote such an edge.

Adjacent: Two distinct vertices u, v are adjacent if there is an edge with ends u, v. In this case we let uv denote such an edge. 1 Graph Basics What is a graph? Graph: a graph G consists of a set of vertices, denoted V (G), a set of edges, denoted E(G), and a relation called incidence so that each edge is incident with either one

More information

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}. Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

On the Max Coloring Problem

On the Max Coloring Problem On the Max Coloring Problem Leah Epstein Asaf Levin May 22, 2010 Abstract We consider max coloring on hereditary graph classes. The problem is defined as follows. Given a graph G = (V, E) and positive

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

Disjoint Support Decompositions

Disjoint Support Decompositions Chapter 4 Disjoint Support Decompositions We introduce now a new property of logic functions which will be useful to further improve the quality of parameterizations in symbolic simulation. In informal

More information

Connecting face hitting sets in planar graphs

Connecting face hitting sets in planar graphs Connecting face hitting sets in planar graphs Pascal Schweitzer and Patrick Schweitzer Max-Planck-Institute for Computer Science Campus E1 4, D-66123 Saarbrücken, Germany pascal@mpi-inf.mpg.de University

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Lecture 1. 1 Notation

Lecture 1. 1 Notation Lecture 1 (The material on mathematical logic is covered in the textbook starting with Chapter 5; however, for the first few lectures, I will be providing some required background topics and will not be

More information

MC 302 GRAPH THEORY 10/1/13 Solutions to HW #2 50 points + 6 XC points

MC 302 GRAPH THEORY 10/1/13 Solutions to HW #2 50 points + 6 XC points MC 0 GRAPH THEORY 0// Solutions to HW # 0 points + XC points ) [CH] p.,..7. This problem introduces an important class of graphs called the hypercubes or k-cubes, Q, Q, Q, etc. I suggest that before you

More information

Basic Graph Theory with Applications to Economics

Basic Graph Theory with Applications to Economics Basic Graph Theory with Applications to Economics Debasis Mishra February, 0 What is a Graph? Let N = {,..., n} be a finite set. Let E be a collection of ordered or unordered pairs of distinct elements

More information

Fundamental Properties of Graphs

Fundamental Properties of Graphs Chapter three In many real-life situations we need to know how robust a graph that represents a certain network is, how edges or vertices can be removed without completely destroying the overall connectivity,

More information

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems Benjamin Grimmer bdg79@cornell.edu arxiv:1508.05567v2 [cs.ds] 20 Jul 2017 Abstract We consider a variety of NP-Complete network

More information

a Steiner tree for S if S V(T) holds. For convenience, although T is not a rooted tree, we call each degree-1 vertex of T a leaf of T. We say that a l

a Steiner tree for S if S V(T) holds. For convenience, although T is not a rooted tree, we call each degree-1 vertex of T a leaf of T. We say that a l Reachability between Steiner Trees in a Graph Haruka Mizuta 1,2,a) Takehiro Ito 1,3,b) Xiao Zhou 1,c) Abstract: In this paper, we study the reachability between Steiner trees in a graph: Given two Steiner

More information

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches E. Miller R. Libeskind-Hadas D. Barnard W. Chang K. Dresner W. M. Turner

More information

Ramsey s Theorem on Graphs

Ramsey s Theorem on Graphs Ramsey s Theorem on Graphs 1 Introduction Exposition by William Gasarch Imagine that you have 6 people at a party. We assume that, for every pair of them, either THEY KNOW EACH OTHER or NEITHER OF THEM

More information

Decreasing the Diameter of Bounded Degree Graphs

Decreasing the Diameter of Bounded Degree Graphs Decreasing the Diameter of Bounded Degree Graphs Noga Alon András Gyárfás Miklós Ruszinkó February, 00 To the memory of Paul Erdős Abstract Let f d (G) denote the minimum number of edges that have to be

More information

1 Variations of the Traveling Salesman Problem

1 Variations of the Traveling Salesman Problem Stanford University CS26: Optimization Handout 3 Luca Trevisan January, 20 Lecture 3 In which we prove the equivalence of three versions of the Traveling Salesman Problem, we provide a 2-approximate algorithm,

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Math 443/543 Graph Theory Notes

Math 443/543 Graph Theory Notes Math 443/543 Graph Theory Notes David Glickenstein September 3, 2008 1 Introduction We will begin by considering several problems which may be solved using graphs, directed graphs (digraphs), and networks.

More information

On 2-Subcolourings of Chordal Graphs

On 2-Subcolourings of Chordal Graphs On 2-Subcolourings of Chordal Graphs Juraj Stacho School of Computing Science, Simon Fraser University 8888 University Drive, Burnaby, B.C., Canada V5A 1S6 jstacho@cs.sfu.ca Abstract. A 2-subcolouring

More information

Key words. graph algorithms, chromatic number, circular arc graphs, induced cycles

Key words. graph algorithms, chromatic number, circular arc graphs, induced cycles SIAM J. COMPUT. Vol. 0, No. 0, pp. 000 000 c XXXX Society for Industrial and Applied Mathematics REVISITING TUCKER S ALGORITHM TO COLOR CIRCULAR ARC GRAPHS MARIO VALENCIA-PABON Abstract. The circular arc

More information

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include Outline Computer Science 331 Correctness of Algorithms Mike Jacobson Department of Computer Science University of Calgary Lectures #2-4 1 What is a? Applications 2 Recursive Algorithms 3 Final Notes Additional

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Framework for Design of Dynamic Programming Algorithms

Framework for Design of Dynamic Programming Algorithms CSE 441T/541T Advanced Algorithms September 22, 2010 Framework for Design of Dynamic Programming Algorithms Dynamic programming algorithms for combinatorial optimization generalize the strategy we studied

More information

Rigidity, connectivity and graph decompositions

Rigidity, connectivity and graph decompositions First Prev Next Last Rigidity, connectivity and graph decompositions Brigitte Servatius Herman Servatius Worcester Polytechnic Institute Page 1 of 100 First Prev Next Last Page 2 of 100 We say that a framework

More information

Topology Homework 3. Section Section 3.3. Samuel Otten

Topology Homework 3. Section Section 3.3. Samuel Otten Topology Homework 3 Section 3.1 - Section 3.3 Samuel Otten 3.1 (1) Proposition. The intersection of finitely many open sets is open and the union of finitely many closed sets is closed. Proof. Note that

More information

Copyright 2000, Kevin Wayne 1

Copyright 2000, Kevin Wayne 1 Linear Time: O(n) CS 580: Algorithm Design and Analysis 2.4 A Survey of Common Running Times Merge. Combine two sorted lists A = a 1,a 2,,a n with B = b 1,b 2,,b n into sorted whole. Jeremiah Blocki Purdue

More information

COL351: Analysis and Design of Algorithms (CSE, IITD, Semester-I ) Name: Entry number:

COL351: Analysis and Design of Algorithms (CSE, IITD, Semester-I ) Name: Entry number: Name: Entry number: There are 6 questions for a total of 75 points. 1. Consider functions f(n) = 10n2 n + 3 n and g(n) = n3 n. Answer the following: (a) ( 1 / 2 point) State true or false: f(n) is O(g(n)).

More information

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path.

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path. Chapter 3 Trees Section 3. Fundamental Properties of Trees Suppose your city is planning to construct a rapid rail system. They want to construct the most economical system possible that will meet the

More information

A note on the subgraphs of the (2 )-grid

A note on the subgraphs of the (2 )-grid A note on the subgraphs of the (2 )-grid Josep Díaz a,2,1 a Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya Campus Nord, Edifici Ω, c/ Jordi Girona Salgado 1-3,

More information

Bottleneck Steiner Tree with Bounded Number of Steiner Vertices

Bottleneck Steiner Tree with Bounded Number of Steiner Vertices Bottleneck Steiner Tree with Bounded Number of Steiner Vertices A. Karim Abu-Affash Paz Carmi Matthew J. Katz June 18, 2011 Abstract Given a complete graph G = (V, E), where each vertex is labeled either

More information

Outline. Definition. 2 Height-Balance. 3 Searches. 4 Rotations. 5 Insertion. 6 Deletions. 7 Reference. 1 Every node is either red or black.

Outline. Definition. 2 Height-Balance. 3 Searches. 4 Rotations. 5 Insertion. 6 Deletions. 7 Reference. 1 Every node is either red or black. Outline 1 Definition Computer Science 331 Red-Black rees Mike Jacobson Department of Computer Science University of Calgary Lectures #20-22 2 Height-Balance 3 Searches 4 Rotations 5 s: Main Case 6 Partial

More information

SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION

SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION CHAPTER 5 SEQUENCES, MATHEMATICAL INDUCTION, AND RECURSION Alessandro Artale UniBZ - http://www.inf.unibz.it/ artale/ SECTION 5.5 Application: Correctness of Algorithms Copyright Cengage Learning. All

More information

Universal Cycles for Permutations

Universal Cycles for Permutations arxiv:0710.5611v1 [math.co] 30 Oct 2007 Universal Cycles for Permutations J Robert Johnson School of Mathematical Sciences Queen Mary, University of London Mile End Road, London E1 4NS, UK Email: r.johnson@qmul.ac.uk

More information

Structured System Theory

Structured System Theory Appendix C Structured System Theory Linear systems are often studied from an algebraic perspective, based on the rank of certain matrices. While such tests are easy to derive from the mathematical model,

More information

On vertex types of graphs

On vertex types of graphs On vertex types of graphs arxiv:1705.09540v1 [math.co] 26 May 2017 Pu Qiao, Xingzhi Zhan Department of Mathematics, East China Normal University, Shanghai 200241, China Abstract The vertices of a graph

More information