TF-Label: a Topological-Folding Labeling Scheme for Reachability Querying in a Large Graph

James Cheng, Silu Huang, Huanhuan Wu, Ada Wai-Chee Fu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
{jcheng, slhuang, hhwu, adafu}@cse.cuhk.edu.hk

ABSTRACT
Reachability querying is a basic graph operation with numerous important applications in databases, network analysis, computational biology, software engineering, etc. Although many indexes have been proposed to answer reachability queries, most of them are only efficient for handling relatively small graphs. We propose TF-label, an efficient and scalable labeling scheme for processing reachability queries. TF-label is constructed based on a novel topological folding (TF) that recursively folds an input graph into half so as to reduce the label size, thus improving query efficiency. We show that TF-label is efficient to construct and propose efficient algorithms and optimization schemes. Our experiments verify that TF-label is significantly more scalable and efficient than the state-of-the-art methods in both index construction and query processing.

Categories and Subject Descriptors
E.1 [DATA]: DATA STRUCTURES - Graphs and networks

General Terms
Algorithms, Performance

Keywords
Graph reachability, graph indexing, graph querying

1. INTRODUCTION
A reachability query asks whether there exists a path from one vertex to another vertex in a directed graph. Reachability querying is one of the fundamental operations in directed graphs. It has a wide range of applications such as processing recursive queries in data and knowledge base management, querying associations and logical reasoning in Web and Semantic Web graphs, pattern matching in graphs and XML documents, analyzing the biological function of genes, checking connections in geographic navigation systems, social network analysis, ontology querying, program analysis, and many more.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD'13, June 22-27, 2013, New York, New York, USA. Copyright 2013 ACM /13/06...$

Reachability querying has been extensively studied in the past [1, 2, 3, 4, 5, 6, 10, 11, 12, 14, 17, 18, 19, 20, 21, 23, 24, 25, 28]. In recent years, there has been a shift of interest toward handling large graphs. The more recent works [6, 18, 19, 25, 28] have highlighted the applications of reachability querying in large graphs such as Web graphs, Semantic Web and RDF graphs, social networks, large XML databases, etc., and more effort has been devoted to the development of scalable methods for answering reachability queries.

As pointed out in [18], most existing methods can only handle relatively small graphs with tens to hundreds of thousands of vertices and edges. For processing larger graphs, these methods are too costly either in indexing or in query processing (more discussion in Section 9), thus limiting their application to real-world graphs. For graphs with millions of vertices and edges, only a few methods can process them with reasonably good efficiency [19, 25, 28]. For larger graphs with tens of millions of vertices and edges, the only known method that attains reasonable indexing and querying efficiency is the recently proposed backbone structure [18]. A reachability query asking whether a vertex $s$ can reach another vertex $t$ can be answered by (1) first finding all backbone vertices $B_s$ that can be reached from $s$ and all backbone vertices $B_t$ that can reach $t$, and then (2) checking whether any vertex in $B_s$ can reach any vertex in $B_t$.
Any existing method can be applied to the backbone graph to process Step (2), and querying is generally faster since the backbone can be significantly smaller than the original graph. Although the backbone is used as a general framework (called SCARAB [18]) to further improve the scalability of a reachability index (including ours), an efficient and scalable method itself is still most crucial for query performance, for two main reasons (both verified in our experiments). First, SCARAB itself may not be scalable to large graphs. Second, the backbone of a large graph may still be too large for existing methods.

We propose an efficient and scalable labeling scheme, which can process large graphs that cannot be handled by SCARAB and other existing methods. Given the labels of $s$ and $t$, i.e., a set of vertices that are reachable from $s$ and a set of vertices that can reach $t$, respectively, we can answer whether $s$ can reach $t$ efficiently by simply intersecting their labels (as in [14]).

We give the main idea of our method as follows. We propose a novel data structure, called topological folding (TF), based on which we develop our labeling scheme, TF-label. Given a directed graph, we can convert it into a directed acyclic graph (DAG) by condensing each strongly connected component (SCC) in the graph into a super node. Reachability queries can be answered on the DAG since all vertices are reachable from each other within an SCC. We define a topological structure $T$ for the DAG. TF is intuitively a structure obtained by folding $T$ into half each time, which essentially implies a great reduction in the label size, as labeling is processed in $O(\lg \ell)$ levels instead of a total of $\ell$ levels in $T$. Then, we apply a labeling technique, inspired by the work of [16], on the TF structure to construct labels for answering reachability queries.

We summarize the main contributions of our work as follows.
- We propose an efficient and scalable TF-based labeling scheme for reachability query processing.
- We propose optimization techniques, such as special handling of high-degree vertices, to further improve the scalability.
- We propose efficient algorithms for constructing the TF structure and then the labels from the TF, as well as the optimization techniques.
- Our experiments on a wide spectrum of real and synthetic datasets verify that TF-label achieves competitive indexing performance and significantly better query performance than the state-of-the-art methods [18, 19, 25, 28]. In many cases, TF-label is one to several orders of magnitude faster in query processing. We also show that TF-label is more scalable and has stable performance as various graph properties change.

The rest of the paper is organized as follows. We first give some basic notations and the problem definition in Section 2. Then, in Sections 3 to 7, we present the details of TF and TF-label with their design and algorithms. We evaluate the performance of TF-label in Section 8. Finally, we discuss related work in Section 9 and conclude the paper.

2. NOTATIONS/PROBLEM DEFINITION
Given a directed graph $\mathcal{G}$, a reachability query asks whether there is a path from a vertex $u$ to another vertex $v$ in $\mathcal{G}$. We assume $u \neq v$, as it is trivial to process $u = v$. Formally, a directed edge, or simply an edge (since all edges are directed in this paper), from $u$ to $v$ is denoted by $(u, v)$. A path $P$ from $v_1$ to $v_p$ in $\mathcal{G}$ is defined by $P = \langle v_1, \ldots, v_p \rangle$ such that $(v_i, v_{i+1})$ is an edge in $\mathcal{G}$ for $1 \le i < p$. We use $u \to v$ to indicate that $u$ can reach $v$ (or $v$ is reachable from $u$), and $u \nrightarrow v$ to indicate that $u$ cannot reach $v$. Given any two vertices $u$ and $v$ in a strongly connected component (SCC) of $\mathcal{G}$, $u$ can always reach $v$.
With this observation, existing methods first compute a compressed graph, $G = (V_G, E_G)$, of the input directed graph (written $\mathcal{G}$ here) as follows: the vertex set $V_G$ of $G$ is the set of SCCs of $\mathcal{G}$, and a directed edge is created in $G$ from one SCC $C_1$ to another SCC $C_2$ if there exists a directed edge $(v_1, v_2)$ in $\mathcal{G}$, where $v_1$ is a vertex in $C_1$ and $v_2$ is a vertex in $C_2$. Then, a reachability query is answered by checking whether there is a path from $C_u$ to $C_v$ in $G$, where $C_u, C_v \in V_G$, $u$ is a vertex in $C_u$ and $v$ is a vertex in $C_v$. The compressed graph $G$ created above is in fact a directed acyclic graph (DAG). Thus, for simplicity, we call $G$ the DAG of $\mathcal{G}$ in this paper. Since the SCCs of $\mathcal{G}$ can be computed efficiently [15], we follow the convention of existing methods and assume that the input to our algorithm is the DAG of the input directed graph.

Given a DAG $G = (V_G, E_G)$, we define the set of in-neighbors (out-neighbors) of a vertex $v \in V_G$ as $nb_{in}(v, G) = \{u : (u, v) \in E_G\}$ ($nb_{out}(v, G) = \{u : (v, u) \in E_G\}$), and the in-degree (out-degree) of $v$ as $deg_{in}(v, G) = |nb_{in}(v, G)|$ ($deg_{out}(v, G) = |nb_{out}(v, G)|$).

Problem definition. We study the following problem: given a DAG $G = (V_G, E_G)$, compute a set of vertex labels (also called an index) for processing reachability queries, i.e., given $s, t \in V_G$, the query whether $s$ can reach $t$ can be efficiently answered using the labels of $s$ and $t$.

3. TOPOLOGICAL FOLDING
In Sections 3 to 6, we present our main indexing scheme, called TF-label, which is designed based on a novel topological folding scheme of the DAG of a directed graph. We first present the concept of topological folding in this section.

3.1 Basic Topological Folding
Given a DAG $G = (V_G, E_G)$, we start by assigning each vertex in $G$ a topological level number as follows.

DEFINITION 1 (TOPOLOGICAL LEVEL NUMBER). Given a DAG $G = (V_G, E_G)$, the topological level number of a vertex $v \in V_G$, denoted by $\ell(v, G)$, is defined as follows:
- If $nb_{in}(v, G) = \emptyset$: $\ell(v, G) = 1$;
- Else: $\ell(v, G) = \max\{\ell(u, G) + 1 : u \in nb_{in}(v, G)\}$.
The topological level number of $G$, denoted by $\ell(G)$, is given by $\ell(G) = \max\{\ell(v, G) : v \in V_G\}$.
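Definition 1 can be evaluated in a single pass over a topological order. The following is a minimal sketch, assuming the DAG is given as a vertex list and a list of (u, v) edge pairs; the function name `topological_levels` and the small test graph are illustrative and not part of the paper.

```python
from collections import deque


def topological_levels(vertices, edges):
    """Return {v: l(v, G)} for a DAG given as vertices and (u, v) edge pairs."""
    out_nbrs = {v: [] for v in vertices}
    in_deg = {v: 0 for v in vertices}
    for u, v in edges:
        out_nbrs[u].append(v)
        in_deg[v] += 1

    # Vertices without in-neighbors are at level 1.
    level = {v: 1 for v in vertices if in_deg[v] == 0}
    queue = deque(level)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in out_nbrs[u]:
            # l(v) is the maximum of l(u) + 1 over all in-neighbors u of v.
            level[v] = max(level.get(v, 1), level[u] + 1)
            in_deg[v] -= 1
            if in_deg[v] == 0:
                queue.append(v)
    if processed != len(vertices):
        raise ValueError("input graph contains a cycle")
    return level


# Hypothetical 4-vertex DAG with one cross-level edge (a, c).
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
levels = topological_levels(V, E)
```

Here `max(levels.values())` gives the topological level number of the whole DAG.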
Since $G$ is a DAG, it is easy to see that every vertex $v \in V_G$ has exactly one topological level number, which can be derived from a topological ordering of the DAG. Given the topological level number, we now define the topological levels of a DAG and state an important property that will be used in the definition of topological folding later on.

DEFINITION 2 (TOPOLOGICAL LEVELS). A DAG $G = (V_G, E_G)$ consists of $t$ topological levels of vertices, denoted by $\{L_1(G), \ldots, L_t(G)\}$, where $t = \ell(G)$, and $L_i(G) = \{v : v \in V_G, \ell(v, G) = i\}$ for $1 \le i \le t$.

LEMMA 1. Each topological level $L_i(G)$ of a DAG $G$, for $1 \le i \le \ell(G)$, is an independent set of $G$.

PROOF. $L_i(G)$ is an independent set of $G$ if $\forall u, v \in L_i(G)$, $(u, v) \notin E_G$ and $(v, u) \notin E_G$. Suppose to the contrary that $(u, v) \in E_G$ or $(v, u) \in E_G$; then we have either $\ell(u, G) < \ell(v, G)$ or $\ell(v, G) < \ell(u, G)$, contradicting the fact that $u, v \in L_i(G)$, i.e., $\ell(u, G) = \ell(v, G) = i$.

To clearly illustrate the concepts, for now let us assume that the DAG $G$ only has edges going from vertices in $L_i(G)$ to vertices in $L_{i+1}(G)$, and there is no edge going from any vertex in $L_i(G)$ to a vertex in $L_j(G)$ where $j > i+1$ (we will handle such edges in Section 3.2). We call such a DAG a $k$-partite DAG, where $k = \ell(G)$. Figure 1(a) shows an example of a $k$-partite DAG where $k = 6$. We define a topological folding scheme that recursively folds up $G$ by taking away half of the levels, as follows.

DEFINITION 3 (TOPOLOGICAL FOLDING (TF)). Given an $\ell(G)$-partite DAG $G = (V_G, E_G)$, the topological folding (TF) of $G$ is a set of DAGs, $\mathbb{G} = \{G_1, G_2, \ldots, G_f\}$, where each $G_i = (V_{G_i}, E_{G_i})$ is defined as follows:
- $V_{G_1} = V_G$ and for $2 \le i \le f$, $V_{G_i} = \bigcup_{1 \le j \le \lfloor \ell(G_{i-1})/2 \rfloor} L_{2j}(G_{i-1})$;
- For $1 \le i \le f$, $E_{G_i}$ is a set of edges with which $G_i$ is an $\ell(G_i)$-partite DAG and $\forall u, v \in V_{G_i}$, $u \to v$ in $G_i$ if and only if $u \to v$ in $G$.
The topological folding number, or TF number, of $G$, denoted by $tf(G)$, is given by $tf(G) = f = |\mathbb{G}| = \lfloor \log_2 \ell(G) \rfloor + 1$.
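The vertex sets in Definition 3 and the formula for $tf(G)$ can be checked with a few lines of code. The sketch below covers only the $k$-partite case and only the vertex sets (not the edge sets). It relies on the observation that, in a $k$-partite DAG, a vertex at level $2j$ of $G_{i-1}$ ends up at level $j$ of $G_i$. The function name `fold_levels` and the six-vertex input are hypothetical.

```python
def fold_levels(level):
    """Given {v: topological level} of a k-partite DAG with levels 1..k,
    return one {v: level in G_i} dictionary per folding graph G_1, ..., G_f."""
    graphs = [dict(level)]
    while max(graphs[-1].values()) > 1:
        prev = graphs[-1]
        # Keep the even levels only; level 2j of G_{i-1} becomes level j of G_i.
        graphs.append({v: l // 2 for v, l in prev.items() if l % 2 == 0})
    return graphs


# Hypothetical 6-level input with one vertex per level, named by its level.
six_levels = {f"v{i}": i for i in range(1, 7)}
folding = fold_levels(six_levels)
# len(folding) is the TF number; for l(G) = 6 the level counts go 6, 3, 1.
```

For an integer $\ell \ge 1$, `l.bit_length()` in Python equals $\lfloor \log_2 \ell \rfloor + 1$, which gives a float-free way to compare the number of folding graphs against the closed form.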

Intuitively, TF folds each $G_i$ into half (i.e., taking away half of the levels together with their vertices) to obtain $G_{i+1}$, starting from $G_1 = G$ and ending at $G_f$, which has only one level and cannot be folded any more. Hence, we have the name topological folding.

To correctly process reachability queries, it is necessary for the edge sets $E_{G_i}$ to maintain the reachability of the vertices. To efficiently process reachability queries, we also want each $E_{G_i}$ to be as small as possible. The following lemma leads to a simple and efficient method to construct $E_{G_i}$.

LEMMA 2. Let $G = (V_G, E_G)$ be an $\ell(G)$-partite DAG and $\mathbb{G} = \{G_1, G_2, \ldots, G_{tf(G)}\}$ be a topological folding of $G$. For $2 \le i \le tf(G)$, $V_{G_{i-1}} \setminus V_{G_i}$ is an independent set of $G_{i-1}$.

PROOF. According to Lemma 1, each $L_j(G_{i-1})$ for $1 \le j \le \ell(G_{i-1})$ is an independent set of $G_{i-1}$. According to the definition of $\mathbb{G}$, $V_{G_1} = V_G$ and for $2 \le i \le tf(G)$, $V_{G_{i-1}} \setminus V_{G_i}$ are the vertices at all the odd levels of $G_{i-1}$. Since each $G_{i-1}$ is an $\ell(G_{i-1})$-partite DAG, the union of the vertices at all the odd levels of $G_{i-1}$ is clearly an independent set of $G_{i-1}$.

We construct the edge sets $E_{G_i}$ as follows.
- $E_{G_1} = E_G$;
- For $2 \le i \le tf(G)$, $E_{G_i}$ is constructed from $G_{i-1}$ as follows: for each $v \in L_j(G_{i-1})$, where $j$ is odd, create a new edge in $E_{G_i}$ from each in-neighbor (if any in $G_{i-1}$) of $v$ to each out-neighbor (if any in $G_{i-1}$) of $v$.

LEMMA 3. The edge sets $E_{G_i}$ constructed above give a valid topological folding $\mathbb{G}$ of an $\ell(G)$-partite DAG $G = (V_G, E_G)$.

PROOF. First, each $G_i$ is an $\ell(G_i)$-partite DAG since each edge in $E_{G_i}$ only goes from $L_j(G_i)$ to $L_{j+1}(G_i)$, for $1 \le j < \ell(G_i)$. Second, reachability from each vertex to another is maintained because each $u_{in} \in L_{j-1}(G_{i-1})$ is connected to each $u_{out} \in L_{j+1}(G_{i-1})$ by an edge in $E_{G_i}$ if the edges $(u_{in}, v)$ and $(v, u_{out})$ exist in $G_{i-1}$, where $v \in L_j(G_{i-1})$ and $j$ is odd.

Note that the correctness of the proof of Lemma 3 also depends on the validity of Lemma 2, because if any edge $(u, v)$, where $u, v \in V_{G_{i-1}} \setminus V_{G_i}$, exists in $G_{i-1}$, then the reachability established in the proof of Lemma 3 will not be valid. The following example illustrates the idea of topological folding.

EXAMPLE 1.
Figure 1 shows the topological folding of a 6-partite DAG $G$ ($\ell(G) = 6$). $G_2$ is constructed from $G_1$ by adding edges $(c, f)$, $(d, f)$, and $(f, h)$, and then removing all vertices in the odd levels of $G_1$. Next, the odd-level vertices of $G_2$ are removed to form $G_3$.

[Figure 1: Topological folding]

3.2 Dealing with Cross-Level Edges
In Section 3.1 we introduced the basic concepts and structure of topological folding of a DAG and some of its essential properties. However, the DAG $G$ of a real-world directed graph is rarely $\ell(G)$-partite. On the contrary, there can be many cross-level edges in $G$, i.e., there can be edges from vertices in $L_i(G)$ to vertices in $L_j(G)$, where $1 \le i < i+1 < j \le \ell(G)$, as shown in Figure 2.

To deal with these cross-level edges in the DAG, we observe that each DAG $G_i$ in a topological folding $\mathbb{G}$ need not be $\ell(G_i)$-partite; only the following essential properties need to be maintained in each $G_i$: (1) the set of vertices removed to obtain $G_i$ is an independent set of $G_{i-1}$ for $2 \le i \le tf(G)$; and (2) $\forall u, v \in (V_G \cap V_{G_i})$, $u \to v$ in $G_i$ if and only if $u \to v$ in $G$.

To construct each $G_i$ that satisfies the above two properties, we devise a transformation scheme for $G_{i-1}$, for $2 \le i \le tf(G)$, with which we construct the corresponding transformed topological folding as follows:

Procedure 1. TRANSFORMED TF CONSTRUCTION:
1. $G_1 = G$, and set $i = 1$;
2. Initialize $G_i^* = G_i$, then do the following three steps in order:
2.1. For $1 \le j \le \ell(G_i^*)$ and $j$ is odd, for each $v \in L_j(G_i^*)$: Let $U = (L_k(G_i^*) \cap nb_{out}(v, G_i^*))$, taken over all $k > j+1$, i.e., the out-neighbors of $v$ beyond level $j+1$. If $U \neq \emptyset$, then add a dummy vertex $w$ to $L_{j+1}(G_i^*)$, add a new edge set $\{(w, u_{out}) : u_{out} \in U\}$ and a new edge $(v, w)$ to $E_{G_i^*}$, and remove the edge set $\{(v, u_{out}) : u_{out} \in U\}$ from $E_{G_i^*}$.
2.2. For $1 \le j \le \ell(G_i^*)$ and $j$ is odd, for each $v \in L_j(G_i^*)$: Let $U = (L_k(G_i^*) \cap nb_{in}(v, G_i^*))$, taken over all $k < j-1$ with $k$ even. If $U \neq \emptyset$, then add a dummy vertex $w$ to $L_{j-1}(G_i^*)$, add a new edge set $\{(u_{in}, w) : u_{in} \in U\}$ and a new edge $(w, v)$ to $E_{G_i^*}$, and remove the edge set $\{(u_{in}, v) : u_{in} \in U\}$ from $E_{G_i^*}$.
2.3. For $1 \le j \le \ell(G_i^*)$ and $j$ is odd, for each $v \in L_j(G_i^*)$: add a new edge set $\{(u_{in}, u_{out}) : u_{in} \in (L_{j-1}(G_i^*) \cap nb_{in}(v, G_i^*)), u_{out} \in (L_{j+1}(G_i^*) \cap nb_{out}(v, G_i^*))\}$ to $E_{G_i^*}$.
3. If $\ell(G_i^*) > 1$, initialize $G_{i+1} = G_i^*$, and remove all vertices at odd levels of $G_{i+1}$ together with all edges incident to them; then, set $i = i+1$ and go to Step 2. Otherwise, return the transformed topological folding $\mathbb{G}^* = \{G_1^*, \ldots, G_{tf(G)}^*\}$ and quit.

Note that Step 2.2 ignores all Level-$k$ in-neighbors of $v$ if $k$ is odd, because in this case a dummy vertex must have been created at an even level in Step 2.1, and it is thus also handled in Step 2.2. Also note that we do not increase the number of levels in any $G_i$ or $G_i^*$, and hence $tf(G)$ is still defined in the same way as in Definition 3. We also define the TF number of a vertex as follows.

DEFINITION 4 (TOPOLOGICAL FOLDING NUMBER). Let $G = (V_G, E_G)$ be a DAG, $\mathbb{G}^* = \{G_1^*, \ldots, G_{tf(G)}^*\}$ be the transformed topological folding of $G$, and let $V^*$ be the set of dummy vertices created in $\mathbb{G}^*$. The TF number of a vertex $v \in (V_G \cup V^*)$, denoted by $tf(v)$, is given by $tf(v) = \max\{i : v \in V_{G_i^*}\}$. The TF number of $G$ is given by $tf(G) = |\mathbb{G}^*| = \lfloor \log_2 \ell(G) \rfloor + 1$. Also note that $tf(G) = \max\{tf(v) : v \in V_G\}$.

We illustrate the concept using the following example.

[Figure 2: Transformed topological folding. Panels show $G = G_1$ and $G_1^*$, $G_2$ and $G_2^*$, and $G_3$ and $G_3^*$.]

EXAMPLE 2. Figure 2 shows the transformed topological folding of a DAG. The DAG $G$ in Figure 2(a) contains a number of cross-level edges: $(a, h)$, $(b, f)$, $(d, f)$, $(e, g)$. By Procedure 1, we first transform $G = G_1$ to $G_1^*$. At level 1, Step 2.1 is executed: we add dummy vertex $a_1$ for $a$, add edges $(a, a_1)$ and $(a_1, h)$, and then remove edge $(a, h)$; similarly, we add $b_1$, $(b, b_1)$ and $(b_1, f)$, and remove $(b, f)$. Next consider level 3: $e_1$ is added for $e$, and we add $(e, e_1)$ and $(e_1, g)$, and remove $(e, g)$. At Step 2.3, we add $(c, e_1)$ and $(c, f)$. Finally, for level 5, at Step 2.3, we add $(e_1, h)$ and $(f, h)$. Thus, we have constructed $G_1^*$, i.e., the figure on the right in Figure 2(a). Note that in $G_1^*$, the vertices at all the odd levels are independent of each other. At Step 3 these vertices are removed, and we obtain $G_2$, as shown in Figure 2(b). Repeating the process, we obtain $G_2^*$ and $G_3$, while $G_3^*$ is simply the same as $G_3$. By Definition 4, $tf(v) = 1$ for $v \in \{a, b, e, g\}$ since their last occurrence is in $G_1^*$. Similarly, $tf(v) = 2$ for $v \in \{a_1, c, d, b_1, h\}$, $tf(v) = 3$ for $v \in \{a_2, e_1, f\}$, and $tf(G) = 3$.

One concern in the process of Procedure 1 is that many dummy vertices and edges may be created. We will handle these cases in Sections 5 and 6. In fact, $\mathbb{G}$ (or $\mathbb{G}^*$) is also not useful for reachability processing and is hence deleted after the labeling process. The following lemma is important in establishing the correctness of our method for reachability query answering in Section 4.1.

LEMMA 4. Let $\mathbb{G}^* = \{G_1^*, \ldots, G_{tf(G)}^*\}$ be the transformed topological folding of a DAG $G = (V_G, E_G)$. Let $G_i$ be the graph from which $G_i^*$ is transformed. Then, (1) $V_{G_{i-1}^*} \setminus V_{G_i}$ is an independent set of $G_{i-1}^*$ for $2 \le i \le tf(G)$; and (2) $\forall u, v \in (V_{G_i} \cap V_{G_i^*})$, where $1 \le i \le tf(G)$, $u \to v$ in $G_i$ if and only if $u \to v$ in $G_i^*$; and (3) $\forall u, v \in (V_{G_i^*} \cap V_{G_j^*})$, where $j = i-1$ and $1 < i \le tf(G)$, $u \to v$ in $G_i^*$ if and only if $u \to v$ in $G_j^*$.

PROOF. We first prove (1). According to Procedure 1, we obtain $G_i$ by removing the odd levels of $G_{i-1}^*$, i.e., $V_{G_{i-1}^*} \setminus V_{G_i}$. Since there is no edge from a vertex to another vertex at the same level in $G_{i-1}^*$, each level of $G_{i-1}^*$ is an independent set of $G_{i-1}^*$. For any edge that goes from $u$ at an odd level to $v$ at another odd level, the edge is removed from $G_{i-1}^*$ and a dummy vertex is created to preserve the connection from $u$ to $v$. Thus, for any $u, v \in V_{G_{i-1}^*} \setminus V_{G_i}$, $(u, v)$ does not exist in $G_{i-1}^*$.

Next we prove (2). From $G_i$ to $G_i^*$, Procedure 1 either converts a cross-level edge to a path with a middle dummy vertex or adds an edge from an in-neighbor to an out-neighbor of an odd-level vertex in $G_i$. Thus, in both cases, (2) is true.

Lastly, we prove (3). According to Procedure 1, all the cross-level edges in $G_j$ are removed from $G_j^*$ and hence a vertex $w$ at $L_k$ of $G_j^*$, where $1 \le k \le \ell(G_j^*)$ and $k$ is odd, has only in-neighbors at $L_{k-1}$ (if any) and out-neighbors at $L_{k+1}$ (if any). Since Procedure 1 creates an edge from every in-neighbor of $w$ to every out-neighbor of $w$, we have $u \to v$ in $G_i$ if and only if $u \to v$ in $G_j^*$ for any $u, v \in (V_{G_i} \cap V_{G_j^*})$, which together with (2) implies (3).

Note that by a recursive analysis on (3) of Lemma 4, we can actually prove a stronger lemma that shows $u \to v$ in $G_i^*$ if and only if $u \to v$ in $G_j^*$, for all $u, v \in (V_{G_i^*} \cap V_{G_j^*})$, where $1 \le j < i \le tf(G)$ (instead of $j = i-1$ as in (3) of Lemma 4).

4. LABELING AND QUERY ANSWERING
In this section, we present our TF-based labeling scheme and discuss reachability query answering using the labels.

4.1 The Labeling Scheme
The label of a vertex is defined as follows.

DEFINITION 5 (VERTEX LABEL). Let $G = (V_G, E_G)$ be a DAG, $\mathbb{G}^* = \{G_1^*, \ldots, G_{tf(G)}^*\}$ be the transformed topological folding of $G$, and let $V^*$ be the set of dummy vertices created in $\mathbb{G}^*$. The in-label and out-label of a vertex $v \in (V_G \cup V^*)$, denoted by $label_{in}(v)$ and $label_{out}(v)$, are defined as follows:
- $label_{in}(v)$: (1) $v \in label_{in}(v)$, and (2) for any $u \in label_{in}(v)$, $nb_{in}(u, G_{tf(u)}^*) \subseteq label_{in}(v)$.
- $label_{out}(v)$: (1) $v \in label_{out}(v)$, and (2) for any $u \in label_{out}(v)$, $nb_{out}(u, G_{tf(u)}^*) \subseteq label_{out}(v)$.

Intuitively, we add to $label_{in}(v)$ and $label_{out}(v)$ recursively the in-neighbors and out-neighbors in the folding graph $G_i^*$ of each vertex $u$ currently in $label_{in}(v)$ and $label_{out}(v)$, where $i = tf(u)$. The following property between a vertex and its in-neighbors/out-neighbors shows that, in constructing the labels for a vertex, we only go for reachable vertices with a higher TF number and ignore all other reachable vertices. This is a crucial design principle of our labeling scheme that leads to a significant reduction in the label size (compared with the transitive closure), since each vertex has $O(\ell(G))$ levels of reachable vertices, but only $O(\lg \ell(G))$ levels of reachable vertices with a higher TF number.

LEMMA 5. If $w \in nb_{in}(u, G_{tf(u)}^*)$ or $w \in nb_{out}(u, G_{tf(u)}^*)$, then $tf(w) > tf(u)$.

PROOF. Since $w$ is in $G_{tf(u)}^*$, we have $tf(w) \ge tf(u)$. However, $tf(w) = tf(u)$ implies that both $w$ and $u$ are in an independent set of $G_{tf(u)}^*$, which contradicts the fact that the edge $(u, w)$ or $(w, u)$ exists in $G_{tf(u)}^*$. Thus, $tf(w) \neq tf(u)$ and $tf(w) > tf(u)$.

We use the following example to illustrate the labeling scheme.

EXAMPLE 3. Consider the labeling for vertex $a$. Initially, $a$ is added to $label_{in}(a)$ and $label_{out}(a)$. Since $tf(a) = 1$ and $nb_{in}(a, G_1^*) = \emptyset$, we finalize $label_{in}(a) = \{a\}$. Next, since $nb_{out}(a, G_1^*) = \{a_1, c, d\}$, $\{a_1, c, d\}$ are added to $label_{out}(a)$. Since $a_1$ has an out-neighbor $a_2$ in $G_{tf(a_1)}^* = G_2^*$, we add $a_2$ to $label_{out}(a)$. We also add $\{e_1, f\}$ to $label_{out}(a)$ for $nb_{out}(c, G_2^*) = \{e_1, f\}$ and $nb_{out}(d, G_2^*) = \{f\}$. The vertices $\{a_2, e_1, f\}$ have a TF number of 3 but they have no out-neighbor in $G_3^*$, and hence the labeling for $a$ is completed. The labels for all vertices are shown in Table 1.

4.2 Reachability Querying using Labels
We now discuss how we use the vertex labels to process reachability queries. Given two vertices $s$ and $t$ in $G$, we ask whether $s$ can reach $t$; the query answer is given by the following equation.

Table 1: Labeling for the example in Figure 2

vertex | label_out                | label_in
a      | {a, a_1, c, d, e_1, f}   | {a}
b      | {b, b_1, d, f}           | {b}
e      | {e, e_1, f}              | {c, e}
g      | {g, h}                   | {e_1, f, g}
a_1    | {a_1, a_2}               | {a_1}
c      | {c, e_1, f}              | {c}
d      | {d, f}                   | {d}
b_1    | {b_1, f}                 | {b_1}
h      | {h}                      | {a_2, e_1, f, h}
a_2    | {a_2}                    | {a_2}
e_1    | {e_1}                    | {e_1}
f      | {f}                      | {f}

$$ s \to t = \begin{cases} \text{true}, & \text{if } label_{out}(s) \cap label_{in}(t) \neq \emptyset; \\ \text{false}, & \text{if } label_{out}(s) \cap label_{in}(t) = \emptyset. \end{cases} \qquad (1) $$

We give an example of reachability query processing as follows.

EXAMPLE 4. Consider the example in Figure 2; the labeling is shown in Table 1. Suppose the query asks whether $c$ can reach $h$: since $label_{out}(c) \cap label_{in}(h) = \{e_1, f\}$, the answer is true. Now consider whether $a$ can reach $b$: since $label_{out}(a) \cap label_{in}(b) = \emptyset$, the answer is false.

Lemmas 6-9 and Theorem 1 establish the correctness of reachability query answering by Equation (1). The lemmas themselves also reveal important properties and the design of the TF structure, and hence how TF labeling works for reachability query answering.

LEMMA 6. Given a path $P = \langle u_1, \ldots, u_\alpha \rangle$ in any graph in $\mathbb{G}^*$, there exists a sequence of vertices $S = \langle u_1 = v_1, \ldots, v_\beta = u_\alpha \rangle$ such that for $1 \le i < \beta$: (1) the edge $(v_i, v_{i+1})$ is in $G_j^*$ where $j = \min(tf(v_i), tf(v_{i+1}))$; and (2) the sequence $S$ is maximal, i.e., no sub-sequence can be inserted between any $v_i$ and $v_{i+1}$ such that the resultant sequence also satisfies (1).

PROOF. The path $P$ implies that there exists a sequence $S = \langle u_1, S_1, u_2, S_2, \ldots, u_{\alpha-1}, S_{\alpha-1}, u_\alpha \rangle$, where each $S_i$ for $1 \le i < \alpha$ is constructed (according to Procedure 1) as follows.

If $\ell(u_{i+1}, G_j) = \ell(u_i, G_j) + 1$, where $j = \min(tf(u_i), tf(u_{i+1}))$, then either $u_i$ or $u_{i+1}$ will be removed in $G_{j+1}$ and hence $S_i$ must be an empty set. In this case, we have $(u_i, u_{i+1})$ in $G_j^*$.

Otherwise, $(u_i, u_{i+1})$ is a cross-level edge in $G_j$, where $j = \min(tf(u_i), tf(u_{i+1}))$, and then $S_i$ is a sequence of dummy vertices. Assume $j = tf(u_i)$ (the case $j = tf(u_{i+1})$ can be processed similarly). To preserve the reachability from $u_i$ to $u_{i+1}$ in $G_j^*$, at least one dummy vertex $w$ must be created in $G_j^*$ together with the edges $(u_i, w)$ and $(w, u_{i+1})$. Thus, we have the edge $(u_i, w)$ in $G_j^*$.
If $(w, u_{i+1})$ is still a cross-level edge in $G_{j'}$, where $j' = \min(tf(w), tf(u_{i+1}))$, then another dummy vertex is to be created in $G_{j'}^*$ to preserve the reachability from $w$ to $u_{i+1}$ in $G_{j'}^*$. A recursive expansion in this way gives the subsequence $S_i' = \langle u_i = w_1, w_2, \ldots, w_{\gamma-1}, w_\gamma = u_{i+1} \rangle$, where $S_i = \langle w_2, \ldots, w_{\gamma-1} \rangle$, and for $1 \le k < \gamma$, $(w_k, w_{k+1})$ is in $G_{j_k}^*$ with $j_k = \min(tf(w_k), tf(w_{k+1}))$. $S$ is ensured to be maximal if the above recursive expansion is executed until no more sub-sequences can be generated. By relabeling the vertices, we obtain $S = \langle u_1 = v_1, \ldots, v_\beta = u_\alpha \rangle$ such that $S$ satisfies both (1) and (2).

Lemma 6 is used to show that a sequence of vertices $S$ with a special property (as specified in the lemma) exists for a path $P$ in any graph in $\mathbb{G}^*$. The existence of such a sequence is essential in proving the correctness of Lemma 9 and hence Theorem 1.

LEMMA 7. Given a sequence of vertices $S = \langle s = v_1, \ldots, v_\beta = t \rangle$, where for $1 \le i < \beta$, the edge $(v_i, v_{i+1})$ is in $G_j^*$ where $j = \min(tf(v_i), tf(v_{i+1}))$: if $s$ and $t$ are both in some graph $G_\phi^* \in \mathbb{G}^*$, then $s \to t$ in $G_\phi^*$.

PROOF. First, each edge $(v_i, v_{i+1})$ in $G_j^*$ implies $v_i \to v_{i+1}$ in $G_j^*$. We can derive the reachability from $v_1$ to $v_\beta$ in $G_\phi^*$ as follows.

Consider the vertex $v_i \in S$ where $tf(v_i) < \phi$ and $tf(v_i) \le tf(v)$ for all $v \in S \setminus \{v_i\}$. If $v_i$ exists in $S$, then according to Procedure 1, $v_{i-1}$ must be connected to $v_{i+1}$ in $G_{tf(v_i)}^*$ in order to preserve the reachability from $v_{i-1}$ to $v_{i+1}$ via $v_i$. Thus, removing $v_i$ from $S$ we still have $v_{i-1} \to v_{i+1}$ in $G_j^*$, where $j = \min(tf(v_{i-1}), tf(v_{i+1}))$. We repeat the above process with $S = S \setminus \{v_i\}$ until we have $tf(v) \ge \phi$ for all remaining vertices $v$ in $S$, and let $S' = \langle s = v_1', \ldots, v_{\beta'}' = t \rangle$ be the new sequence obtained at the end of this process.

We continue with $S'$ as follows. Consider the vertex $v_i' \in S'$ that is not in $G_\phi^*$ and $tf(v_i') \ge tf(v)$ for all $v \in S' \setminus \{v_i'\}$. If $v_i'$ exists in $S'$, then we have $v_{i-1}' \to v_i'$ in $G_{tf(v_{i-1}')}^*$ and $v_i' \to v_{i+1}'$ in $G_{tf(v_{i+1}')}^*$. Since $v_i'$ is not in $G_\phi^*$ and $tf(v_i') > \phi$, $v_i'$ is a dummy vertex and $v_i'$ preserves the reachability from $v_{i-1}'$ to $v_{i+1}'$ in $G_j^*$, where $j = \min(tf(v_{i-1}'), tf(v_{i+1}'))$.
Thus, removing $v_i'$ from $S'$ we still have $v_{i-1}' \to v_{i+1}'$ in $G_j^*$. We repeat the above process with $S' = S' \setminus \{v_i'\}$ until all the remaining vertices are in $G_\phi^*$. Let $S'' = \langle s = v_1'', \ldots, v_{\beta''}'' = t \rangle$ be the new sequence obtained at the end of this process. Note that both $s$ and $t$ are still in $S''$ since $s$ and $t$ are in $G_\phi^*$. According to the derivation process, we have $v_i'' \to v_{i+1}''$ in $G_\phi^*$ for $1 \le i < \beta''$, from which we have $s = v_1'' \to v_{\beta''}'' = t$. Thus, $s \to t$ in $G_\phi^*$.

Lemma 7 reveals an important reachability relation between vertices in a sequence as defined in Lemma 6. This reachability relation is also crucial in the proofs of Lemmas 8 and 9.

LEMMA 8. Given two vertices $s, t \in V_G$, if there exists a vertex $x \in label_{out}(s) \cap label_{in}(t)$, then $s \to t$ in $G$.

PROOF. Let us first assume that $x \neq s$ and $x \neq t$. Then, according to Definition 5, if $x \in label_{out}(s)$, there exists a vertex $u \in label_{out}(s)$ such that $x \in nb_{out}(u, G_{tf(u)}^*)$. Moreover, $u \in label_{out}(s)$ in turn implies that there exists $u' \in label_{out}(s)$ such that $u \in nb_{out}(u', G_{tf(u')}^*)$. Thus, we obtain a sequence $S_{out} = \langle s = u_1, \ldots, u_\alpha = x \rangle$, where for $1 \le i < \alpha$ the edge $(u_i, u_{i+1})$ is in $G_{tf(u_i)}^*$. Similarly, we obtain another sequence $S_{in} = \langle x = v_\beta, \ldots, v_1 = t \rangle$, where for $1 \le i < \beta$ the edge $(v_{i+1}, v_i)$ is in $G_{tf(v_i)}^*$. According to Lemma 5, $tf(u_i) < tf(u_{i+1})$ for $1 \le i < \alpha$ and $tf(v_i) < tf(v_{i+1})$ for $1 \le i < \beta$. Thus, according to Lemma 7, the sequence $S = \langle s = u_1, \ldots, u_\alpha = x = v_\beta, \ldots, v_1 = t \rangle$ implies that $s \to t$ in $G_1^*$, and hence $s \to t$ in $G = G_1$ by Lemma 4.

If $x = t$, then $t \in label_{out}(s)$ gives the sequence $S = \langle s = u_1, \ldots, u_\alpha = x = t \rangle$, which implies that $s \to t$ in $G$. The case $x = s$ is similar.

The following lemma proves the reverse statement of Lemma 8.

LEMMA 9. Given two vertices $s, t \in V_G$, if $s \to t$ in $G$, then there exists a vertex $x \in label_{out}(s) \cap label_{in}(t)$.

PROOF. We show that if $s \to t$ in $G$, then there exists a sequence of vertices $S = \langle s, \ldots, t \rangle$ such that there is a vertex $x$ in $S$, where $x \in label_{out}(s)$ and $x \in label_{in}(t)$.

First, $s \to t$ in $G$ implies that there is a path $P = \langle s, \ldots, t \rangle$ in $G_1^*$ (by Procedure 1 and Lemma 4). According to Lemma 6, there exists a sequence $S = \langle s = w_1, \ldots, w_\gamma = t \rangle$ such that for $1 \le i < \gamma$, the edge $(w_i, w_{i+1})$ is in $G_j^*$ where $j = \min(tf(w_i), tf(w_{i+1}))$, and $S$ is maximal.

Next, we show that there exists a unique vertex $x$ in $S$ such that $tf(x) > tf(w)$ for all $w \in S \setminus \{x\}$. It is trivially true that there exists $x$ such that $tf(x) \ge tf(w)$ for all $w \in S \setminus \{x\}$. To remove the "=" sign, suppose to the contrary that there exists another vertex $x'$ such that $tf(x') = tf(x) = j$, which implies that $x$ and $x'$ are both in $G_j^*$. Assume, without loss of generality, that $x$ appears before $x'$ in $S$. Then, $tf(x') = tf(x) = j$ implies that $x$ and $x'$ are both in an independent set of $G_j^*$ according to Lemma 4. The independence between $x$ and $x'$ implies that either (1) $x \nrightarrow x'$ or (2) $x$ reaches $x'$ via some other vertex $x''$ in $G_j^*$ such that $tf(x'') > tf(x)$. For (1), it is a contradiction since $x \to x'$ in $G_j^*$ according to Lemma 7. For (2), we have the path $P' = \langle x, \ldots, x'', \ldots, x' \rangle$ in $G_j^*$ and by Lemma 6 we can obtain another sequence $S' = \langle x, \ldots, x'', \ldots, x' \rangle$ from $P'$, which contradicts the fact that $S$ is maximal.

We complete the proof by showing that the unique vertex $x$, where $tf(x) > tf(w)$ for all $w \in S \setminus \{x\}$, is in both $label_{out}(s)$ and $label_{in}(t)$. Let $S = \langle s = u_1, \ldots, u_\alpha = x = v_\beta, \ldots, v_1 = t \rangle$. We first consider the sub-sequence $\langle s = u_1, \ldots, u_\alpha = x \rangle$. If $s = u_1 = u_\alpha = x$, then $x \in label_{out}(s)$ by Definition 5. If $\alpha > 1$, for each $u_i$, we find the first $u_j$, where $i < j \le \alpha$, such that $tf(u_i) < tf(u_j)$. Such a $u_j$ must exist since there is at least one vertex $u_\alpha$ where $tf(u_i) < tf(u_\alpha)$. Moreover, $u_i \to u_j$ in $G_{tf(u_i)}^*$ according to Lemma 7. Thus, $(u_i, u_j)$ is an edge in $G_{tf(u_i)}^*$ because otherwise, $u_i$ reaches $u_j$ in $G_{tf(u_i)}^*$ via some other vertex $u_k$, which contradicts the fact that $S$ is maximal. Thus, we obtain a sequence $\langle s = u_1', \ldots, u_{\alpha'}' = x \rangle$, where $tf(u_i') < tf(u_{i+1}')$ and $(u_i', u_{i+1}')$ is an edge in $G_{tf(u_i')}^*$ for $1 \le i < \alpha'$. According to Definition 5, $s = u_1' \in label_{out}(s)$; $u_2' \in label_{out}(s)$ since $u_1' \in label_{out}(s)$ and $u_2' \in nb_{out}(u_1', G_{tf(u_1')}^*)$; ...; $u_{i+1}' \in label_{out}(s)$ since $u_i' \in label_{out}(s)$ and $u_{i+1}' \in nb_{out}(u_i', G_{tf(u_i')}^*)$; ...; $x = u_{\alpha'}' \in label_{out}(s)$ since $u_{\alpha'-1}' \in label_{out}(s)$ and $u_{\alpha'}' \in nb_{out}(u_{\alpha'-1}', G_{tf(u_{\alpha'-1}')}^*)$.
Finally, a similar analysis shows that $x \in label_{in}(t)$.

We note that the sequence $S$ in the proof of Lemma 9 may not be unique, but we only need to show the existence of one such sequence for the proof. The following theorem proves the correctness of reachability query answering by vertex labels.

THEOREM 1. Given a reachability query whether a vertex $s \in V_G$ can reach another vertex $t \in V_G$, the answer given by Equation (1) is correct.

PROOF. The proof follows directly from Lemmas 8 and 9.

5. REMOVING DUMMY VERTICES
The vertex labels constructed in Section 4 contain dummy vertices, which may take up a lot of space and incur extra processing in query answering. In this section, we propose a new label with all dummy vertices removed.

According to Procedure 1, a dummy vertex $w$ is created only as either an out-neighbor of $u$ or an in-neighbor of $v$ for a cross-level edge $(u, v)$. If $w$ is created as an out-neighbor of $u$ (or an in-neighbor of $v$), then $u$ (or $v$) is called the in-source vertex (or out-source vertex) of $w$, denoted by $src(w) = u$ (or $src(w) = v$). If $src(w) = v$ is a vertex in $G$, i.e., $v$ is not a dummy vertex, then $v$ is called the root vertex of $w$, denoted by $rt(w)$. In general, we have $rt(w) = src(src(\cdots src(w) \cdots))$.

With the definition of in-source/out-source vertices and root vertices, we define a new vertex label as follows.

DEFINITION 6 (VERTEX LABEL WITHOUT DUMMIES). Let $f(u)$ be a function such that $f(u) = rt(u)$ if $u$ is a dummy vertex, and $f(u) = u$ otherwise. The new labels of a vertex $v \in V_G$, denoted by $label2_{in}(v)$ and $label2_{out}(v)$, are defined as follows:
- $label2_{in}(v) = \{f(u) : u \in label_{in}(v)\}$.
- $label2_{out}(v) = \{f(u) : u \in label_{out}(v)\}$.

Intuitively, $label2_{in}(v)$ is obtained by replacing every dummy vertex $u$ in $label_{in}(v)$ with $rt(u)$, and similarly for $label2_{out}(v)$. For all $v \in V_G$, $|label2_{in}(v)| \le |label_{in}(v)|$ and $|label2_{out}(v)| \le |label_{out}(v)|$, since there can be multiple dummy vertices with the same root vertex and/or the root vertex may already exist in the set. Thus, compared with label, label2 reduces index storage space and improves querying efficiency.
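Definition 6 and the label-intersection test of Equation (1) are both short enough to sketch directly. The example below transcribes four rows of Table 1 and assumes the root mapping $rt(a_1) = rt(a_2) = a$, $rt(b_1) = b$, $rt(e_1) = e$. The function names `strip_dummies` and `reaches` and the use of Python sets are illustrative choices, not the paper's implementation.

```python
def strip_dummies(label, root):
    """Definition 6: replace each dummy vertex in every label by its root."""
    return {v: {root.get(u, u) for u in members} for v, members in label.items()}


def reaches(s, t, label_out, label_in):
    """Equation (1): s reaches t iff label_out(s) and label_in(t) intersect."""
    return not label_out[s].isdisjoint(label_in[t])


# Root vertex of each dummy vertex in the running example (assumed mapping).
ROOT = {"a1": "a", "a2": "a", "b1": "b", "e1": "e"}

# Four rows of Table 1, restricted to non-dummy vertices.
LABEL_OUT = {
    "a": {"a", "a1", "c", "d", "e1", "f"},
    "b": {"b", "b1", "d", "f"},
    "c": {"c", "e1", "f"},
    "h": {"h"},
}
LABEL_IN = {
    "a": {"a"},
    "b": {"b"},
    "c": {"c"},
    "h": {"a2", "e1", "f", "h"},
}

label2_out = strip_dummies(LABEL_OUT, ROOT)
label2_in = strip_dummies(LABEL_IN, ROOT)
c_reaches_h = reaches("c", "h", label2_out, label2_in)
a_reaches_b = reaches("a", "b", label2_out, label2_in)
```

Using `isdisjoint` lets the check stop at the first common element instead of materializing the whole intersection. An index that stores labels as sorted arrays would use a merge-style scan for the same purpose.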
The following lemma and theorem prove the correctness of query answering using label2.

LEMMA 10. Given $s, t \in V_G$, (1) if $x \in label_{out}(s)$ and $rt(x) \notin label_{out}(s)$, then $s \to rt(x)$ in $G$; and (2) if $x \in label_{in}(t)$ and $rt(x) \notin label_{in}(t)$, then $rt(x) \to t$ in $G$.

PROOF. We first prove (1). From the proof of Lemma 8, $x \in label_{out}(s)$ implies a sequence $S = \langle s = u_1, \ldots, u_\alpha = x \rangle$, where for $1 \le i < \alpha$ the edge $(u_i, u_{i+1})$ is in $G_{tf(u_i)}^*$. Since $x$ is a dummy vertex, according to Procedure 1 there exists another sequence $S_2 = \langle rt(x) = v_1, \ldots, v_{\beta-1} = src(x), v_\beta = x \rangle$, where for $1 \le i < \beta$: either the edge $(v_i, v_{i+1})$ is in $G_{tf(v_i)}^*$ if $rt(x)$ is an in-source vertex, or $(v_{i+1}, v_i)$ is in $G_{tf(v_i)}^*$ if $rt(x)$ is an out-source vertex.

If $rt(x)$ is an in-source vertex, then we construct the proof as follows. Let $y = x$. Starting from $i = \alpha - 1$ down to $i = 2$, we reassign $y = u_i$ if $u_i = src(y)$ (note that $i \neq 1$ since $s = u_1 = rt(x)$ contradicts $rt(x) \notin label_{out}(s)$). Let $\langle s = u_1, \ldots, u_{\alpha'} = y \rangle$ be the sub-sequence such that $u_{\alpha'-1} \neq src(y)$. According to Procedure 1, $u_{\alpha'-1}$ is an in-neighbor of $rt(x)$ so that $u_{\alpha'-1}$ is also connected to $v_2$ in $G_{tf(rt(x))}^*$ to preserve the reachability from $u_{\alpha'-1}$ to $rt(x)$'s cross-level out-neighbors (now via $v_2$). Note that $v_2$ may not be in $label_{out}(s)$, i.e., $S$, because $v_2$ may not be an out-neighbor of $u_{\alpha'-1}$ in $G_{tf(u_{\alpha'-1})}^*$, i.e., $tf(v_2) < tf(u_{\alpha'-1})$. Thus, we have the sequence $\langle s = u_1, \ldots, u_{\alpha'-1}, rt(x) \rangle$, where $(u_{\alpha'-1}, rt(x))$ is in $G_{tf(rt(x))}^*$, from which we have $s \to rt(x)$ in $G$ by Lemma 7.

If $rt(x)$ is an out-source vertex, then we have $\langle s = u_1, \ldots, u_\alpha = x = v_\beta, v_{\beta-1} = src(x), \ldots, v_1 = rt(x) \rangle$. Again, by Lemma 7 we have $s \to rt(x)$ in $G$. Similarly we can prove (2).

THEOREM 2. Given a reachability query whether a vertex $s \in V_G$ can reach another vertex $t \in V_G$, the answer given by Equation (1) with label replaced by label2 is correct.

PROOF. Let $X = label_{out}(s) \cap label_{in}(t)$ and $X2 = label2_{out}(s) \cap label2_{in}(t)$. We show that (1) if $X \neq \emptyset$, then $X2 \neq \emptyset$, and (2) if $X = \emptyset$, then $X2 = \emptyset$.

We first prove (1). If $X \neq \emptyset$, then either (i) $\exists x \in X$ such that $x$ is not a dummy vertex, or (ii) $\forall x \in X$, $x$ is a dummy vertex. For (i), $x$ is also in $X2$ according to Definition 6 and hence $X2 \neq \emptyset$.
For (ii), $rt(x)$ is in $X2$ and hence $X2 \ne \emptyset$.

We now prove (2). Suppose to the contrary that $X2 \ne \emptyset$, which must be caused by the replacement of some dummy vertex $x$ by $rt(x)$, i.e., $rt(x) \in X2$ for some dummy vertex $x$. We have the following possible cases:

(i) If $x \in label_{out}(s)$ and $rt(x) \notin label_{out}(s)$: then we have $rt(x) \in label2_{out}(s)$ as a replacement of $x$. Thus, by Lemma 10, we have $s \to rt(x)$ in $G$. Otherwise, $rt(x)$ is originally in $label_{out}(s)$ since $rt(x) \in X2$. Thus, we have $rt(x) = s$, or $s \to rt(x)$ in $G$ by Lemma 8 since $rt(x) \in label_{out}(s)$ and $rt(x) \in label_{in}(rt(x))$.

(ii) If $x \in label_{in}(t)$ and $rt(x) \notin label_{in}(t)$: then similarly as (i) we have $rt(x) \to t$ in $G$ by Lemma 10. Otherwise, similarly as (i) we have either $rt(x) = t$, or $rt(x) \to t$ in $G$.

For every combination of the cases in (i) and (ii) above, we have $s \to t$ in $G$, which implies $X \ne \emptyset$ by Lemma 9 and thus a contradiction. Therefore, we have our result that $X = \emptyset$ implies $X2 = \emptyset$. Given (1) and (2), the correctness of the theorem follows directly from Theorem 1.

Table 2: Removing dummy vertices from the labels in Table 1

    vertex   label2_out     label2_in
    a        {a,c,d,e,f}    {a}
    b        {b,d,f}        {b}
    e        {e,f}          {c,e}
    g        {g,h}          {e,f,g}
    c        {c,e,f}        {c}
    d        {d,f}          {d}
    h        {h}            {a,e,f,h}
    f        {f}            {f}

The following example illustrates the concept of label2.

EXAMPLE 5. Table 2 shows the labeling of the same graph in Example 3 with dummy vertices removed. In Table 1, we have $label_{out}(b) = \{b, b_1, d, f\}$, but $label2_{out}(b) = \{b, d, f\}$ in Table 2 since $rt(b_1) = b$ already exists in $label_{out}(b)$. For $label_{out}(c) = \{c, e_1, f\}$ in Table 1, we replace dummy vertex $e_1$ with $rt(e_1) = e$ and obtain $label2_{out}(c) = \{c, e, f\}$ in Table 2. Similarly, we obtain label2 for all other vertices in $G$.

6. HANDLING HIGH-DEGREE VERTICES

In the construction of $G_{i+1}$ from $G_i$, or G from G, many new edges may be created to connect the in-neighbors of a vertex $v$ to $v$'s out-neighbors. Although such connections are necessary to preserve reachability after $v$ is removed, the construction is costly in the presence of high-degree vertices since the number of edges created is given by $(deg_{in}(v, G_i) \times deg_{out}(v, G_i))$. The following example illustrates the problem caused by high-degree vertices.

EXAMPLE 6.
Consider the example in Figure 3(a): $f$ is a high-degree vertex with $deg_{in}(f, G_1) \times deg_{out}(f, G_1) = 3 \times 5 = 15$. By Procedure 1, $f$ is removed at the first iteration and we need to add many edges in order to maintain reachability in $G_2$ as shown in Figure 3(b).

[Figure 3: Problem caused by high-degree vertices. Panels: (a) $G = G_1$; (b) $G_2$; (c) $G_3$.]

In the DAG of many real graphs, often we have a few vertices with very high degree (these vertices normally correspond to giant SCCs in the original directed graph). For example, in the p2p dataset, we have a vertex $v$ with $deg_{in}(v, G_1) =$ and $deg_{out}(v, G_1) = 366$. Such high-degree vertices will take up a lot of space in the intermediate graphs and hence incur a significant amount of extra processing in the overall labeling process.

Here we propose a method to address this problem. For simplicity, in the subsequent discussion we focus on handling high-degree vertices in $G_1 = G$, but we remark that the method applies to other $G_i$ in the same way.

Given a vertex $v \in V_G$, we define the set of vertices that are reachable from $v$ as $reach_{out}(v, G) = \{u : v \to u\}$, and the set of vertices that can reach $v$ as $reach_{in}(v, G) = \{u : u \to v\}$. Let $H$ be the set of top-$k$ high-degree vertices defined as follows: $\forall h \in H$ and $\forall v \in V_G \setminus H$, $(deg_{in}(h, G) \times deg_{out}(h, G)) \ge (deg_{in}(v, G) \times deg_{out}(v, G))$. We may set $k$ as the h-index value of a graph [8, 9].

We propose a new vertex label of a vertex $v \in V_G$, denoted by $label3_{in}(v)$ and $label3_{out}(v)$, which have dummy vertices removed as in Section 5 and high-degree vertices handled as follows:

1. For each $h \in H$, $label3_{in}(h) = \{h\}$ and $label3_{out}(h) = \{h\}$.
2. For each $v \in V_G \setminus H$, initialize $label3_{in}(v) = \{h : h \in H, v \in reach_{out}(h, G)\}$ and $label3_{out}(v) = \{h : h \in H, v \in reach_{in}(h, G)\}$.
3. Remove all vertices in $H$, together with all edges incident to them, from $G$. Let $G'$ be the remaining graph.
4. For each $v \in V_{G'}$ (i.e., $v \in V_G \setminus H$), construct $label2_{in}(v)$ and $label2_{out}(v)$ from $G'$ as discussed in the preceding sections.
5. For each $v \in V_G \setminus H$, $label3_{in}(v) = label2_{in}(v) \cup label3_{in}(v)$ and $label3_{out}(v) = label2_{out}(v) \cup label3_{out}(v)$.
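The five steps above can be sketched in Python as follows. This is only a minimal illustration under several assumptions: the graph is a dictionary of out-neighbor sets, all function names are hypothetical, and the label2 construction on the remaining graph is replaced by a plain transitive-closure labeling (`closure_labels`), which is a stand-in and not the TF-based labeling of this paper. In addition, the sketch also records members of $H$ in the labels of other members of $H$ that they reach or are reached by; this is a choice made in the sketch so that queries between two high-degree vertices are covered.

```python
from collections import deque


def reachable(adj, start):
    """Vertices reachable from `start` (excluding `start`), found by BFS."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(start)
    return seen


def reverse(adj):
    """Reverse adjacency map; every vertex of the graph appears as a key."""
    rev = {}
    for u, outs in adj.items():
        rev.setdefault(u, set())
        for w in outs:
            rev.setdefault(w, set()).add(u)
    return rev


def top_k_high_degree(adj, k):
    """The k vertices with the largest deg_in * deg_out (ties broken by name)."""
    rev = reverse(adj)
    score = lambda v: len(rev[v]) * len(adj.get(v, ()))
    return set(sorted(rev, key=lambda v: (-score(v), v))[:k])


def closure_labels(adj, vertices):
    """Stand-in for label2: label_out(v) = v plus its descendants, label_in(v) = {v}."""
    l_out = {v: {v} | reachable(adj, v) for v in vertices}
    l_in = {v: {v} for v in vertices}
    return l_in, l_out


def build_label3(adj, k):
    rev = reverse(adj)
    vertices = set(rev)
    H = top_k_high_degree(adj, k)
    l_in = {v: set() for v in vertices}
    l_out = {v: set() for v in vertices}
    for h in H:                                   # Steps 1 and 2
        l_in[h].add(h)
        l_out[h].add(h)
        for v in reachable(adj, h):               # v is in reach_out(h, G)
            l_in[v].add(h)
        for v in reachable(rev, h):               # v is in reach_in(h, G)
            l_out[v].add(h)
    rest = vertices - H                           # Step 3: remaining graph
    sub = {u: {w for w in adj.get(u, ()) if w in rest} for u in rest}
    b_in, b_out = closure_labels(sub, rest)       # Step 4 (stand-in labeling)
    for v in rest:                                # Step 5: merge
        l_in[v] |= b_in[v]
        l_out[v] |= b_out[v]
    return l_in, l_out


def reaches(l_in, l_out, s, t):
    """Answer a query by testing whether the two labels share an element."""
    return not l_out[s].isdisjoint(l_in[t])
```

On a small DAG such as `{"a": {"f"}, "b": {"f", "g"}, "f": {"h", "i", "j"}, "g": {"n"}}`, calling `build_label3(adj, 1)` selects `f` as the only high-degree vertex, and every vertex upstream or downstream of `f` carries just `f` in its label instead of the full set of vertices on the other side.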
The following theorem proves the correctness of reachability query answering using label3 obtained from the above steps.

THEOREM 3. Given a reachability query whether a vertex $s \in V_G$ can reach another vertex $t \in V_G$, the answer given by Equation 1 with label replaced by label3 is correct.

PROOF. First, we show that if $s \to t$ in $G$, i.e., there exists a path $P = \langle s, \ldots, t \rangle$ in $G$, then the answer returned is true.

1. If $P$ contains no vertex in $H$, then $P$ must be in the remaining graph $G'$. Thus, query answering using label2, which is constructed from $G'$ and contained in label3, returns true as proved in Theorem 2.
2. If $P$ contains at least one vertex $h \in H$, then we must have $h \in label3_{out}(s)$ and $h \in label3_{in}(t)$. Thus, the answer returned is true.

Next, we show that if $s \nrightarrow t$ in $G$, then the answer returned is false. Suppose to the contrary that the answer is true, i.e., $\exists x \in (label3_{out}(s) \cap label3_{in}(t))$.

1. If $x \in H$, then we have $s \in reach_{in}(x, G)$ and $t \in reach_{out}(x, G)$, assuming that $x \ne s$ and $x \ne t$. Thus, we have $s \to x$ and $x \to t$ in $G$, which implies $s \to t$ in $G$. Now if $x = s$ or $x = t$, then $t \in reach_{out}(x = s, G)$ or $s \in reach_{in}(x = t, G)$, which again implies $s \to t$ in $G$. In each case, the result contradicts the fact that $s \nrightarrow t$ in $G$.
2. If $x \notin H$, then $x \in label2_{out}(s)$ and $x \in label2_{in}(t)$, which implies $s \to t$ in $G'$ by Theorem 2. Since $G'$ is a subgraph of $G$, we have $s \to t$ in $G$, which is a contradiction.

[Figure 4: Topological folding with high-degree vertex removed. Panels: (a) $G = G_1$; (b) $G_2$; (c) $G_3$.]

Table 3: Labeling for $G$ in Figure 3(a): (a) label3; (b) label2

    (a)      label3_out       label3_in
    a        {a,c,f}          {a}
    b        {b,d,e,f,k}      {b}
    c        {c,f}            {c}
    d        {d,f}            {d}
    e        {e,f,k}          {e}
    f        {f}              {f}
    g        {g,k}            {e,g}
    h        {h}              {f,h}
    i        {i,l}            {f,i}
    j        {j,m}            {f,j}
    k        {k}              {f,k}
    l        {l}              {f,l}
    m        {m}              {f,m}
    n        {n}              {f,l,m,n}

    (b)      label2_out           label2_in
    a        {a,c,f,h,i,j,k}      {a}
    b        {b,d,e,f,h,i,j,k}    {b}
    c        {c,f,h,i,j,k}        {c}
    d        {d,f,h,i,j,k}        {d}
    e        {e,f,h,i,j,k}        {e}
    f        {f,h,i,j,k}          {c,d,e,f}
    g        {g,k}                {e,g}
    h        {h}                  {h}
    i        {i}                  {i}
    j        {j}                  {j}
    k        {k}                  {k}
    l        {l,n}                {i,l}
    m        {m,n}                {j,m}
    n        {n}                  {f,i,j,n}

The following example further illustrates the idea.

EXAMPLE 7. Consider the example in Figure 3. We first obtain $reach_{in}(f, G) = \{a, b, c, d, e\}$ and $reach_{out}(f, G) = \{h, i, j, k, l, m, n\}$. Then, we initialize label3 for the vertices: $label3_{out}(v) = \{f\}$ for each $v \in \{a, b, c, d, e\}$, and $label3_{in}(v) = \{f\}$ for each $v \in \{h, i, j, k, l, m, n\}$. Then, we remove $f$ and all edges incident to $f$, which gives the graph as shown in Figure 4(a). Next we construct the TF and then label2 from the DAG in Figure 4(a). Finally, we merge label2 and label3 to obtain the final label3 as shown in Table 3(a). Compared with label2 computed for the graph in Figure 3(a), which is shown in Table 3(b), label3 is considerably smaller. The example also reveals that after removing the high-degree vertices, the graph becomes much easier to handle.

7. ALGORITHM AND COMPLEXITY

In this section, we discuss the algorithmic and complexity issues of our proposed method.
Our method consists of two main phases, namely, the pre-processing or indexing phase and the query processing phase. Query processing is just an intersection of two sets which terminates as soon as the first common element is found, and thus the complexity is bounded by the label size. The pre-processing phase includes computing the DAG from an input directed graph, topological sorting of the resulting DAG, construction of the transformed TF structure, and the label construction. The steps before labeling are either simple or have been presented in sufficient detail. We therefore focus our discussion on the labeling algorithm here.

Algorithm 1: Labeling($\mathcal{G} = \{G_1, \ldots, G_{tf(G)}\}$)

    1   Let V_{G_i} = ∅, where i = tf(G) + 1;
    2   for i = 1, ..., tf(G) do
    3       foreach v ∈ (V_{G_i} \ V_{G_{i+1}}) do
    4           label_in(v) ← {v} ∪ {u : (u, v) ∈ G_i};
    5           label_out(v) ← {v} ∪ {u : (v, u) ∈ G_i};
    6   for i = tf(G), ..., 1 do
    7       foreach v ∈ (V_{G_i} \ V_{G_{i+1}}) do
    8           foreach u ∈ label_in(v) do
    9               label_in(v) ← label_in(v) ∪ label_in(u);
    10          foreach u ∈ label_out(v) do
    11              label_out(v) ← label_out(v) ∪ label_out(u);
    12  return label_in(v) and label_out(v) for all vertices v;

We propose an efficient top-down algorithm to construct the vertex labels defined in Definition 5. As shown in Algorithm 1, Lines 1-5 initialize $label_{in}(v)$ and $label_{out}(v)$ for each vertex $v$ to contain the in-neighbors and out-neighbors of $v$ in $G_{tf(v)}$. Note that for each $v \in (V_{G_i} \setminus V_{G_{i+1}})$, $tf(v) = i$ since $v$ no longer exists in $G_{i+1}$. Line 1 is introduced so that $(V_{G_i} \setminus V_{G_{i+1}}) = V_{G_i}$ when $i = tf(G)$ in Lines 3 and 7, since $G_{tf(G)+1}$ does not really exist.

Lines 6-11 perform a top-down operation starting at the highest level of the TF structure. At each level $i$, for each vertex $v \in (V_{G_i} \setminus V_{G_{i+1}})$, we simply include the in-label (out-label) of $v$'s in-neighbors (out-neighbors) in $label_{in}(v)$ ($label_{out}(v)$). The correctness of Algorithm 1 follows from Definition 5 and Lemma 5.

While the algorithm does not remove dummy vertices, we discuss how they can be handled with little additional overhead, as inspired by the following lemma.

LEMMA 11.
For any vertex $v \in V_G$ and any $G_i \in \mathcal{G}$, at most two dummy vertices will be created in $G_i$ whose root vertex is $v$.

PROOF. According to Procedure 1, initially we may create one dummy vertex $u_{out}$ as an out-neighbor of $v$ and/or another dummy vertex $u_{in}$ as an in-neighbor of $v$. Both $u_{out}$ and $u_{in}$ must be created in $G_{tf(v)}$. At most one dummy vertex (let it be $w_{out}$) will be created as an out-neighbor of $u_{out}$ since all incoming edges of $u_{out}$ are not cross-level edges by construction. And $w_{out}$ must be created in $G_j$, where $j = tf(u_{out})$. Similarly, at most one dummy vertex will be created as an out-neighbor of $w_{out}$, and so on. A similar analysis applies to $u_{in}$ and thus in any $G_i \in \mathcal{G}$, we have at most two dummy vertices created whose root vertex is $v$.

If $v$ is the root vertex of any dummy vertex and $v$ is the in-source vertex, then Lemma 11 implies the existence of a unique sequence $S_{out} = \langle v = u_1, \ldots, u_\alpha \rangle$, where $u_{j-1}$ is the in-source vertex of $u_j$ for $1 < j \le \alpha$; thus, we can use only two labels, $label_{in}(u_j)$ and $label_{out}(u_j)$, to keep the labels for all dummy vertices $u_j$ at each level $i = tf(u_j)$ in Lines 6-11 of Algorithm 1. Similarly, the same strategy applies to another unique sequence if $v$ is the root vertex of a set of dummy vertices and $v$ is the out-source vertex.
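To make the two passes of Algorithm 1 concrete, here is a minimal Python sketch. It assumes the folded graphs are already available as a list of (vertex set, edge set) pairs; the function names and the hand-built three-level sequence for the path a → b → c → d are hypothetical inputs chosen for illustration (the folding procedure itself is not reproduced here), and dummy-vertex handling is omitted.

```python
def tf_labeling(levels):
    """Run the two passes of Algorithm 1.

    levels[i - 1] = (vertices of G_i, edges of G_i) for i = 1, ..., tf(G).
    """
    k = len(levels)
    # Line 1: an empty vertex set stands in for the non-existent G_{tf(G)+1}.
    vertex_sets = [set(vs) for vs, _ in levels] + [set()]
    label_in, label_out = {}, {}

    # Lines 2-5: initialize each vertex with its neighbors in G_{tf(v)}.
    for i in range(k):
        edges = levels[i][1]
        for v in vertex_sets[i] - vertex_sets[i + 1]:
            label_in[v] = {v} | {u for (u, w) in edges if w == v}
            label_out[v] = {v} | {w for (u, w) in edges if u == v}

    # Lines 6-11: top-down pass, from the highest level to the lowest.
    for i in reversed(range(k)):
        for v in vertex_sets[i] - vertex_sets[i + 1]:
            for u in list(label_in[v]):      # snapshot: the set grows below
                label_in[v] |= label_in[u]
            for u in list(label_out[v]):
                label_out[v] |= label_out[u]
    return label_in, label_out


def reaches(label_in, label_out, s, t):
    """Query: true iff label_out(s) and label_in(t) share an element."""
    return not label_out[s].isdisjoint(label_in[t])


# Hand-built example: a and c disappear after G_1, b after G_2, d after G_3.
levels = [
    ({"a", "b", "c", "d"}, {("a", "b"), ("b", "c"), ("c", "d")}),
    ({"b", "d"}, {("b", "d")}),
    ({"d"}, set()),
]
label_in, label_out = tf_labeling(levels)
```

The query helper relies on `set.isdisjoint`, which can stop at the first shared element, matching the early-terminating intersection described for query processing.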

Thus, in the top-down labeling process, in total we maintain at most four labels for each vertex $v \in V_G$ for all dummy vertices created with $v$ as their root vertex.

Next we analyze the complexity of the pre-processing phase. Computing the DAG takes linear time in the size of the input directed graph. Given the DAG $G = (V_G, E_G)$, topological sorting takes $O(|V_G| + |E_G|)$ time. Then, we apply Procedure 1 to construct the TF structure, which takes $O(\lg l(G))$ iterations of Steps 2 and 3. At the $i$-th iteration, we need $O(\sum_{v \in V_{G_i}} (deg_{in}(v, G_i) \times deg_{out}(v, G_i)))$ time for the construction. From Lemma 11, $|V_{G_i}| \le 2|V_G|$ and the degree of a dummy vertex $w$ is bounded by that of $src(w)$. The total time complexity is given by

$$C1 = O\Big(\sum_{1 \le i \le \lg l(G)} \; \sum_{v \in V_{G_i}} \big(deg_{in}(v, G_i) \times deg_{out}(v, G_i)\big)\Big).$$

The complexity of Algorithm 1, together with dummy vertex handling, is bounded by

$$C2 = O\Big(\sum_{1 \le i \le \lg l(G)} \; \sum_{v \in (V_{G_i} \setminus V_{G_{i+1}})} \Big(\sum_{u \in nb_{in}(v, G_i)} |label_{in}(u)| + \sum_{u \in nb_{out}(v, G_i)} |label_{out}(u)|\Big)\Big).$$

Both $C1$ and $C2$ depend on the characteristics of the input DAG, especially the vertex degree. Both $C1$ and $C2$ can be significantly reduced by removing the set of high-degree vertices $H$, which takes $O(|H|(|V_G| + |E_G|))$ time to remove $H$ and add $h \in H$ to the labels of other vertices as discussed in Section 6.

8. EXPERIMENTAL EVALUATION

We implemented our method, TF-label, in C++ (source code available in authors' webpage). We compare TF-label with the following state-of-the-art methods for processing reachability queries: PathTree [19], [28], [25], ScaPathTree and . ScaPathTree and are the application of PathTree and in the SCARAB framework [18], i.e., first computing the backbone of the input DAG and then applying PathTree or for reachability querying (more details in Section 1). Though in theory any existing method can be applied in SCARAB, we were not able to do so for and TF-label due to unfamiliarity with their system. ScaPathTree and were provided by the authors of [18].
All source code of the methods we compare with is the latest version provided by their authors, and all were implemented in C++ and compiled using the same gcc compiler as TF-label. We ran all experiments on a computer with an Intel 3.3 GHz CPU, 16GB RAM, and running Ubuntu Linux OS.

8.1 Performance on Real Datasets

We first evaluate the performance of our method on real-world datasets from a wide spectrum of domains. As shown below, the first set of 7 datasets are from 3 different domains, while the second set of 5 datasets are from 5 different domains. We want to examine the differences in the spectrum of datasets that our method can handle versus those of existing methods.

Real datasets. We used the following 7 large real datasets that are used in [18, 28] for scalability test: citeseer, citeseerx and cit-patent (patent) are citation networks, in which non-leaf vertices have an average out-degree of 10 to 30; go-uniprot is the joint graph of Gene Ontology terms and the annotations from the UniProt database, which is the universal protein resource; uniprot22m, uniprot100m and uniprot150m are the subsets of the complete RDF graph of UniProt.

We also used 5 real datasets from the Stanford Large Network Dataset Collection. We selected one large directed graph from each of the following categories: email-euall (email) from communication networks, soc-livejournal1 (LJ) from social networks, p2p-gnutella31 (p2p) from Internet peer-to-peer networks, web-google (web) from Web graphs, and wiki-talk (wiki) from Wikipedia networks. In addition, cit-patent from citation networks is already included in the first 7 graphs. Detailed descriptions of the datasets can be found at snap.stanford.edu/data.

Table 4: Real datasets (K = 10^3)

    Dataset       |V| (original)  |E| (original)  |V_G| (DAG)  |E_G| (DAG)  l(G)  d_avg
    citeseer                                      694K         312K
    citeseerx                                     6540K        15011K
    go-uniprot                                    6968K        34770K
    patent                                        3775K        16519K
    uniprot22m                                    1595K        1595K
    uniprot100m                                   16087K       16087K
    uniprot150m                                   25038K       25038K
    email         265K            420K            231K         223K
    LJ            4848K           68994K          971K         1024K
    p2p           63K             148K            48K          55K
    web           876K            5105K           372K         518K
    wiki          2394K           5021K           2282K        2312K
Table 4 lists the number of vertices and edges in the original directed graph as well as in its DAG $G$, respectively. We do not show the vertex and edge counts of the original graph for the datasets obtained from [28] since the authors did not provide these numbers. Note that existing methods for reachability querying assume that the input is a DAG. We also show the topological level number of $G$, $l(G)$, as well as the average degree of the vertices (denoted by $d_{avg}$) in $G$.

Indexing Performance. We first report indexing performance results, but remark that (online) query performance should be the more important performance indicator, provided that (offline) indexing performance is reasonable. We report the index construction time (total elapsed time in seconds) in Table 5. The shortest time for each dataset is highlighted in bold.

[Table 5: Index construction time (in sec). Column headings: TF-label, PathTree, ScaPathTree. Rows: citeseer, citeseerx, go-uniprot, patent, uniprot22m, uniprot100m, uniprot150m, email, LJ, p2p, web, wiki.]

For the datasets from [28], has the best performance and the performance of is close to that of . The indexing time of TF-label is comparable to that of for most datasets. For citeseerx and patent, TF-label is 135 and 8.5 times faster than . Compared with ScaPathTree, our method is from a few times to 74 times faster. ScaPathTree was not able to obtain the results for citeseerx and patent, while PathTree can only run on citeseer.

For the datasets from the Stanford Collection, TF-label is the best for indexing all the datasets. TF-label is about twice as fast as and on average, and up to orders of magnitude faster than , PathTree and ScaPathTree. We note that we did not specifically pick these datasets, but rather simply selected one large graph from each category of directed graphs (we did leave out two categories because the DAGs of these graphs are too small,

for which most existing methods will be efficient enough). Therefore, the result shows that our method is able to perform well for graphs from various domains.

Table 6 reports the index size (in MB). For the 3 uniprot datasets, TF-label is from about 3 to 10 times smaller than all other methods. For citeseer, TF-label is only worse than PathTree, but much better than the other methods. But for citeseerx, patent and go-uniprot, TF-label is much larger. However, for the second set of 5 datasets, TF-label is much smaller in all cases except p2p, for which it is larger than PathTree.

[Table 6: Index or label size (in MB). Column headings: TF-label, PathTree, ScaPathTree. Rows: citeseer, citeseerx, go-uniprot, patent, uniprot22m, uniprot100m, uniprot150m, email (0.9), LJ, p2p, web, wiki.]

Overall, the results of indexing time and index size show that our method is very competitive in indexing performance, especially for the datasets from the Stanford Collection. In fact, only and are able to beat TF-label for indexing a few datasets. However, next we will show that and are significantly slower in query processing than TF-label for all datasets.

Query Performance. We randomly generate 1 million queries for each dataset and Table 7 reports the total time taken to run the queries (the shortest time for each dataset is highlighted in bold).

[Table 7: Total query processing time (in milli-sec). Column headings: TF-label, PathTree, ScaPathTree. Rows: citeseer, citeseerx, go-uniprot, patent, uniprot22m, uniprot100m, uniprot150m, email, LJ, p2p, web, wiki.]

The result clearly shows that TF-label outperforms all other methods in all cases except for p2p, for which TF-label is comparable with . can run on all datasets, but is from about 2 to 32 times slower than TF-label. ScaPathTree and are also significantly slower than TF-label, and they cannot scale to run on a number of datasets. is up to orders of magnitude slower than TF-label and PathTree cannot scale for processing most of the datasets.

Another important feature of TF-label is that it has stable good performance for all datasets, unlike the other methods which are slow for processing some datasets.
For example, is particularly slow in processing web, for which ScaPathTree and perform reasonably well. Similarly, ScaPathTree is slow in processing uniprot150m and is slow in processing patent. Such a stable performance from TF-label is important for handling datasets from various application domains.

We also emphasize that TF-label can be further applied in the SCARAB framework, as do and ScaPathTree, to improve the performance. Thus, our result is impressive since TF-label even significantly outperforms the existing methods applied in SCARAB. In the next experiment, we show that TF-label scales well where all existing methods, including SCARAB, cannot scale, for both indexing and querying.

8.2 Scalability and Effects of Various Graph Properties

We use synthetic datasets to control the different properties of the DAG and hence assess their effects on the performance of our method, for both efficiency and scalability.

Synthetic datasets. We consider three important properties of the DAG: (1) the number of vertices ($|V_G|$), (2) the average vertex degree ($d_{avg}$), and (3) the number of topological levels ($l(G)$). We generate three categories of datasets as follows (let M = $10^6$):

(C1) Fix $d_{avg} = 3$ and $l(G) = 7$, then set $|V_G|$ = 5M, 10M, 20M, 40M and 80M, respectively.
(C2) Fix $|V_G|$ = 1M and $l(G) = 7$, then set $d_{avg}$ = 10, 20, 30, 40 and 50, respectively.
(C3) Fix $|V_G|$ = 1M and $d_{avg} = 3$, then set $l(G)$ = 3, 7, 15, 31 and 63, respectively.

For the generation of a DAG $G$ with $|V_G|$ vertices, $l(G)$ levels, and average degree $d_{avg}$, we first create $|V_G|$ vertices and distribute them to the $l(G)$ levels. Then, for each vertex $v$ at each level $i$, where $1 < i < l(G)$, we add one edge from a vertex selected randomly at level $i - 1$ to $v$, and add edges from $v$ to $(d_{avg} - 1)$ randomly selected vertices at level $j > i$ in $G$. To test query performance, we randomly generate 1 million queries for each dataset.

Effect of number of vertices. Figure 5 reports the performance results of processing the (C1) datasets, where we vary the number of vertices $|V_G|$ from 5M to 80M (M = $10^6$).
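The synthetic DAG generation procedure described under "Synthetic datasets" can be sketched as follows. This is only an approximation under stated assumptions: vertices are spread over the levels round-robin, the random choices use a seeded generator, the level range is read as $1 < i < l(G)$, and the function name and signature are hypothetical. Parallel edges are merged, so the realized average degree can differ slightly from $d_{avg}$.

```python
import random


def generate_dag(num_vertices, num_levels, d_avg, seed=0):
    """Return (levels, edges) for a layered random DAG.

    levels[i] lists the vertices of level i + 1; edges is a set of (u, v)
    pairs that always point from a lower level to a higher level.
    Assumes num_vertices >= num_levels so that no level is empty.
    """
    rng = random.Random(seed)
    levels = [[] for _ in range(num_levels)]
    for v in range(num_vertices):            # assumption: round-robin spread
        levels[v % num_levels].append(v)

    edges = set()
    for i in range(1, num_levels - 1):       # levels 2 .. l(G) - 1
        higher = [u for lvl in levels[i + 1:] for u in lvl]
        for v in levels[i]:
            # one incoming edge from a random vertex on the level just below
            edges.add((rng.choice(levels[i - 1]), v))
            # d_avg - 1 outgoing edges to random vertices on higher levels
            for w in rng.sample(higher, min(d_avg - 1, len(higher))):
                edges.add((v, w))
    return levels, edges


levels, edges = generate_dag(num_vertices=70, num_levels=7, d_avg=3, seed=1)
```

Since every edge goes from a lower level to a strictly higher one, the result is acyclic by construction, and a fixed seed makes a generated dataset repeatable across runs.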
For index construction, TF-label is significantly faster than all other methods except . Compared with , TF-label is slower when $|V_G| \le$ 20M, but is 3 times faster when $|V_G| \ge$ 40M. When $|V_G|$ = 80M, all other methods failed (we terminated after it took two orders of magnitude longer time than ours). could only handle 5M vertices, while PathTree failed even with 5M vertices (thus not shown in Figure 5). Moreover, ScaPathTree and also cannot scale well, since SCARAB failed to construct the backbone for such large datasets.

The index size of TF-label is about twice that of , and is 1.5 to 3 times smaller than that of the other methods (for the datasets they can handle).

For query processing, TF-label is again significantly faster than all the other methods. Moreover, we also see that is the slowest and is over an order of magnitude slower than TF-label. When $|V_G|$ = 40M, is 6400 times slower than TF-label.

Overall, TF-label is shown to be much more scalable than the existing methods with the increase in the number of vertices, i.e., also in the graph size. The results also show that the indexing performance of TF-label scales linearly with the increase in the graph size, but remains reasonably stable in query performance. The reason that query time does not increase much when the graph size increases is that the average label size remains stable, which can be observed as the index size increases only linearly.

Effect of average vertex degree. Figure 6 reports the performance results of processing the (C2) datasets, where we vary the average vertex degree from 10 to 50.


Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

the nber of vertces n the graph. spannng tree T beng part of a par of maxmally dstant trees s called extremal. Extremal trees are useful n the mxed an

the nber of vertces n the graph. spannng tree T beng part of a par of maxmally dstant trees s called extremal. Extremal trees are useful n the mxed an On Central Spannng Trees of a Graph S. Bezrukov Unverstat-GH Paderborn FB Mathematk/Informatk Furstenallee 11 D{33102 Paderborn F. Kaderal, W. Poguntke FernUnverstat Hagen LG Kommunkatonssysteme Bergscher

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information
