Fast and Robust Distributed Subgraph Enumeration


Xuguang Ren, Griffith University, Australia
Wook-Shin Han, POSTECH, Republic of Korea
Junhu Wang, Griffith University, Australia
Jeffrey Xu Yu, The Chinese University of Hong Kong

arXiv: v1 [cs.DB] 23 Jan 2019

ABSTRACT

We study the classic subgraph enumeration problem under distributed settings. Existing solutions either suffer from a severe memory crisis or rely on large indexes, which makes them impractical for very large graphs. Most of them follow a synchronous model where the performance is often bottlenecked by the machine with the worst performance. Motivated by this, in this paper, we propose RADS, a Robust Asynchronous Distributed Subgraph enumeration system. RADS first identifies results that can be found using single-machine algorithms. This strategy not only improves the overall performance but also reduces network communication and memory cost. Moreover, RADS employs a novel region-grouped multi-round expand, verify & filter framework which does not need to shuffle and exchange the intermediate results, nor does it need to replicate a large part of the data graph in each machine. This feature not only reduces network communication cost and memory usage, but also allows us to adopt simple strategies for memory control and load balancing, making it more robust. Several heuristics are also used in RADS to further improve the performance. Our experiments verified the superiority of RADS to state-of-the-art subgraph enumeration approaches.

Keywords: Distributed System, Asynchronous, Subgraph Enumeration

1. INTRODUCTION

Subgraph enumeration is the problem of finding all occurrences of a query graph in a data graph. Its solution is a basis for many other algorithms and it finds numerous applications. This problem has been well studied under single-machine settings [1][19]. However, in the real world, the data graphs are often fragmented and distributed across different sites. This phenomenon highlights the importance of distributed systems for subgraph enumeration.
Also, the increasing size of modern graphs makes it hard to load the whole graph into memory, which further strengthens the requirement for distributed subgraph enumeration.

In recent years, several approaches and systems have been proposed [1, 21, 13, 15, 6, 5]. However, existing systems either need to exchange large intermediate results (e.g., [13], [15] and [21]), or copy and replicate large parts of the data graph on each machine (e.g., [1] and [6, 5]), or rely on heavy indexes (e.g., [18]). Exchanging and caching large intermediate results, or large parts of the data graph, places a heavy burden on the network and on memory. In fact, when the graphs are large these systems tend to crash due to memory depletion. In addition, most of the current systems are synchronous, hence they suffer from synchronization delay: the machines must wait for each other to complete certain processing tasks, making the overall performance equivalent to that of the slowest machine. More details about existing work can be found in Section 8.

It is observed in previous work [15, 18] that when the data graph is large, the number of intermediate results can be huge, making the network communication cost a bottleneck and causing memory crashes. On the other hand, systems that rely on replication of large parts of the data graph or on heavy indexes are impractical for large data graphs and low-end computer clusters.

In this paper, we present RADS, a Robust Asynchronous Distributed Subgraph enumeration system. Different from previous work, our system does not need to exchange intermediate results or replicate large parts of the data graph. It does not rely on heavy indexes or suffer from synchronization delay. Our system is also more robust due to our memory control strategies, and it makes load balancing easy.
To be specific, we make the following contributions:

(1) We propose a novel distributed subgraph enumeration framework, where the machines do not need to exchange intermediate results, nor do they need to replicate large parts of the data graph.

(2) We propose a method to identify embeddings that can be found on each local machine independent of other machines, and use a single-machine algorithm to find them. This strategy not only improves the overall performance, but also reduces network communication and memory cost.

(3) We propose effective memory control strategies to minimize the chance of memory crash, making our system more robust. Our strategy also facilitates workload balancing.

(4) We propose optimization strategies to further improve the performance. These include (i) a set of rules to compute an efficient execution plan, and (ii) a dynamic data structure to compactly store intermediate results.

(5) We conduct extensive experiments which demonstrate that our system is not only significantly faster than existing solutions (see footnote 1), but also more robust.

Paper Organization. In Section 2, we present the preliminaries. In Section 3, we present the architecture and framework of our system RADS. In Section 4, we present algorithms for computing the execution plan. In Section 5, we present the embedding trie data structure used to compress our intermediate results. Our memory control strategy is given in Section 6. We present our experiments in Section 7, discuss related work in Section 8 and conclude the paper in Section 9. Some proofs, detailed algorithms and auxiliary experimental results are given in the appendix.

2. PRELIMINARIES

Data Graph & Query Graph. Both the data graph and the query graph (a.k.a. query pattern) are assumed to be unlabeled, undirected, and connected graphs. We use $G = (V_G, E_G)$ and $P = (V_P, E_P)$ to denote the data graph and query graph respectively, where $V_G$ and $V_P$ are the vertex sets, and $E_G$ and $E_P$ are the edge sets. We will use data (resp. query) vertex to refer to vertices in the data (resp. query) graph. Generally, for any graph $g$, we use $V_g$ and $E_g$ to denote its vertex set and edge set respectively, and for any vertex $v$ in $g$, we use $adj(v)$ to denote $v$'s neighbour set in $g$ and $deg(v)$ to denote the degree of $v$.

Subgraph Isomorphism. Given a data graph $G$ and a query pattern $P$, $P$ is subgraph isomorphic to $G$ if there exists an injective function $f: V_P \to V_G$ such that for any edge $(u_1, u_2) \in E_P$, there exists an edge $(f(u_1), f(u_2)) \in E_G$.
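The definition can be checked mechanically for a given mapping. The following is a minimal sketch, assuming the data graph is held as a dictionary of neighbour sets; the function name `is_embedding` and the toy graph are hypothetical and are not taken from the paper.

```python
def is_embedding(f, query_edges, data_adj):
    """Check the subgraph isomorphism condition for a mapping f.

    f           : dict, query vertex -> data vertex
    query_edges : iterable of (u1, u2) pairs, the edge set E_P
    data_adj    : dict, data vertex -> set of neighbours (undirected graph G)
    """
    # f must be injective: no two query vertices share a data vertex.
    if len(set(f.values())) != len(f):
        return False
    # Every query edge must be mapped onto a data edge.
    return all(f[u2] in data_adj.get(f[u1], set()) for u1, u2 in query_edges)


# Hypothetical toy input: a triangle query and a 4-vertex data graph.
triangle = [("u0", "u1"), ("u1", "u2"), ("u0", "u2")]
data_adj = {
    "v0": {"v1", "v2", "v3"},
    "v1": {"v0", "v2"},
    "v2": {"v0", "v1"},
    "v3": {"v0"},
}
good = is_embedding({"u0": "v0", "u1": "v1", "u2": "v2"}, triangle, data_adj)
bad = is_embedding({"u0": "v0", "u1": "v1", "u2": "v3"}, triangle, data_adj)
```

Here `good` holds because all three query edges land on data edges, while `bad` fails because $(v_1, v_3)$ is not an edge of the toy graph.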
The injective function is also known as an embedding of $P$ in $G$ (or, from $P$ to $G$), and it can be represented as a set of vertex pairs $(u, v)$ where $u \in V_P$ is mapped to $v \in V_G$. We will use $R_G(P)$ to denote the set of all embeddings of $P$ in $G$.

The problem of subgraph enumeration is to find the set $R_G(P)$. In the literature, subgraph enumeration is also referred to as subgraph isomorphism search [16][1][19] and subgraph listing [12][21].

Partial Embedding. A partial embedding of graph $P$ in graph $G$ is an embedding in $G$ of a vertex-induced subgraph of $P$. A partial embedding is a full embedding if the vertex-induced subgraph is $P$ itself.

Symmetry Breaking. A symmetry breaking technique based on automorphism is conventionally used to reduce duplicate embeddings [8]. As a result, the data vertices in the final embeddings should follow a preserved order of the query vertices. We apply this technique in this paper by default and we will specify the preserved order when necessary.

Footnote 1: Except for some queries using [18], which relies on heavy indexes.

Graph Partition & Storage. Given a data graph $G$ and $m$ machines $\{M_1, \ldots, M_m\}$ in a distributed environment, a partition of $G$ is denoted $\{G_1, G_2, \ldots, G_m\}$, where $G_t$ is the partition located in the $t$-th machine $M_t$. In this paper, we assume each partition is stored as an adjacency-list. For any data vertex $v$, we assume its adjacency-list is stored in a single machine $M_t$, and we say $v$ is owned by $M_t$ (or resides in $M_t$). We call $v$ a foreign vertex of $M_t$ if $v$ is not owned by $M_t$. We say a data edge $e$ is owned by (or resides in) $M_t$ (denoted as $e \in E_{G_t}$) if either end vertex of $e$ resides in $M_t$. Note that an edge can reside in two different machines. For any $v$ owned by $M_t$, we call $v$ a border vertex if any of its neighbors is owned by a machine other than $M_t$. Otherwise we call it a non-border vertex. We use $V^b_{G_t}$ to denote the set of all border vertices in $M_t$.

3. RADS ARCHITECTURE

In this section, we first present an overview of the architecture of RADS, followed by the R-Meef framework of RADS. We give a detailed implementation of R-Meef in Appendix B.
3.1 Architecture Overview

Figure 1: RADS Architecture

The architecture of RADS is shown in Figure 1. Given a query pattern $P$, within each machine, RADS first launches a process of single-machine enumeration (SM-E) and a daemon thread, simultaneously. After SM-E finishes, RADS then launches an R-Meef thread. Note that the R-Meef threads of different machines may start at different times.

Single-Machine Enumeration. The idea of SM-E is to try to find a set of local embeddings using a single-machine algorithm, such as TurboIso [1], which does not involve any distributed processing. The subsequent distributed process only has to find the remaining embeddings. This strategy can not only boost the overall enumeration efficiency but also significantly reduce the memory cost and communication cost of the subsequent distributed process. Moreover, the local embeddings can be used to estimate the space cost of a region group, which will help to effectively control the memory usage (to be discussed in Section 6).

We first define the concepts of border distance and span, which will be used to identify embeddings that can be found by SM-E.

Definition 1 (Border Distance). Given a graph partition $G_t$ and a data vertex $v$ in $G_t$, the border distance of $v$ w.r.t. $G_t$, denoted as $BD_{G_t}(v)$, is the minimum shortest distance between $v$ and any border vertex of $G_t$, that is

$$BD_{G_t}(v) = \min_{v' \in V^b_{G_t}} dist(v, v') \qquad (1)$$

where $dist(v, v')$ is the shortest distance between $v$ and $v'$.

Definition 2 (Span). Given a query pattern $P$, the span of query vertex $u$, denoted as $Span_P(u)$, is the maximum shortest distance between $u$ and any other vertex of $P$, that is

$$Span_P(u) = \max_{u' \in V_P} dist(u, u') \qquad (2)$$

Proposition 1. Given a data vertex $v$ of $G_t$ and a query vertex $u$ of $P$, if $Span_P(u) \le BD_{G_t}(v)$, then there will be no embedding $f$ of $P$ in $G$ such that $f(u) = v$ and $f(u')$ is not owned by $M_t$, where $u' \in V_P$, $u' \neq u$.

Proposition 1 states that if the border distance of $v$ is not smaller than the span of query vertex $u$, there will be no cross-machine embeddings (i.e., embeddings where the query vertices are mapped to data vertices residing in different machines) which map $u$ to $v$. The proof of Proposition 1 is in Appendix A.1.

Let $u_{start}$ be the starting query vertex (namely, the first query vertex to be mapped) and $C(u_{start})$ be the candidate vertex set of $u_{start}$ in $G_t$. Let $C_1(u_{start}) \subseteq C(u_{start})$ be the subset of candidates whose border distance is no less than the span of $u_{start}$. According to Proposition 1, any embedding that maps $u_{start}$ to a vertex in $C_1(u_{start})$ can be found using a single-machine subgraph enumeration algorithm over $G_t$, independent of other machines. In RADS, the candidates in $C_1(u_{start})$ will be processed by SM-E, and the other candidates will be processed by the subsequent distributed process.

The SM-E process is simple, and we will next focus on the distributed process. For presentation simplicity, from now on when we say a candidate vertex of $u_{start}$, we mean a candidate vertex in $C(u_{start}) \setminus C_1(u_{start})$, unless explicitly stated otherwise.
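The split of $C(u_{start})$ into $C_1(u_{start})$ and the remaining candidates can be sketched with two breadth-first searches. This is only an illustration under stated assumptions: the partition is a dictionary holding the adjacency lists of owned vertices, a vertex is foreign exactly when it has no entry in that dictionary, and all function names are hypothetical. The border-distance search stays inside the partition, which is sufficient because a path from an owned vertex must pass through a border vertex before it can reach a foreign one.

```python
from collections import deque


def bfs_dist(adj, sources):
    """Multi-source BFS: hop distance from the nearest source vertex."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def border_distances(local_adj):
    """BD_{G_t}(v) for every owned vertex v (Definition 1)."""
    owned = set(local_adj)
    # A border vertex has at least one neighbour owned by another machine.
    border = [v for v in local_adj if any(w not in owned for w in local_adj[v])]
    owned_adj = {v: [w for w in nbrs if w in owned] for v, nbrs in local_adj.items()}
    dist = bfs_dist(owned_adj, border)
    # Vertices that cannot reach any border vertex get an infinite distance.
    return {v: dist.get(v, float("inf")) for v in owned}


def span(pattern_adj, u):
    """Span_P(u): largest shortest distance from u in the pattern (Definition 2)."""
    return max(bfs_dist(pattern_adj, [u]).values())


def split_candidates(local_adj, pattern_adj, u_start, candidates):
    """Return (C1, rest): C1 goes to SM-E, rest to the distributed process."""
    s = span(pattern_adj, u_start)
    bd = border_distances(local_adj)
    c1 = {v for v in candidates if bd[v] >= s}
    return c1, set(candidates) - c1


# Hypothetical partition: owned path a-b-c-d, where d also touches foreign vertex x.
local_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "x"]}
# Hypothetical pattern: the path u0-u1-u2, started from u0 (span 2).
pattern_adj = {"u0": ["u1"], "u1": ["u0", "u2"], "u2": ["u1"]}
c1, rest = split_candidates(local_adj, pattern_adj, "u0", local_adj.keys())
```

In this toy input, `d` is the only border vertex, so the border distances of `a`, `b`, `c`, `d` are 3, 2, 1, 0, and the candidates `a` and `b` qualify for single-machine enumeration.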
The distributed process consists of some daemon threads and the subgraph enumeration thread.

Daemon Threads listen to requests from other machines and support four functionalities:

(1) verifyE returns the edge verification results for a given request consisting of vertex pairs. For example, given a request $\{(v_0, v_1), (v_2, v_3)\}$ posted to $M_1$, $M_1$ will return $\{true, false\}$ if $(v_0, v_1)$ is an edge in $G_1$ while $(v_2, v_3)$ is not.

(2) fetchV returns the adjacency-lists of the requested vertices of the data graph. The requested vertices sent to machine $M$ must reside in $M$.

(3) checkR returns the number of unprocessed region groups of the local machine (i.e., the machine on which the thread is running). A region group is a group of candidate data vertices of the starting query vertex; see Section 3.2.

(4) shareR returns an unprocessed region group of the local machine to the requester machine. shareR also marks the region group sent out as processed.

R-Meef Thread is the core subgraph enumeration thread. When necessary, the local R-Meef thread sends verifyE requests and fetchV requests to the daemon threads located in other machines, and the other machines respond to these requests accordingly. Once a local machine finishes processing its own region groups, it will broadcast a checkR request to the other machines. Upon receiving the numbers of unfinished region groups from other machines, it will send a shareR request to the machine with the maximum number of unprocessed region groups. Once it receives a region group, it will process it on the local machine. checkR and shareR are for load balancing purposes only, and they will not be discussed further in this paper.

3.2 The R-Meef Framework

Before presenting the details of the R-Meef framework, we need the following definitions.

Definition 3 (embedding candidate).
Given a partition $G_t$ of data graph $G$ located in machine $M_t$ and a query pattern $P$, an injective function $f_{G_t}: V_P \to V_G$ is called an embedding candidate (EC) of $P$ w.r.t. $G_t$ if for any edge $(u, u') \in E_P$, there exists an edge $(f_{G_t}(u), f_{G_t}(u')) \in E_{G_t}$ provided either $f_{G_t}(u) \in V_{G_t}$ or $f_{G_t}(u') \in V_{G_t}$. We use $R_{G_t}(P)$ to denote the set of ECs of $P$ w.r.t. $G_t$.

Note that for an EC $f_{G_t}$ and a query vertex $u$, $f_{G_t}(u)$ is not necessarily owned by $G_t$. That is, the adjacency-list of $f_{G_t}(u)$ may be stored in other machines. For any query edge $(u, u')$, an EC only requires that the corresponding data edge $(f_{G_t}(u), f_{G_t}(u'))$ exists if at least one of $f_{G_t}(u)$ and $f_{G_t}(u')$ resides in $G_t$. Therefore, an EC may not be an embedding. Intuitively, the existence of the edge $(f_{G_t}(u), f_{G_t}(u'))$ can only be verified in $G_t$ if one of its end vertices resides in $G_t$. Otherwise the existence of the edge cannot be verified in $M_t$, and we call such edges undetermined edges.

Definition 4. Given an EC $f_{G_t}$ of query pattern $P$, for any edge $(u, u') \in E_P$, we say $(f_{G_t}(u), f_{G_t}(u'))$ is an undetermined edge of $f_{G_t}$ if neither $f_{G_t}(u)$ nor $f_{G_t}(u')$ is in $G_t$.

Example 1. Consider a partition $G_t$ of a data graph $G$ and a triangle query pattern $P$ where $V_P = \{u_0, u_1, u_2\}$. The mapping $f_{G_t} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2)\}$ is an EC of $P$ in $G$ w.r.t. $G_t$ if $v_0 \in V_{G_t}$, $v_1 \in adj(v_0)$ and $v_2 \in adj(v_0)$, and neither $v_1$ nor $v_2$ resides in $G_t$. $(v_1, v_2)$ is an undetermined edge of $f_{G_t}$.

Obviously, if we want to determine whether $f_{G_t}$ is actually an embedding of the query pattern, we have to verify its undetermined edges in other machines. For any undetermined edge $e$, if its two end vertices reside in two different machines, we can use either of them to verify whether $e \in E_G$ or not. To do that, we need to send a verifyE request to one of the machines. Note that it is possible that an undetermined edge is shared by multiple ECs. To reduce network traffic, we do

not send verifyE requests once for each individual EC; instead, we build an edge verification index (EVI) and use it to identify ECs that share undetermined edges. We assume each EC is assigned an ID (we will discuss how to assign such IDs and how to build the EVI in Section 5).

Definition 5 (edge verification index). Given a set $R_{G_t}(P)$ of ECs, the edge verification index (EVI) of $R_{G_t}(P)$ is a key-value map $I$ where

(1) for any tuple $(e, IDs) \in I$, the key $e$ is a vertex pair $(v, v')$, and the value $IDs$ is the set of IDs of the ECs in $R_{G_t}(P)$ of which $e$ is an undetermined edge;

(2) for any undetermined edge $e$ of $f_{G_t} \in R_{G_t}(P)$, there exists a unique tuple in $I$ with $e$ as the key and the ID of $f_{G_t}$ in the value.

Intuitively, the EVI groups together the ECs that share each undetermined edge. It is straightforward to see:

Proposition 2. Given data graph $G$, query pattern $P$ and an edge verification index $I$, for any $(e, IDs) \in I$, if $e \notin E_G$, then none of the ECs corresponding to $IDs$ can be an embedding of $P$ in $G$.

Example 2. Consider two embedding candidates $f_{G_t} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2)\}$ and $f'_{G_t} = \{(u_0, v_3), (u_1, v_1), (u_2, v_2)\}$ of a triangle pattern $P$ in a data graph $G$, where $V_P = \{u_0, u_1, u_2\}$. Assuming $(v_1, v_2)$ is an undetermined edge, we can have an edge verification index $I = \{(v_1, v_2): \langle f_{G_t}, f'_{G_t} \rangle\}$, where $f_{G_t}$ and $f'_{G_t}$ are represented by their IDs in $I$. If $(v_1, v_2)$ is verified to be non-existent, both $f_{G_t}$ and $f'_{G_t}$ can be filtered out.

Like SEED and TwinTwig, we decompose the pattern graph into small decomposition units.

Definition 6 (decomposition). A decomposition of query pattern $P$ is a sequence of decomposition units $DE = (dp_0, \ldots, dp_l)$ where every $dp_i \in DE$ is a subgraph of $P$ such that

(1) The vertex set of $dp_i$ consists of a pivot vertex $pv$ and a non-empty set $LF$ of leaf (see footnote 2) vertices, all of which are vertices in $V_P$; and for every $u' \in LF$, $(pv, u') \in E_P$.

(2) The edge set of $dp_i$ consists of two parts, $E^{star}_{dp_i}$ and $E^{sib}_{dp_i}$, where $E^{star}_{dp_i} = \bigcup_{u' \in dp_i.LF} \{(dp_i.pv, u')\}$ is the set of edges between the pivot vertex and the leaf vertices, and $E^{sib}_{dp_i} = \bigcup_{u, u' \in dp_i.LF} \{(u, u') \in E_P\}$ is the set of edges between the leaf vertices.

(3) $\bigcup_{dp_i \in DE} V_{dp_i} = V_P$, and for $i < j$, $V_{dp_i} \cap dp_j.LF = \emptyset$.

Note that condition (3) in the above definition says the leaf vertices of each decomposition unit do not appear in the previous units. Unlike the decompositions in SEED [15] and TwinTwig [13], our decomposition unit is not restricted to stars and cliques, and $\bigcup_{dp_i \in DE} E_{dp_i}$ may be a proper subset of $E_P$.

Footnote 2: In an abuse of the word "leaf".

Figure 2: Running Example

Example 3. Consider the query pattern in Figure 2(a). We may have a decomposition $(dp_0, dp_1, dp_2, dp_3)$ where $dp_0.pv = u_0$, $dp_0.LF = \{u_1, u_2, u_7\}$, $dp_1.pv = u_1$, $dp_1.LF = \{u_3, u_4\}$, $dp_2.pv = u_2$, $dp_2.LF = \{u_5, u_6\}$, and $dp_3.pv = u_0$, $dp_3.LF = \{u_8, u_9\}$. Note that the edge $(u_4, u_5)$ is not in any decomposition unit.

Given a decomposition $DE = (dp_0, \ldots, dp_l)$ of pattern $P$, we define a sequence of sub-query patterns $P_0, \ldots, P_l$, where $P_0 = dp_0$, and for $i > 0$, $P_i$ consists of the union of $P_{i-1}$ and $dp_i$ together with the edges across the vertices of $P_{i-1}$ and $dp_i$, that is, $V_{P_i} = \bigcup_{j \le i} V_{dp_j}$ and $E_{P_i} = E_{P_{i-1}} \cup E_{dp_i} \cup \{(u, u') \in E_P \mid u \in V_{P_{i-1}}, u' \in dp_i.LF\}$. Note that (a) none of the leaf vertices of $dp_i$ can be in $P_{i-1}$; and (b) $P_i$ is the subgraph of $P$ induced by the vertex set $V_{P_i}$, and $P_l = P$.

We say $DE$ forms an execution plan if for every $i \in [1, l]$, the pivot vertex of $dp_i$ is in $P_{i-1}$. Formally, we have

Definition 7 (execution plan). A decomposition $DE = (dp_0, \ldots, dp_l)$ of $P$ is an execution plan ($PL$) if $dp_i.pv \in V_{P_{i-1}}$ for all $i \in [1, l]$.

For example, the decomposition in Example 3 is an execution plan.

Let $PL = (dp_0, \ldots, dp_l)$ be an execution plan. For each $dp_i$ with $i > 0$, we define $E^{cro}_{dp_i} = \{(u, u') \in E_P \mid u \in V_{P_{i-1}} \setminus \{dp_i.pv\}, u' \in dp_i.LF\}$, and we let $E^{cro}_{dp_0} = \emptyset$; the pivot is excluded so that the definition agrees with the examples below, where the expansion edges are not counted as cross-unit edges. We call the edges in $E^{star}_{dp_i}$, $E^{sib}_{dp_i}$ and $E^{cro}_{dp_i}$ the expansion edges, sibling edges, and cross-unit edges respectively. The sibling edges and cross-unit edges are both called verification edges.

Consider $dp_0$ in Example 3: we have $E^{sib}_{dp_0} = \{(u_1, u_2)\}$ and $E^{cro}_{dp_0} = \emptyset$. For $dp_2$, we have $E^{sib}_{dp_2} = \{(u_5, u_6)\}$ and $E^{cro}_{dp_2} = \{(u_4, u_5)\}$. Note that the expansion edges of all the units form a spanning tree of $P$, and the verification edges are the edges not in the spanning tree.

With the above concepts, we are ready to present the R-Meef framework. Given query pattern $P$, data graph $G$ and its partition $G_t$ on machine $M_t$, R-Meef finds a set of embeddings of $P$ in $G_t$ according to an execution plan $PL$, which provides a processing order for the query pattern $P$. In our approach, each machine $M_t$ will evaluate $P_0$ in the first round, and based on the results in round $i$, it will evaluate the next pattern $P_{i+1}$ in the next round. The final results will be obtained when $P_l$ is evaluated in all machines (each machine computes a subset of the final embeddings, the union of which is the final set of embeddings of $P$ in $G$).

Moreover, in our approach, each machine $M_t$ starts by mapping $dp_0.pv$ (which is the $u_{start}$ in Section 3.1) to a candidate vertex of $dp_0.pv$ that resides in $M_t$. When the

number of such candidate vertices is large, there is a possibility of generating too many intermediate results (i.e., ECs and embeddings of $P_0, \ldots, P_l$). To prevent memory crash, we divide the candidate vertex set of $dp_0.pv$ into disjoint region groups $RG = \{rg_0, \ldots, rg_h\}$, and process each group separately.

The workflow of R-Meef is as follows:

(1) From the vertices residing in $M_t$, R-Meef divides the candidate vertices of $dp_0.pv$ into different region groups. Then it processes each group sequentially and separately.

(2) For each region group, R-Meef processes one unit per round based on the execution plan $PL$. The workflow of the $i$-th round is illustrated in Figure 3.

Figure 3: R-Meef workflow

In Figure 3, $R_{G_t}(P_{i-1})$ represents the set of embeddings of $P_{i-1}$ generated and cached from the last round. For the first round (i.e., round 0), $R_{G_t}(P_{i-1})$ will be initialized as $\{(dp_0.pv, v)\}$ where $v$ is a candidate vertex of $dp_0.pv$. By expanding $R_{G_t}(P_{i-1})$, we get all the ECs of $P_i$ w.r.t. $M_t$, i.e., $R_{G_t}(P_i)$. After verification and filtering, we get all the embeddings of $P_i$ for this region group of $M_t$.

In each round, the expand and verify & filter processes work as follows:

Expand. Given an embedding $f$ of $P_{i-1}$ obtained from the previous round, $dp_i.pv$ has already been matched to a data vertex $v$ by $f$ since $dp_i.pv \in P_{i-1}$. By searching the neighborhood of $v$, we expand $f$ to find the ECs of $P_i$ containing $(dp_i.pv, v)$ w.r.t. $M_t$. It is worth noting that if $v$ does not reside in $M_t$, we have to fetch its adjacency-list from other machines. Different embeddings from the previous round may share some common foreign vertices that must be fetched in order to expand. To reduce network traffic, for all the embeddings from the last round, we gather all the vertices that need to be fetched and then fetch their adjacency-lists together by sending a single fetchV request. One important assumption here is that each machine has a record of the ownership information (i.e., which machine a data vertex resides in) of all the vertices.
This record can be constructed offline as a map whose size is $|V_G|$, which can be saved together with the adjacency-list and takes one extra byte of space for each vertex.

Verify & Filter. Upon having a set of ECs (i.e., $R_{G_t}(P_i)$), we store them compactly in an embedding trie and build an EVI from them (the embedding trie and EVI will be further discussed in Section 5). Then we send a verifyE request consisting of the keys of the EVI, i.e., the undetermined data edges, to other machines to verify their existence. After we get the verification results, each failed key indicates that the corresponding ECs can be filtered out. The output of the final round is the set of embeddings of query pattern $P$ found by $M_t$ for this region group.

Note that a detailed implementation and example of R-Meef is given in Appendix B. Although the idea of our framework is straightforward, each critical component of it should be carefully designed in order to achieve the best performance. In the following sections, we tackle the challenges one by one.

4. COMPUTING EXECUTION PLAN

We may have multiple valid execution plans for a query pattern, and different execution plans may have different performance. The challenge is how to find the most efficient one among them. In this section, we present some heuristics to find a good execution plan.

4.1 Minimizing Number of Rounds

Given query pattern $P$ and an execution plan $PL = (dp_0, \ldots, dp_l)$, we have $l + 1$ rounds for each region group (one per decomposition unit), and once all the rounds are processed we will get the set of final embeddings. Also, within each round, the workload can be shared. To be specific, a single undetermined edge $e$ may be shared by multiple ECs. If these embedding candidates are generated in the same round, the verification of $e$ can be shared by all of them. The same applies to the foreign vertices, where the cost of fetching and the memory space can be shared among multiple embedding candidates if they happen to be in the same round. Therefore, our first heuristic is to minimize the number of rounds (namely, the number of decomposition units) so as to maximize the workload sharing.
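The sharing of edge verifications within a round can be illustrated with a small sketch of the edge verification index from Definition 5, using the two ECs of Example 2. The integer EC IDs, the dictionary layout and the function names are assumptions made for illustration; in RADS the IDs come from the embedding trie of Section 5.

```python
from collections import defaultdict


def undetermined_edges(ec, query_edges, owned):
    """Data edges of an EC whose two end vertices both lie outside the partition."""
    edges = set()
    for u, u2 in query_edges:
        a, b = ec[u], ec[u2]
        if a not in owned and b not in owned:
            edges.add((a, b) if a <= b else (b, a))  # normalise the undirected edge
    return edges


def build_evi(ecs, query_edges, owned):
    """Map each undetermined edge to the IDs of the ECs that depend on it."""
    evi = defaultdict(set)
    for ec_id, ec in ecs.items():
        for e in undetermined_edges(ec, query_edges, owned):
            evi[e].add(ec_id)
    return dict(evi)


def filter_ecs(ecs, evi, verdicts):
    """Drop every EC that depends on an edge reported as non-existent."""
    failed = set()
    for e, ids in evi.items():
        if not verdicts[e]:
            failed |= ids
    return {i: ec for i, ec in ecs.items() if i not in failed}


# Example 2: a triangle pattern and two ECs that share the edge (v1, v2).
triangle = [("u0", "u1"), ("u1", "u2"), ("u0", "u2")]
owned = {"v0", "v3"}  # assumed: v1 and v2 are foreign vertices
ecs = {
    0: {"u0": "v0", "u1": "v1", "u2": "v2"},
    1: {"u0": "v3", "u1": "v1", "u2": "v2"},
}
evi = build_evi(ecs, triangle, owned)
# One key, hence one verifyE entry, covers both ECs.
survivors = filter_ecs(ecs, evi, {("v1", "v2"): False})
```

With both ECs produced in the same round, the index holds a single key, so a single verification answer decides the fate of both candidates.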
Here we present a technique to compute a query execution plan which guarantees a minimum number of rounds. Our technique is based on the concept of maximum leaf spanning tree [7].

Definition 8. A maximum leaf spanning tree (MLST) of pattern $P$ is a spanning tree of $P$ with the maximum number of leaves (a leaf is a vertex with degree 1). The number of leaves in an MLST of $P$ is called the maximum leaf number of $P$, denoted $l_P$.

A closely related concept is the minimum connected dominating set.

Definition 9. A connected dominating set (CDS) of $P$ is a subset $D$ of $V_P$ such that (1) $D$ is a dominating set of $P$, that is, any vertex of $P$ is either in $D$ or adjacent to a vertex in $D$, and (2) the subgraph of $P$ induced by $D$ is connected. A minimum connected dominating set (MCDS) is a CDS with the smallest cardinality among all CDSs. The number of vertices in an MCDS is called the connected domination number, denoted $c_P$.

It is shown in [4] that $|V_P| = c_P + l_P$.

Theorem 1. Given a pattern $P$, any execution plan of $P$ has at least $c_P$ decomposition units, and there exists an execution plan with exactly $c_P$ decomposition units.

The proof of Theorem 1 is in Appendix A.1. Theorem 1 indicates that $c_P$ is the minimum number of rounds of any execution plan. The proof provides a method to construct an execution plan with $c_P$ rounds from an MLST. It is worth noting that the decomposition units in the query plan constructed as in the proof have distinct pivot vertices.
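For small patterns, the bound of Theorem 1 can be explored by brute force. The sketch below is not the MLST-based construction from the paper's proof: it searches for a minimum connected dominating set by exhaustive enumeration and then grows units greedily from a chosen root. The edge list is an assumption, inferred from the units and verification edges quoted in Examples 3 to 5, since Figure 2 is not reproduced here; all function names are hypothetical.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def min_connected_dominating_set(adj):
    """Exhaustive search; only sensible for patterns with a handful of vertices."""
    vertices = sorted(adj)
    for k in range(1, len(vertices) + 1):
        for cand in combinations(vertices, k):
            d = set(cand)
            dominated = d.union(*(adj[x] for x in d))
            if len(dominated) == len(vertices) and is_connected(adj, d):
                return d
    return set(vertices)


def plan_from_cds(adj, cds, root):
    """Greedy plan: each CDS vertex becomes a pivot, in BFS order from root."""
    covered, pending, plan = {root}, [root], []
    while pending:
        pv = pending.pop(0)
        leaves = sorted(w for w in adj[pv] if w not in covered)
        covered.update(leaves)
        pending.extend(w for w in leaves if w in cds)
        if leaves:
            plan.append((pv, leaves))
    return plan


# Assumed edge set of the running example (inferred, not read from Figure 2).
PATTERN_EDGES = [
    ("u0", "u1"), ("u0", "u2"), ("u0", "u7"), ("u0", "u8"), ("u0", "u9"),
    ("u1", "u3"), ("u1", "u4"), ("u2", "u5"), ("u2", "u6"),
    ("u1", "u2"), ("u3", "u4"), ("u4", "u5"), ("u5", "u6"), ("u8", "u9"),
]
adj = {}
for a, b in PATTERN_EDGES:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

cds = min_connected_dominating_set(adj)
plan = plan_from_cds(adj, cds, "u0")
```

Under the assumed edge set, the search returns a connected dominating set of three vertices, and rooting the greedy construction at $u_0$ gives three units, one per vertex of the set. A different root or tie-breaking rule can yield a different plan with the same number of units.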

Example 4. Consider the pattern $P$. It can be easily verified that the tree obtained by erasing the edges $(u_1, u_2)$, $(u_3, u_4)$, $(u_4, u_5)$, $(u_5, u_6)$ and $(u_8, u_9)$ is an MLST of $P$. Choosing $u_0$ as the root, we will get a minimum-round execution plan $PL_1 = \{dp_0, dp_1, dp_2\}$ where $dp_0.pv = u_0$, $dp_0.LF = \{u_1, u_2, u_7, u_8, u_9\}$, $dp_1.pv = u_1$, $dp_1.LF = \{u_3, u_4\}$ and $dp_2.pv = u_2$, $dp_2.LF = \{u_5, u_6\}$. If we choose $u_1$ as the root, we will get a different minimum-round execution plan $PL_2 = \{dp_0, dp_1, dp_2\}$, where $dp_0.pv = u_1$, $dp_0.LF = \{u_0, u_3, u_4\}$, $dp_1.pv = u_0$, $dp_1.LF = \{u_2, u_7, u_8, u_9\}$, $dp_2.pv = u_2$, $dp_2.LF = \{u_5, u_6\}$.

4.2 Minimizing the span of $dp_0.pv$

Given a pattern $P$, multiple execution plans may exist with the minimum number of rounds, while their $dp_0.pv$ can be different. In this case, our second heuristic is to choose the plan(s) whose $dp_0.pv$ has the smallest span. This strategy will maximize the number of embeddings that can be found using SM-E. Recall from the RADS architecture that $dp_0.pv$ is the starting query vertex $u_{start}$. Based on Proposition 1, the more candidate vertices of $dp_0.pv$ can be processed in SM-E, the more workload can be separated from the distributed processing, and therefore the more communication cost and memory usage can be reduced.

Figure 4: A Query Pattern

Consider the pattern in Figure 4. The bold edges show an MLST based on which both $u_3$ and $u_4$ can be chosen as $dp_0.pv$, and the execution plans built from them have the same number of rounds. However, $Span_P(u_3) = 2$ while $Span_P(u_4) = 3$. Therefore we choose the plan with $u_3$ as $dp_0.pv$.

4.3 Maximizing Filtering Power

Given a pattern $P$, multiple execution plans may exist with the minimum number of rounds and with $dp_0.pv$ having the same smallest span. Here we use the third heuristic, which is to choose plans with more verification edges in the earlier rounds. The intuition is to maximize the filtering power of the verification edges as early as possible.
To this end, we propose the following score function $SC(PL)$ for an execution plan $PL = \{dp_0, \ldots, dp_l\}$:

$$SC(PL) = \sum_{dp_i \in PL} \frac{1}{(i+1)^{\rho}} \left( |E^{sib}_{dp_i}| + |E^{cro}_{dp_i}| \right) \qquad (3)$$

where $|E^{sib}_{dp_i}| + |E^{cro}_{dp_i}|$ is the number of verification edges in round $i$, and $\rho$ is a positive parameter used to tune the score function. In our experiments we use $\rho = 1$. The function $SC(PL)$ calculates a score by assigning larger weights to the verification edges in earlier rounds (since $\frac{1}{(i+1)^{\rho}} > \frac{1}{(j+1)^{\rho}}$ if $i < j$).

Example 5. Consider the query plans $PL_1$ and $PL_2$ in Example 4. The total number of verification edges in these plans is the same. In $PL_1$, the number of verification edges for the first, second and third round is 2, 1, 2 respectively. In $PL_2$, the number of verification edges for the three rounds is 1, 2, and 2 respectively. Using $\rho = 1$, we can calculate the scores of the two plans as follows:

$SC(PL_1) = 2/1 + 1/2 + 2/3 \approx 3.2$
$SC(PL_2) = 1/1 + 2/2 + 2/3 \approx 2.7$

Therefore, we prefer $PL_1$.

When several minimum-round execution plans have the same score, we use another heuristic rule to choose the best one from them: the larger the degree of the pivot vertex, the earlier we process the unit. A pivot vertex with a larger degree has a stronger power to filter unpromising candidates. To accommodate this rule, we can modify the score function in (3) by adding another component as follows:

$$SC(PL) = \sum_{dp_i \in PL} \left[ \frac{|E^{sib}_{dp_i}| + |E^{cro}_{dp_i}|}{(i+1)^{\rho}} + \frac{deg(dp_i.pv)}{(i+1)} \right] \qquad (4)$$

We now have a set of rules to follow when computing the execution plan. Since the number of query vertices is normally very small, we can simply enumerate all the possible execution plans and choose the best according to those rules.

5. EMBEDDING TRIE

As stated before, to save memory, the intermediate results (which include embeddings and embedding candidates generated in each round) are stored in a compact data structure called an embedding trie. Besides the compression, the challenges here are how to ensure that each intermediate result has a unique ID in the embedding trie and that the embedding trie can be easily maintained.
Before we give our solution, we first define a matching order, which is the order in which the query vertices are matched in R-Meef. It is also the order in which the nodes in the embedding trie are organized.

Definition 10 (Matching Order). Given a query execution plan $PL = \{dp_0, \ldots, dp_l\}$ of pattern $P$, the matching order w.r.t. $PL$ is a relation $\prec$ defined over the vertices of $P$ that satisfies the following conditions:

(1) $dp_i.pv \prec dp_j.pv$ if $i < j$;

(2) For any two vertices $u_1 \in dp_i.LF$ and $u_2 \in dp_j.LF$, $u_1 \prec u_2$ if $i < j$.

(3) For $i \in [0, l]$: (i) $dp_i.pv \prec u$ for all $u \in dp_i.LF$; (ii) for any vertices $u_1, u_2 \in dp_i.LF$ that are not the pivot vertices of other units, $u_1 \prec u_2$ if $deg(u_1) > deg(u_2)$, or $deg(u_1) = deg(u_2)$ and the vertex ID of $u_1$ is less than that of $u_2$; (iii) if $u_1 \in dp_i.LF$ is a pivot vertex of another unit, and $u_2 \in dp_i.LF$ is not a pivot vertex of another unit, then $u_1 \prec u_2$.

Intuitively the above relation orders the vertices of $P$ as follows: (a) Generally a vertex $u_1$ in $dp_i$ is before a vertex $u_2$ in $dp_j$ if $i < j$, except for the special case where $u_1 \in dp_i.LF$ and $u_2 = dp_j.pv$. In this special case, $u_2$ may appear in the leaf set of some previous unit $dp_k$ ($k \le i$), and it may be arranged before $u_1$ according to Condition (2) or Condition (3)(iii). (b) Starting from $dp_0$, the vertex $dp_0.pv$ is arranged before all other vertices. For the leaf vertices of $dp_0$,

it arranges those that are pivot vertices of other units before those that are not (Condition (3)(iii)), and for the former, it arranges them according to the ID of the units for which they are the pivot vertex (Condition (1); see footnote 3); for the latter, it arranges them in descending order of their degree in the original pattern $P$, and if they have the same degree it arranges them in the order of vertex ID (Condition (3)(ii)). For each subsequent $dp_i$, the pivot vertex must appear in the leaf set of some previous unit, hence its position has been fixed; and the leaf vertices of $dp_i$ are arranged in the same way as the leaf vertices of $dp_0$.

Footnote 3: Note that no two units share the same pivot vertex.

It is easy to verify that $\prec$ is a strict total order over $V_P$. Following the matching order, the vertices of $P$ can be arranged into an ordered list. Consider the execution plan $PL_1$ in Example 4. The vertices in the query can be arranged as $(u_0, u_1, u_2, u_7, u_8, u_9, u_3, u_4, u_5, u_6)$ according to the matching order.

Let $PL = \{dp_0, \ldots, dp_l\}$ be an execution plan, $P_i$ be the subgraph of $P$ induced by the vertices in $dp_0 \cup \cdots \cup dp_i$ (as defined in Section 3.2), and $R_i$ be a set of results (i.e., embeddings or embedding candidates) of $P_i$. For easy presentation, we assume the vertices in $P_i$ have been arranged into the list $u_0, u_1, \ldots, u_n$ by the matching order, that is, the query vertex at position $j$ is $u_j$. Then each result of $P_i$ can be represented as a list of corresponding data vertices. These lists can be merged into a collection of trees as follows:

(1) Initially, each result $f$ is treated as a tree $T_f$, where the node at level $j$ stores the data vertex $f(u_j)$ for $j \in [0, n]$, and the root is the node at level 0.

(2) If multiple results map $u_0$ to the same data vertex, merge the root nodes of their trees. This partitions the results in $R_i$ into different groups, and each group will be stored in a distinct tree.

(3) For each newly merged node $N$, if multiple children of $N$ correspond to the same vertex, merge these children into a single child of $N$.

(4) Repeat step (3) until no nodes can be merged.

The collection of trees obtained above is a compact representation of the results in $R_i$. Each leaf node in the tree uniquely identifies a result.

The embedding trie is a collection of similar trees. However, since the purpose of the embedding trie is to save space, we cannot get it by merging the result lists. Instead, we will have to construct it by inserting nodes one by one when results are generated, and removing nodes when results are eliminated. Next we formally define the embedding trie and present the algorithms for its maintenance.

5.1 Structure of the Embedding Trie

Definition 11 (Embedding Trie). Given a set $R_i$ of results of $P_i$, the embedding trie of $R_i$ is a collection of trees used to store the results in $R_i$ such that:

(1) Each tree represents a set of results that map $u_0$ to the same data vertex.

(2) Each tree node $N$ has
- $v$: a data vertex;
- parentN: a pointer pointing to its parent node (the pointer of the root node is null);
- childCount: the number of child nodes of $N$.

(3) If two nodes have the same parent, then they store different data vertices.

(4) Every leaf-to-root path represents a result in $R_i$, and every result in $R_i$ is represented as a unique leaf-to-root path.

(5) If we divide the tree nodes into different levels such that the root nodes are at level 0, the children of the root nodes are at level 1 and so on, then the tree nodes at level $j$ ($j \in [0, |V_{P_i}| - 1]$) store the set of values $\{f(u_j) \mid f \in R_i\}$.

Figure 5: Example of Embedding Trie

Example 6. Consider $P_0$ in Example 7, where the vertices are ordered as $u_0, u_1, u_2$ according to the matching order. There are three ECs of $P_0$: $(v_0, v_1, v_2)$, $(v_0, v_1, v_9)$ and $(v_0, v_9, v_{11})$. These results can be stored in a tree as shown in Figure 5(a). When the second EC is filtered out, we have $R_{G_t}(P_0)$ compressed in a tree as shown in Figure 5(b). The first EC can be expanded to an EC of $P_1$ (where the list of vertices of $P_1$ is $u_0, u_1, u_2, u_3, u_4$), which is shown in Figure 5(c).
Although the structure of the embedding trie is simple, it has some nice properties:
Compression. Storing the results in the embedding trie takes less space than storing them as a collection of lists.
Unique ID. For each result in the embedding trie, the address of its leaf node in memory can be used as its unique ID.
Retrieval. Given a particular ID represented by a leaf node, we can follow the parentN pointers step by step to retrieve the corresponding result.
Removal. To remove a result with a particular ID, we remove its corresponding leaf node and decrease the childCount of its parent node by 1. If the childCount of this parent node reaches 0, we remove this parent node as well. This process recursively affects the ancestors of the leaf node.
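The properties above can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the class and method names (`TrieNode`, `EmbeddingTrie`, `add`, `retrieve`, `remove`) and the `size` counter are assumptions made for illustration, and data vertices are represented by plain integers. The demo reuses the three ECs of Example 6.

```python
class TrieNode:
    """One trie node: a data vertex, a parent pointer and a child counter."""
    __slots__ = ("v", "parent", "child_count")

    def __init__(self, v, parent=None):
        self.v = v
        self.parent = parent
        self.child_count = 0


class EmbeddingTrie:
    """Hypothetical container; only parent pointers are stored, as in Definition 11."""

    def __init__(self):
        self.roots = set()   # nodes whose parent pointer is null
        self.size = 0        # number of live nodes (bookkeeping for the demo)

    def add(self, v, parent=None):
        """Create a node for data vertex v below `parent` (or as a new root)."""
        node = TrieNode(v, parent)
        if parent is None:
            self.roots.add(node)
        else:
            parent.child_count += 1
        self.size += 1
        return node

    def retrieve(self, leaf):
        """Follow parent pointers from a leaf and return the result as a list."""
        path = []
        node = leaf
        while node is not None:
            path.append(node.v)
            node = node.parent
        path.reverse()
        return path

    def remove(self, leaf):
        """Remove a result; ancestors left without children are removed too."""
        node = leaf
        while node is not None:
            parent = node.parent
            if parent is None:
                self.roots.discard(node)
            else:
                parent.child_count -= 1
            self.size -= 1
            if parent is None or parent.child_count > 0:
                break
            node = parent


# The three ECs of Example 6: (v0, v1, v2), (v0, v1, v9), (v0, v9, v11).
trie = EmbeddingTrie()
r = trie.add(0)
n1 = trie.add(1, r)
leaf_a = trie.add(2, n1)      # (v0, v1, v2)
leaf_b = trie.add(9, n1)      # (v0, v1, v9)
n9 = trie.add(9, r)
leaf_c = trie.add(11, n9)     # (v0, v9, v11)
nodes_before = trie.size      # 6 nodes instead of 9 list entries

second = trie.retrieve(leaf_b)
trie.remove(leaf_b)           # only the leaf disappears, v1 still has a child
trie.remove(leaf_c)           # the leaf and its now childless parent disappear
```

Because a node is created only when the prefix node is already at hand (as happens during backtracking), no child lookup structure is needed in this sketch; the leaf object itself plays the role of the unique ID.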

5.2 Maintaining the Embedding Trie

Recall that in Algorithm 4, given an embedding f of P_{i-1}, the function expandEmbedTrie is used to search for the ECs of dp_i within the neighbourhood of the mapped data vertex v_piv, where v_piv = f(dp_i.piv). Moreover, the expandEmbedTrie function handles the task of expanding the embedding trie ET by concatenating f with each newly found EC of dp_i. If an EC is filtered out, or if an embedding cannot be expanded to a final result, the function must remove it from ET. We now present the details of the expandEmbedTrie function in Algorithm 1.

When dp_i.piv is mapped to the data vertex v = f(dp_i.piv) by an embedding f of P_{i-1}, Algorithm 1 uses a backtracking approach to find the ECs of P_i within the neighbourhood of v. The recursive procedure is given in the subroutine adjEnum. In each round of the recursive call, adjEnum tries to match u to a candidate vertex v and add (u, v) to f, where u is a query vertex in dp_i.LF. When f is expanded to an EC of P_i, which means an EC of dp_i has been concatenated to the original f, we add it into ET by chaining up the corresponding embedding trie nodes. If f cannot be expanded into an EC of P_i, we remove it from ET.

Algorithm 1: expandEmbedTrie
Input: an embedding f of P_{i-1}, local machine M_t, unit dp_i, embedding trie ET
Output: expanded ET and an edge verification index I
1  v ← f(dp_i.piv)
2  for each u ∈ dp_i.LF do
3      C(u) ← adj(v)
4      for each (u, u') ∈ E^cro_{dp_i} do
5          if f(u') resides in M_t then
6              C(u) ← adj(f(u')) ∩ C(u)
7      if C(u) = ∅ then
8          remove f from ET
9          return
10 u ← next vertex in query vertex list
11 get N corresponding to f
12 adjEnum(N, u)

Lines 1 to 9 of Algorithm 1 compute the candidate set for each u ∈ dp_i.LF as the intersection of the neighbor set of v = f(dp_i.piv) and the neighbor set of each f(u'), where (u, u') is a cross-unit edge and f(u') is in M_t. If any of the candidate sets is empty, it removes f from ET. Otherwise it passes the next query vertex u and the ID of f (which is a node in ET) to the recursive subroutine adjEnum. The subroutine adjEnum is given in Algorithm 2.
It plays the same role as the SubgraphSearch procedure in the backtracking framework [16]. In Line 1, adjEnum creates a local variable F_current with default value false. The value indicates whether f can be extended to an EC of P_i. For the leaf vertex u, adjEnum first creates a copy C_r(u) of C(u), and then refines the candidate vertex set C_r(u) by considering every sibling edge (u, u') where u' has already been mapped by f to f(u'). If f(u') resides in M_t, C_r(u) is shrunk by intersecting it with adj(f(u')) (Lines 2 to 5). Then, for each vertex v in the refined set C_r(u), it first initializes a flag E with the value true (Line 7); this value indicates whether u can potentially be mapped to v. If v resides in M_t, it checks every verification edge (u, u') where u' has been mapped to see whether (v, f(u')) exists. If one such edge does not exist, it sets E to false (Lines 8 to 11), meaning u cannot be mapped to v. This part (Lines 7 to 11) is like the IsJoinable function in the backtracking framework [16].

If E is still true after the local verification, we add (u, v) to f (Line 13). Then we create a new trie node N' for v with N as its parentN (Lines 14, 15). After that, if f has grown to an EC of P_i, then for each undetermined edge e of f (both end vertices are not in the local machine), we add N' to I[e] (Lines 17, 18). We also set F_current to true (Line 19). If f is not yet an EC of P_i, which means there are still leaf vertices of dp_i not matched, we get the next leaf vertex u' (Line 21) and launch a recursive call of adjEnum by passing it N' and u' (Line 22). We record the return value of the deeper adjEnum as F_deeper. If F_deeper is true after all the recursive calls, which means there are ECs with u mapped to v in f, we increase the childCount of the parent node N and add the newly created N' as a child of N in ET (Lines 23 to 25). Then we backtrack by removing (u, v) from f, so that we can try to map u to another candidate vertex in C_r(u). After we have tried all the candidate vertices of C_r(u), we return the value of F_current (Line 27).
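The refinement and local verification steps described above (Lines 2 to 11 of adjEnum) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `local_candidates`, its parameters, and the toy adjacency data are hypothetical, the data graph is taken to be undirected, and the injectivity check (a data vertex may be used only once per embedding) is omitted for brevity. Vertices whose adjacency list is present in `local_adj` are treated as residing on the local machine.

```python
def local_candidates(u, cand, f, sibling, verify, local_adj):
    """Return {candidate v: list of undetermined data edges} for query vertex u.

    cand      -- C(u), the initial candidate set of u
    f         -- partial embedding: query vertex -> data vertex
    sibling   -- query vertex -> query vertices joined to it by sibling edges
    verify    -- query vertex -> query vertices joined by sibling or cross-unit edges
    local_adj -- adjacency lists of the data vertices known on this machine
    """
    # Refinement: intersect with adj(f(u')) for mapped siblings u' that are local.
    refined = set(cand)
    for u2 in sibling.get(u, ()):
        if u2 in f and f[u2] in local_adj:
            refined &= local_adj[f[u2]]

    result = {}
    for v in sorted(refined):
        edges = [(v, f[u2]) for u2 in verify.get(u, ()) if u2 in f]
        if v in local_adj:
            # v is local: every verification edge can be checked right away.
            if any(w not in local_adj[v] for _, w in edges):
                continue
            undetermined = []
        else:
            # v is foreign: edges whose other end is also foreign stay undetermined.
            undetermined = [(a, w) for a, w in edges if w not in local_adj]
        result[v] = undetermined
    return result


# Toy data (hypothetical): vertices 0, 2 and 11 are local; 1 and 9 are foreign.
local_adj = {0: {1, 2, 9, 11}, 2: {0, 1}, 11: {0, 9}}
f = {"u0": 0, "u1": 1}
sibling = {"u2": ["u1"]}
verify = {"u2": ["u1"]}
survivors = local_candidates("u2", {2, 9, 11}, f, sibling, verify, local_adj)
```

In this toy run, candidate 2 is accepted locally, candidate 11 is rejected locally because the edge to vertex 1 is absent from its adjacency list, and candidate 9 is kept with one undetermined edge that would be recorded in the edge verification index.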
Algorithm 2: adjEnum
Input: Trie node N representing embedding f of P_{i-1}, leaf vertex u of dp_i
Output: expanded ET and an edge verification index I
1  F_current ← false
2  C_r(u) ← C(u)
3  for each u' mapped in f and (u, u') ∈ E^sib_{dp_i} do
4      if f(u') resides in M_t then
5          C_r(u) ← adj(f(u')) ∩ C_r(u)
6  for each v ∈ C_r(u) do
7      E ← true
8      if v resides in M_t then
9          for each (u, u') ∈ (E^sib_{dp_i} ∪ E^cro_{dp_i}) and u' mapped in f do
10             if (v, f(u')) does not exist then
11                 E ← false
12     if E is true then
13         add (u, v) to f
14         create a trie node N'
15         N'.v ← v, N'.parentN ← N
16         if |f| = |V_{P_i}| then
17             for each undetermined edge e of f do
18                 add N' to I[e]
19             F_current ← true
20         else
21             u' ← next vertex in dp_i.LF
22             F_deeper ← adjEnum(N', u')
23             if F_deeper is true then
24                 N.childCount++
25                 add N' as a child node of N in ET
26         remove (u, v) from f
27 return F_current

Note that the edge verification index I is maintained during the expansion process.

6. MEMORY CONTROL STRATEGIES

This section focuses on the robustness of R-Meef. Since R-Meef still caches fetched foreign vertices and intermediate results in memory, memory consumption remains a critical issue when the data graph is large. We propose a grouping strategy to keep the peak memory usage under the memory capacity of the local machine. Our idea is to divide the candidate vertices of the first query vertex dp_0.piv into disjoint groups and process each group independently. In this way, the overall cached data on each machine is divided into several parts, where each part is no larger than the available memory Φ.

Figure 6: Grouping Example

A naive way of grouping the candidate vertices is to divide them randomly. However, random grouping may put vertices that are dissimilar to each other into the same group, potentially resulting in more network communication cost. Consider the data graph in Figure 6. Suppose the candidate vertex set is {v_0, v_1, v_2, v_3}. If we divide it into two groups {v_0, v_1} and {v_2, v_3}, then because v_0 and v_1 share most neighbours, there is a good chance for the ECs of dp_0 generated from v_0 and v_1 to share common verification edges, and to share common foreign vertices that need to be fetched (e.g., if dp_1.piv is mapped to v_5 by ECs originated from v_0 and v_1, and v_5 is not on the local machine). However, if we partition the candidate set into {v_0, v_2} and {v_1, v_3}, then there is little chance for such sharing.

Our goal is to find a way to partition the candidate vertices into groups so that the chance of edge verification sharing and foreign vertex sharing by the results in each group is maximized. Let C = C(dp_0.piv) be the candidate set of dp_0.piv, and Φ be the available memory. Our method is to generate the groups one by one as follows. First we pick a random vertex v ∈ C and let rg = {v} be the initial group.
If the estimated memory requirement of the results originated from rg, denoted φ(rg) (we will discuss memory estimation shortly), is less than Φ, we choose another candidate vertex in C − rg that has the greatest proximity to rg and add it to rg; if φ(rg) > Φ we remove the last added vertex from rg. This generates the first group. For the remaining candidate vertices we repeat the process, until all candidate vertices are divided into groups. The detailed algorithm is given in Algorithm 3.

Here an important concept is the proximity of a vertex v to a group of vertices, and we define it as the percentage of v's neighbors that are also neighbors of some vertex in rg, that is,

$$\mathrm{proximity}(v, rg) = \frac{\left|\mathrm{adj}(v) \cap \bigcup_{v' \in rg} \mathrm{adj}(v')\right|}{\left|\mathrm{adj}(v)\right|} \qquad (5)$$

Intuitively the vertices put into the same group are within a region: each time we choose a new vertex that has a distance of at most 2 from one of the vertices already in the group (unless there are no such vertices). Therefore we call the group a region group.

Estimating memory usage. In our system, the main memory consumption comes from the intermediate results and the fetched foreign vertices. The space cost of other data structures is trivial.

Consider the set of intermediate results R originated from the group rg ⊆ C. Recall that all results originated from the same candidate vertex of dp_0.piv are stored in the same tree, while any results originated from different candidate vertices are stored in different trees. Therefore, if we know the space cost of the results originated from every candidate vertex, we can add them together to obtain the space cost of all results originated from rg. To estimate the space cost of the results originated from a single vertex, we use the average space cost of the local embeddings of a candidate vertex v ∈ C_1(u_start) in embedding trie format, which can be obtained when we conduct SM-E. Recall that for each v of C_1(u_start) in SM-E, we find the local embeddings originated from v following a backtracking approach.
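As an aside, the proximity measure of Equation (5) and the greedy grouping it drives can be sketched in Python. This is a sketch under assumptions rather than the system's code: the names `proximity`, `find_region_group`, `region_groups`, the per-vertex cost table `cost`, and the toy adjacency lists are all hypothetical. One deliberate deviation is labeled in the code: a group is never emptied, so a single vertex whose estimate alone exceeds the budget still forms its own group and the loop always makes progress.

```python
import random


def proximity(v, rg, adj):
    """Fraction of v's neighbours that are also neighbours of some vertex in rg."""
    if not adj[v]:
        return 0.0
    covered = set().union(*(adj[w] for w in rg))
    return len(adj[v] & covered) / len(adj[v])


def find_region_group(cands, adj, est, budget, rng=random):
    """Greedily build one region group; `cands` is mutated (members are removed)."""
    v = rng.choice(sorted(cands))
    rg = [v]
    cands.discard(v)
    while cands and est(rg) < budget:
        v = max(sorted(cands), key=lambda c: proximity(c, rg, adj))
        rg.append(v)
        cands.discard(v)
    # Assumption added for this sketch: keep at least one vertex per group.
    if est(rg) > budget and len(rg) > 1:
        cands.add(rg.pop())
    return rg


def region_groups(cands, adj, est, budget, rng=random):
    """Repeat the greedy step until every candidate belongs to some group."""
    cands = set(cands)
    groups = []
    while cands:
        groups.append(find_region_group(cands, adj, est, budget, rng))
    return groups


# Toy data (hypothetical): candidates 0 and 1 share neighbours, as do 2 and 3.
adj = {0: {4, 5, 6}, 1: {4, 5, 6}, 2: {7, 8}, 3: {7, 8, 9}}
cost = {0: 1, 1: 1, 2: 1, 3: 1}          # assumed per-vertex memory estimates
groups = region_groups(adj.keys(), adj, lambda rg: sum(cost[v] for v in rg),
                       budget=2, rng=random.Random(0))
```

With a budget of two cost units, each group holds two vertices, and whichever vertex is picked first, its most similar partner joins it, so the toy candidates are split into the pairs (0, 1) and (2, 3).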
In each recursive step of the backtracking approach, we can record the number of candidate vertices that are matched to the corresponding query vertex. The sum over all steps is the number of trie nodes needed if we group those local embeddings into an embedding trie. Based on this sum, we know the space cost of the local embeddings originating from v in embedding trie format.

Next, we consider the space cost of the fetched foreign vertices in each round. Recall that when expanding the embeddings of P_{i-1} to ECs of P_i, we only need to fetch vertex v if there exists f ∈ R_{G_t}(P_{i-1}) such that f(dp_i.piv) = v. In the worst case, for every candidate vertex v of dp_i.piv, there exists some f ∈ R_{G_t}(P_{i-1}) which maps dp_i.piv to v, and none of these candidate vertices of dp_i.piv resides locally. Therefore the number of data vertices that need to be fetched equals |C(dp_i.piv)| in the worst case. In practice, the space cost of C(dp_i.piv) is usually small compared with that of the intermediate results, and we can allocate a certain amount of memory for caching the fetched data vertices. Note that when more data vertices need to be fetched, we may release some previously cached data vertices if necessary. Therefore we can ignore the space cost of the fetched data vertices when we estimate the memory cost of each region group.

Algorithm 3: FindRegionGroups
Input: the candidate vertex set C = C(dp_0.piv) on M_t
Output: A region group rg
1  Pick a random vertex v ∈ C
2  rg ← {v}
3  C ← C − {v}
4  while C ≠ ∅ and φ(rg) < Φ do
5      v ← arg max_{v ∈ C} (proximity(v, rg))
6      rg ← rg ∪ {v}
7      C ← C − {v}
8  if φ(rg) > Φ then
9      remove last added vertex from rg and put it back into C
10 return rg

7. EXPERIMENT

In this section, we present our experimental results.

Environment. We conducted our experiments on a cluster platform where each machine is equipped with an Intel CPU with 16 cores and 16G memory. The operating system of the cluster is Red Hat Enterprise Linux 6.5.

Algorithms. We compared our system with four state-of-the-art distributed subgraph enumeration approaches:
PSgL [21], the algorithm using graph exploration, originally based on Pregel.
TwinTwig [13], the algorithm using a joining approach, originally based on MapReduce.
SEED [15], an upgraded version of TwinTwig that supports the clique decomposition unit.
Crystal [18], the algorithm relying on clique-index and compression, originally using MapReduce.

We implemented our approach in C++ with the help of MPICH2 [9] and the Boost library [20]. We used Boost.Asio to achieve asynchronous message listening and passing. We used TurboIso [10] as our SM-E processing algorithm.

The performance of distributed graph algorithms varies a lot depending on the programming language and on the underlying distributed engine and file system [3]. It is therefore not fair to simply compare our approach with the Pregel-based PSgL or other Hadoop-based approaches. To achieve a fair comparison, we implemented PSgL, TwinTwig and SEED using C++ with the MPI library. For Crystal, we chose to use the original program provided by its authors, because our experiments with TwinTwig and SEED indicate that our implementation and the original implementation over Hadoop showed no significant difference in terms of performance.

In memory, we loaded the data graph on each node in adjacency-list format for RADS, PSgL and TwinTwig. In order to support the clique decomposition unit of SEED, we also loaded in memory the edges between the neighbours of a vertex along with the adjacency-list of the vertex.

Dataset & Queries. We used four real datasets in our experiments: DBLP, RoadNet, LiveJournal and UK2002. The profiles of these datasets are given in Table 1. The diameter in Table 1 is the longest shortest path between any two data vertices. We partitioned each data graph using the multilevel k-way partition algorithm provided by Metis [11].
DBLP is a relatively small data graph which can be loaded into memory without partitioning; however, we still partition it here. One may argue that when the data graph is small, we can use single-machine enumeration algorithms. However, our purpose in using DBLP here is not to test which algorithm is better when the graph can be loaded as a whole, but to test whether the distributed approaches can fully utilize the memory when there is enough space available. RoadNet is a larger but much sparser data graph than the others; consequently the number of embeddings of each query is smaller. Therefore it can be used to illustrate whether a subgraph enumeration solution has good filtering power to filter out false embeddings early. In contrast, the two denser data graphs, LiveJournal and UK2002, are used to test the algorithms' ability to handle denser graphs with huge numbers of embeddings.

On disk, our data graphs are stored in plain text format where each line represents the adjacency-list of a vertex. The approach of Crystal relies on the clique-index of the data graph, which should be pre-constructed and stored on disk.

Table 1: Profiles of datasets
Dataset(G)     |V|      |E|       Avg. degree    Diameter
RoadNet        56M      717M                     K
DBLP           0.3M     1.0M
LiveJournal    4.8M     42.9M
UK2002         M        298.1M

In Table 2, we present the disk space cost of the index files generated by the program of Crystal (M for megabytes, G for gigabytes).

Table 2: Illustration of the Size of Index Files of Crystal
Dataset(G)     Data Graph File Size    Index File Size
DBLP           13M                     21M
RoadNet        2.3G                    16.9G
LiveJournal    51M                     6.5G
UK2002         4.1G                    6G

The queries we used are given in Figure 7.

Figure 7: Query Set

We evaluate the performance, measured by time elapsed and communication cost, of the five approaches in Section 7.1. The cluster we used for this experiment consists of 1 nodes. Due to the space limit, more experimental results, including execution plan evaluation and a scalability test, are presented in Appendix C.
7.1 Performance Comparison

We compare the performance of the five subgraph enumeration approaches by measuring the time elapsed (in seconds) and the volume of exchanged data when processing each query pattern. The results for RoadNet, DBLP, LiveJournal and UK2002 are given in Figure 8, Figure 9, Figure 10 and Figure 11, respectively. We mark the result as empty when the test fails due to out-of-memory errors. When a bar reaches the upper bound, the corresponding value is beyond the upper bound value shown in the chart.

Exp-1: RoadNet. The results over the RoadNet dataset are given in Figure 8. As can be seen from the figure, RADS and PSgL are significantly faster than the other three methods (by more than 1 order of magnitude). RADS and PSgL use graph exploration while the others use join-based methods, and both RADS and PSgL demonstrated efficient filtering power. Since join-based methods need to group the intermediate results based on keys so as to join them together, their performance was significantly dragged down when dealing with sparse graphs compared with RADS and PSgL.

Figure 8: Performance over RoadNet. (a) Time cost (time elapsed in seconds, queries q1 to q8); (b) Communication cost (MB, queries q1 to q8).

Figure 9: Performance over DBLP. (a) Time cost (time elapsed in seconds, queries q1 to q8); (b) Communication cost (MB, queries q1 to q8).

It is worth noting that PSgL was verified to be slower than TwinTwig and SEED in [13][15]. This may be because the datasets used in TwinTwig and SEED are much denser than RoadNet, hence a huge number of embeddings will be generated. The grouped intermediate results of TwinTwig and SEED significantly reduced the cost of network traffic. Another interesting observation is that although Crystal has heavy indexes, its performance is much worse than that of PSgL and RADS. The reason is that the number of cliques in RoadNet is relatively small considering the graph size. Moreover, there are no cliques with more than two vertices in queries q_1, q_3, q_6, q_7 and q_8. In such cases, the clique index cannot help to improve the performance.

As shown in Figure 8(b), the communication cost is not large for any of the approaches (less than 5M for most queries). In particular, for RADS, the communication cost is almost 0, which is mainly because most data vertices can be processed by SM-E, so that no network communication is required.

Exp-2: DBLP. The result over DBLP is shown in Figure 9. As aforementioned, DBLP is smaller but much denser than RoadNet. The number of intermediate results generated in DBLP is much larger than that in RoadNet, as implied by the data communication cost shown in Figure 9(b). Since PSgL does not consider any compression or grouping over intermediate results, the communication cost of PSgL is much higher than that of the other approaches (more than 2M for queries after q_4). Consequently, the time delay due to shuffling the intermediate results caused bad performance for PSgL. However, PSgL is still faster than SEED and TwinTwig.
This may be because the time cost of grouping intermediate results in TwinTwig and SEED is heavy as well. It is worth noting that the communication cost of RADS is quite small (less than 5M). This is because of the caching strategy of RADS, where most foreign vertices are fetched only once and cached in the local machine. If most vertices are cached, there will be no further communication cost. The time efficiency of RADS is better than that of Crystal even for queries q_2, q_4 and q_5, where the triangle crystal can be directly loaded from the index without any computation.

Exp-3: LiveJournal. As shown in Figure 10, for LiveJournal, SEED, TwinTwig and PSgL start becoming impractical for queries from q_3 to q_8. It took them more than 1 thousand seconds to process each of those queries. Due to the huge number of intermediate results generated, the communication cost increased significantly as well, especially for PSgL, whose communication cost was beyond control when the number of query vertices reaches 6. The method of Crystal achieved good performance for queries q_2, q_4 and q_5. This is mainly because Crystal simply retrieved the cached embeddings of the triangle to match the vertices (u_0, u_1, u_2) of those 3 queries. However, when dealing with the queries with no good crystals (q_6, q_7 and q_8), our method significantly outperformed Crystal.

One important thing to note is that the other three methods (SEED, TwinTwig and PSgL) are sensitive to end vertices, such as u_5 in q_5. Both time cost and communication cost increased significantly from q_4 to q_5. RADS processes those end vertices last by simply enumerating the combinations without caching any results related to them. The end vertices within Crystal are bud vertices, which only require simple combinations. As indicated by query q_5, whose processing time increased only slightly from that of q_4, RADS and Crystal are nicely tuned to handle end vertices.

Exp-4: UK2002. As shown in Figure 11, TwinTwig, SEED and PSgL failed the tests of queries after q_3 due to memory failure caused by the huge number of intermediate results.
The communication cost of all other methods is significantly larger than that of RADS (by more than 2 orders of magnitude); we omit the chart for communication cost here. Similar to that of LiveJournal, the processing time of Crystal is better than that of RADS for the queries with cliques. This is because Crystal directly retrieves the embeddings of the cliques from the index. However, for queries without good crystals, our approach demonstrates better performance. As shown in Table 2, the index files of Crystal are more than 1 times larger than the original data graph. Another advantage of RADS over Crystal is that our memory control strategies make it more robust: when we set a memory upper bound of 8G and tested query q_6, Crystal started crashing due to memory leaks, while RADS successfully finished the query.

Figure 10: Performance over LiveJournal. (a) Time cost (time elapsed in seconds, queries q1 to q8); (b) Communication cost (MB, queries q1 to q8).

Figure 11: Time cost over UK2002 (time elapsed in seconds, queries q1 to q8).

8. RELATED WORK

The works most closely related to ours are TwinTwig [13], SEED [15] and PSgL [21]. Both [13] and [15] use multi-round two-way joins. [13] uses the same data partitioning as in our work, and it decomposes the query graph P into a set of small trees dp_0, ..., dp_k such that the union of these trees is equal to P. Since the decomposition units are trees, a set of embeddings of dp_i can be obtained on each machine without consulting other machines, and the union of the embeddings on all machines is the set of all embeddings of dp_i over G. In the first round, the embeddings of dp_0 and dp_1 are joined to obtain those of P_1; in each subsequent round, the embeddings of P_{i-1} and dp_i are joined to obtain those of P_i. Since the embeddings of P_{i-1} on one machine must be joined with the embeddings of dp_i on every machine, all the intermediate results (i.e., embeddings of P_{i-1} and dp_i) must be cached and then shuffled based on the join key and re-distributed to the machines. Synchronization is necessary since shuffling and re-distribution can only start when all machines have the intermediate results ready.
[15] is similar to [13], except that it allows decomposition units to be cliques as well as trees, and it uses bushy joins rather than left-deep joins (see footnote 4). To compute the intermediate results for these units, it adopts a slightly different data partition strategy: it uses star-clique-preserved partitions. Both TwinTwig and SEED may generate huge intermediate results, and shuffling, re-distribution and synchronization cost a lot of time. Our approach is different in that we do not use joins; instead we use expand-verify-filter on each machine. As such we generate fewer intermediate results, and we do not need to re-distribute them to different machines.

PSgL [21] is based on Pregel [17]. It maps the query vertices one at a time following a breadth-first traversal, so that partial matches are expanded repeatedly until the final results are obtained. In this way it avoids explicit joins, similar to our approach. However, there are important differences between PSgL and our system (RADS). (1) In each step of expansion, PSgL needs to shuffle and send the partial matches (intermediate results) to other machines, while RADS does not need to do so. (2) PSgL stores each (partial) match as a node of a static result tree, while RADS stores the results in a dynamic and compact data structure. (3) There is no memory control in PSgL.

Also closely related to our work are [6] and [5], which introduce systems for parallelizing serial graph algorithms, including (but not limited to) subgraph isomorphism search algorithms. These systems partition the data graph across different machines, but do not partition the query graph. Each machine evaluates the query pattern on its own partition using a serial algorithm (e.g., VF2) independently of the others, but before that it must copy parts of the data graph from other machines. These parts of the graph are determined as follows. For each boundary vertex v on the current machine, it copies the nodes and edges within a distance d from v, where d is the diameter of the query graph. The final results are obtained by collecting the results from all machines.
Obviously, if the query graph diameter is large and the data graph diameter is small (e.g., as in social network graphs), or if there are many boundary vertices involved, then the entire partition of a neighboring machine may have to be fetched. This generates heavy network traffic as well as a burden on the memory of the local machine.

The work [1] treats the query pattern as a conjunctive query, where each predicate represents an edge, and computes the results as a multi-way join in a single round of map and reduce. As observed in [14], the problem with this approach is that most edges have to be duplicated over several machines in the map phase, hence there is a scalability problem when the query pattern is complex.

Qiao et al. [18] represent the set I_P of all embeddings of pattern P in a compressed form, code(I_P), based on a minimum vertex cover of P. Their method decomposes the query graph P into a core core(P) and a set of so-called crystals {p_1, ..., p_k}, such that code(I_P) can be obtained by joining the compressed results of core(P) and {p_1, ..., p_k}. This join process can be parallelized in map-reduce. The compressed results of core(P) and the crystals can be obtained from the compressed results of components of P. To expedite query processing, it builds an index of all cliques of the data graph, as shown in Table 2. Although no shuffling of intermediate results is required, the indexes of [18] can be many times larger than the data graph, and computing and maintaining such big indexes can be very expensive, making the approach less practical.

BigJoin, one of the algorithms proposed in [2], treats a subgraph query as a join of |E_P| binary relations where each relation represents an edge in P. Similar to RADS and PSgL, it generates results by expanding partial results one vertex at a time, assuming a fixed order of the query vertices. BigJoin targets worst-case optimality. Different from our work, it still needs to shuffle and exchange intermediate results, and therefore requires synchronization before that.

Footnote 4: There are independent optimization strategies in each paper, of course.

9. CONCLUSION

We presented a practical asynchronous subgraph enumeration system, RADS, whose core is based on a new framework R-Meef (region-grouped multi-round expand verify & filter). By processing the data vertices far away from the border using single-machine algorithms, we isolated a large part of the vertices which do not have to be involved in the distributed process. By passing verification results of foreign edges and adjacency-lists of foreign vertices, RADS significantly reduced the network communication cost. We also proposed a compact format to store the generated intermediate results.
Our query execution plan and several memory control strategies, including foreign vertex caching and region groups, are designed to improve the efficiency and robustness of RADS. Our experimental results have verified the superiority of RADS compared with state-of-the-art subgraph enumeration approaches.

11. REFERENCES
[1] F. N. Afrati, D. Fotakis, and J. D. Ullman. Enumerating subgraph instances using map-reduce. In ICDE, pages 62–73, 2013.
[2] K. Ammar, F. McSherry, S. Salihoglu, and M. Joglekar. Distributed evaluation of subgraph queries using worst-case optimal and low-memory dataflows. PVLDB, 11(6):691–704, 2018.
[3] K. Ammar and M. T. Özsu. Experimental analysis of distributed graph systems. PVLDB, 11(10), 2018.
[4] R. J. Douglas. NP-completeness and degree restricted spanning trees. Discrete Mathematics, 105(1-3):41–47, 1992.
[5] W. Fan, P. Lu, X. Luo, J. Xu, Q. Yin, W. Yu, and R. Xu. Adaptive asynchronous parallelization of graph algorithms. In SIGMOD, 2018.
[6] W. Fan, J. Xu, Y. Wu, W. Yu, J. Jiang, Z. Zheng, B. Zhang, Y. Cao, and C. Tian. Parallelizing sequential graph computations. In SIGMOD, 2017.
[7] H. Fernau, J. Kneis, D. Kratsch, A. Langer, M. Liedloff, D. Raible, and P. Rossmanith. An exact algorithm for the maximum leaf spanning tree problem. Theoretical Computer Science, 412(45), 2011.
[8] J. A. Grochow and M. Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. In RECOMB, volume 4453, pages 92–106, 2007.
[9] W. Gropp. MPICH2: A new start for MPI implementations. In PVM/MPI, pages 7–7, 2002.
[10] W.-S. Han, J. Lee, and J.-H. Lee. Turbo_iso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In SIGMOD, 2013.
[11] G. Karypis and V. Kumar. Metis: unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report.
[12] H. Kim, J. Lee, S. S. Bhowmick, W. Han, J. Lee, S. Ko, and M. H. A. Jarrah. DUALSIM: parallel subgraph enumeration in a massive graph on a single machine. In SIGMOD, 2016.
[13] L. Lai, L. Qin, X. Lin, and L. Chang. Scalable subgraph enumeration in MapReduce. PVLDB, 8(10), 2015.
[14] L. Lai, L. Qin, X. Lin, and L. Chang. Scalable subgraph enumeration in MapReduce: a cost-oriented approach. VLDB J., 26(3), 2017.
[15] L. Lai, L. Qin, X. Lin, Y. Zhang, and L. Chang. Scalable distributed subgraph enumeration. PVLDB, 10(3), 2016.
[16] J. Lee, W. Han, R. Kasperovics, and J. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2), 2012.
[17] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In PODS, page 6, 2009.
[18] M. Qiao, H. Zhang, and H. Cheng. Subgraph matching: on compression and computation. PVLDB, 11(2), 2017.
[19] X. Ren and J. Wang. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. PVLDB, 8(5), 2015.
[20] B. Schäling. The Boost C++ Libraries. XML Press, 2011.
[21] Y. Shao, B. Cui, L. Chen, L. Ma, J. Yao, and N. Xu. Parallel subgraph listing in a large-scale graph. In SIGMOD, 2014.

APPENDIX

A. PROOFS

A.1 Proof of Proposition 1

Proof. Suppose there is an embedding f such that f(u) = v and f(u') = v'. We show dist(v, v') ≤ BD_{G_t}(v), therefore v' must be on M_t. Any shortest path from u to u' will be mapped by f to a path in G, therefore dist(v, v') ≤ dist(u, u') ≤ Span_P(u). By assumption, Span_P(u) ≤ BD_{G_t}(v), therefore dist(v, v') ≤ BD_{G_t}(v).

A.2 Proof of Theorem 1

Proof. Suppose {dp_0, ..., dp_k} is an execution plan. The plan has k + 1 decomposition units. Clearly the pivot vertices of the decomposition units form a connected dominating set of P. Therefore, k + 1 ≥ c_P. This proves that any execution plan has at least c_P decomposition units.

Now suppose T is an MLST of P. From |V_P| = c_P + l_P we know the number of non-leaf vertices in T is c_P. We can construct an execution plan by choosing one of the non-leaf vertices v as dp_0.piv, and all neighbors of v in T as the vertices in dp_0.LF. Regarding v as the root of the spanning tree T, we then choose each of the non-leaf children v' of v in T as the pivot vertex dp_i.piv of the next decomposition unit, and all children of v' as the vertices in dp_i.LF. We repeat this process until every non-leaf vertex of T becomes the pivot vertex of a decomposition unit. This decomposition has exactly c_P units, and it forms an execution plan. This shows that there exists an execution plan with c_P decomposition units.

B. IMPLEMENTATION OF R-Meef

We present the implementation of R-Meef in Algorithm 4.

Algorithm 4: R-Meef Framework
Input: Query pattern P, partition G_t on machine M_t, execution plan PL
Output: R_{G_t}(P)
1  RG = {rg_0 ... rg_k} ← regionGroups(C(dp_0.piv, M_t))
2  for each region group rg ∈ RG do
3      init embedding trie ET with size |V_P|
4      init edge verification index I
5      for each data vertex v ∈ rg do
6          f ← (dp_0.piv, v)
7          I ← expandEmbedTrie(f, M_t, dp_0, ET)
8      R ← verifyForeignE(I)
9      filterFailedEmbed(R, I, ET)
10     for Round i = 1 to |PL| do
11         clear I
12         fetchForeignV(i)
13         for each f ∈ I do
14             I ← expandEmbedTrie(f, M_t, dp_i, ET)
15         R ← verifyForeignE(I)
16         filterFailedEmbed(R, I, ET)
17     R_{G_t}(P) ← R_{G_t}(P) ∪ ET
18     clear ET

Within each machine, we group the candidate data vertices of dp_0.piv within M_t into region groups (Line 1). For each region group rg, a multi-round mapping process is conducted (Lines 2 to 18). Within each round, we use a data structure ET (embedding trie) to save the generated intermediate results, i.e., embeddings and embedding candidates (Line 3). The edge verification index I is initialized in Line 4, and is reset for each round of processing (Line 11).

(1) First Round (round 0). Starting from each candidate v of rg, we match v to dp_0.piv in the execution plan. After the pivot vertex is matched, we find all the ECs of dp_0 with respect to M_t and compress them into ET. We use a function expandEmbedTrie to represent this process (Line 7). For each EC compressed in ET, its undetermined edges need to be verified in order to determine whether this EC is an embedding of dp_0. We record this information in the edge verification index (EVI) I, which is constructed in the expandEmbedTrie function. After we have the EVI I in M_t, we send a verifyE request to verify the undetermined edges within I on the machine which is able to verify them (function verifyForeignE in Line 8). After the edges in I are all verified, we remove the failed ECs from ET (Line 9).

(2) Other Rounds. For each of the remaining rounds of the execution plan, we first clear the EVI I from the previous round (Line 11). In the i-th round, we want to find all the ECs of P_i based on the embeddings in R_{G_t}(P_{i-1}) (where dp_i.piv has been matched). The process is to expand every embedding f of R_{G_t}(P_{i-1}) with each embedding candidate of dp_i within the neighbourhood of f(dp_i.piv).
If not all the data vertices matched to $dp_i.pv$ by the ECs in $R_{G_t}(P_{i-1})$ reside in $M_t$, we will have to fetch the adjacency-lists of those foreign vertices from other machines in order to expand from them. A sub-procedure fetchForeignV is used to represent this process (Line 12). After fetching, for each embedding $f$ of $R_{G_t}(P_{i-1})$, we find all the ECs of $P_i$ by expanding from $f(dp_i.pv)$ (Line 14). The found ECs are compressed into ET. Then verifyForeignE and filterFailedEmbed are called to make sure that the failed ECs are filtered out from the embedding trie, which will then contain only the actual embeddings of $P_i$, i.e., $R_{G_t}(P_i)$ (Line 15, 16).

After all the rounds of this region group have finished, we have a set of embeddings of $P$ compressed into ET. The results obtained from all the region groups are put together to obtain the embeddings found by $M_t$.

One important thing to note is that if a foreign vertex is already cached in the local machine, the undetermined edges attached to this vertex can be verified locally without sending requests to other machines. Also, we do not re-fetch any foreign vertex that is already cached.

Example 7. Consider the data graph $G$ in Figure 2, where the vertices marked with dashed border lines reside in $M_1$ and the other vertices reside in $M_2$. Consider the pattern $P$ and execution plan $PL$ given in Example 3. We assume the preserved orders due to symmetry breaking are: $u_1 < u_2$, $u_3 < u_6$, $u_4 < u_5$ and $u_8 < u_9$. There are two vertices $\{v_0, v_2\}$ in $M_1$ and two vertices $\{v_1, v_{10}\}$ in $M_2$ with a degree not smaller than that of $dp_0.pv$. Therefore in $M_1$ we have $C(dp_0.pv) = \{v_0, v_2\}$, and in $M_2$ we have $C(dp_0.pv) = \{v_1, v_{10}\}$. After grouping, assume we have $RG = \{rg_0, rg_1\}$ where $rg_0 = \{v_0\}$ and $rg_1 = \{v_2\}$ in $M_1$, and $RG = \{rg_0\}$ where $rg_0 = \{v_1, v_{10}\}$ in $M_2$.

Consider the region group $rg_0$ in $M_1$. In round 0, we first match $v_0$ to $dp_0.pv$. Expanding from $v_0$, we may have ECs including but not limited to the following (we lock $u_7$ to $v_7$ for easy demonstration):

$f_{G_1} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_7, v_7)\}$
$f'_{G_1} = \{(u_0, v_0), (u_1, v_1), (u_2, v_9), (u_7, v_7)\}$
$f''_{G_1} = \{(u_0, v_0), (u_1, v_9), (u_2, v_{11}), (u_7, v_7)\}$

We compress these ECs into ET. Note that a mapping such as $\{(u_0, v_0), (u_1, v_1), (u_2, v_{11}), (u_7, v_7)\}$ is not an EC of $dp_0$ w.r.t. $M_1$ since $(v_1, v_{11})$ can be locally verified to be nonexistent. Since the undetermined edge $(v_1, v_9)$ of $f'_{G_1}$ cannot be determined in $M_1$, we put $\{(v_1, v_9), \langle f'_{G_1} \rangle\}$ into the EVI $I$. We then ask $M_2$ to verify the existence of the edge. $M_2$ returns false, therefore $f'_{G_1}$ will be removed from ET.

In round 1, we have two embeddings $R_{G_t}(P_0) = \{f_{G_1}, f''_{G_1}\}$ to start with. To extend $f_{G_1}$ and $f''_{G_1}$, we need to fetch the adjacency-lists of $v_1$ and $v_9$ respectively. We send a single fetchV request to fetch the adjacency-lists of $v_1$ and $v_9$ from $M_2$. After expansion from $v_1$, we get a single embedding $\{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_3, v_3), (u_4, v_4), (u_7, v_7)\}$ in $R_{G_t}(P_1)$. There is no embedding of $P_1$ expanded from $v_9$. Hence $f''_{G_1}$ will be removed from the embedding trie.

In round 2, we expand from $v_2$ to get the ECs of $P_2$. $dp_2.pv$ was already mapped to $v_2$ as seen above, and $v_2$ has neighbors $v_5$, $v_6$ and $v_{10}$ that are not matched to any query vertices. Since there are a sibling edge $(u_5, u_6)$ and a cross-unit edge $(u_4, u_5)$ in $P_2$, we need to verify the existence of $(v_4, v_5)$ and $(v_5, v_6)$ if we want to map $u_5$ to $v_5$ and $u_6$ to $v_6$. The existence of both $(v_4, v_5)$ and $(v_5, v_6)$ can be verified locally. Similarly, if we want to map $u_5$ to $v_5$ and $u_6$ to $v_{10}$, we will have to verify the existence of $(v_5, v_{10})$, and so on. It can be locally verified that $(v_5, v_{10})$ does not exist, and remotely verified that $(v_6, v_{10})$ does not exist. Therefore, at the end of this round, we get a single embedding for $P_2$, which extends the embedding for $P_1$ by mapping $u_5$, $u_6$ to $v_5$, $v_6$ respectively. We expand the embedding trie accordingly.
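
The expand, verify and filter cycle used in the rounds above can be pictured on a small prefix tree. The following is a minimal sketch, assuming the embedding trie is a plain prefix tree keyed by the matching order and the edge verification index maps each undetermined edge to the trie leaves of the ECs that depend on it. The names `TrieNode`, `EmbeddingTrie`, `verify_and_filter`, `edge_exists` and `run_round0`, as well as the vertex labels, are hypothetical and only loosely mirror round 0 of the example; they are not taken from the RADS implementation.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Edge = Tuple[str, str]


class TrieNode:
    """One data vertex at one position of the matching order."""

    def __init__(self, vertex: Optional[str], parent: Optional["TrieNode"] = None) -> None:
        self.vertex = vertex
        self.parent = parent
        self.children: Dict[str, "TrieNode"] = {}


class EmbeddingTrie:
    """Hypothetical, simplified embedding trie: shared prefixes are stored once."""

    def __init__(self) -> None:
        self.root = TrieNode(None)

    def insert(self, mapping: List[str]) -> TrieNode:
        """Add one embedding candidate and return the leaf that identifies it."""
        node = self.root
        for v in mapping:
            if v not in node.children:
                node.children[v] = TrieNode(v, node)
            node = node.children[v]
        return node

    def remove(self, leaf: TrieNode) -> None:
        """Drop a failed candidate and prune ancestors left without children."""
        node = leaf
        while node.parent is not None and not node.children:
            parent = node.parent
            if parent.children.get(node.vertex) is not node:
                break  # already detached by an earlier removal
            del parent.children[node.vertex]
            node = parent

    def embeddings(self) -> Iterator[Tuple[str, ...]]:
        """Enumerate the root-to-leaf paths, i.e. the stored mappings."""
        def walk(node: TrieNode, prefix: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
            if not node.children:
                if prefix:
                    yield prefix
                return
            for v, child in node.children.items():
                yield from walk(child, prefix + (v,))
        yield from walk(self.root, ())

    def node_count(self) -> int:
        """Number of stored vertices, excluding the artificial root."""
        count, stack = 0, list(self.root.children.values())
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count


def verify_and_filter(trie: EmbeddingTrie,
                      evi: Dict[Edge, List[TrieNode]],
                      edge_exists: Callable[[Edge], bool]) -> None:
    """Remove every candidate whose undetermined edge is reported absent."""
    for edge, leaves in evi.items():
        if not edge_exists(edge):
            for leaf in leaves:
                trie.remove(leaf)


def run_round0():
    # Illustrative candidates in the matching order (u0, u1, u2, u7).
    candidates = {
        "f": ["v0", "v1", "v2", "v7"],
        "f'": ["v0", "v1", "v9", "v7"],
        "f''": ["v0", "v9", "v11", "v7"],
    }
    trie = EmbeddingTrie()
    leaves = {name: trie.insert(m) for name, m in candidates.items()}
    evi = {("v1", "v9"): [leaves["f'"]]}   # edge that cannot be checked locally
    before = list(trie.embeddings())
    nodes_before = trie.node_count()
    remote_edges = set()                   # assumed remote answer: (v1, v9) is absent
    verify_and_filter(trie, evi, lambda e: e in remote_edges)
    cells_in_list = sum(len(m) for m in candidates.values())
    return before, list(trie.embeddings()), cells_in_list, nodes_before, trie.node_count()
```

In this toy run the three candidates occupy 12 cells as a flat list but 9 trie nodes, and removing the failed candidate also prunes the branch that only it used, leaving 7 nodes.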
Following the above process, after we process the last round, an embedding of $P$ starting from region group $rg_0$ in machine $M_1$ will be saved in ET:

$f_{G_1} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_3, v_3), (u_4, v_4), (u_5, v_5), (u_6, v_6), (u_7, v_7), (u_8, v_9), (u_9, v_{11})\}$

C. MORE EXPERIMENTAL RESULTS

C.1 Scalability Test

We compare the scalability of the five approaches by varying the number of nodes in the cluster (5, 10, 15), 3 cases in total. The queries we processed are shown in Figure 7. Instead of reporting the processing time, here we report the ratio between the total processing time of all queries using 5 nodes and that of the other two cases, which we call the scalability ratio.

The results are shown in Figure 12. The most important thing to observe is that our approach demonstrates linear speed-up when the number of nodes is increased for RoadNet and DBLP. For RoadNet, the reason is that most vertices of each partition are far away from the border, therefore the majority of embeddings can be found by SM-E. The machines in our approach are almost independent except for some workload sharing. As for DBLP, which is a small graph, almost all vertices can be cached in memory, and RADS takes full advantage of this. Because TwinTwig, SEED and PSgL failed some queries for LiveJournal and UK2002, we omit their scalability results on those two datasets. There, the difference between Crystal and RADS is small, with RADS better on both.

[Figure 12: Scalability Test. Panels: (a) Roadnet, (b) DBLP, (c) LiveJournal, (d) UK2002. Vertical axis: scalability ratio. Legend entries: Seed, Twintwig, Crystal, Pads, Psgl in panels (a) and (b); Crystal, Pads in panels (c) and (d).]

C.2 Effectiveness of Query Execution Plan

To validate the effectiveness of our strategy for choosing the query execution plan, we compare the processing time of RADS with two other baseline plans which are generated by replacing the execution plan of RADS with the execution plans RanS and RanM, respectively.
RanS represents a plan consisting of random star decomposition units (no limit on the size of the star) and RanM represents a plan with the minimum number of rounds without considering the strategies in Section 4.3. The cluster we used for this test consists of 10 nodes. In order to cover more random query plans, we run each test 5 times and report the average. The queries are as shown in Figure 7. For queries $q_1$ to $q_3$, the query plans generated in the above three implementations are almost the same. Therefore, we omit the data for those three queries.

[Figure 13: Effectiveness of Execution Plan. Panels: (a) Roadnet, (b) DBLP, (c) LiveJournal, (d) UK2002. Horizontal axis: queries q4 to q8. Vertical axis: time elapsed (s). Legend entries: RanS, RanM, Pads.]

The results for RoadNet, DBLP, LiveJournal and UK2002 are shown in Figure 13. For RoadNet, it is not surprising to see that the processing times are almost the same for the 3 execution plans. This is because most vertices of each RoadNet partition can be processed by SM-E, and different distributed query execution plans have little effect on the total processing time. For the other three data sets, it is obvious that our fully optimized execution plan plays an important role in improving the query processing time, especially when dealing with large graphs such as LiveJournal and UK2002, where large volumes of network communication are generated and can be shared.

C.3 Effectiveness of Compression

To show the effectiveness of our compression strategy, we conducted an experiment to compare the space cost of the simple embedding-list (EL) with that of our embedding trie (ET). We use the RoadNet and DBLP data sets for this test. The queries are as shown in Figure 7. We omit the test over the other two data sets because the uncompressed volume of the results is too big.

The results are shown in Table 3 and Table 4, respectively. For RoadNet the intermediate results generated by Queries 7 and 8 are negligible, therefore they are not listed. The results for both datasets demonstrate a good compression ratio. It is worth noting that the compression ratios of all queries over RoadNet are smaller than those over DBLP. This is because the embeddings of RoadNet are very diverse and they do not share a lot of common vertices.

[Table 3: Compression on Roadnet (Mb). Header: Query q1 to q8; rows: EL, ET.]

[Table 4: Compression on DBLP (Gb). Header: Query q1 to q8; rows: EL, ET (surviving ET fragment: .8M).]

C.4 More Query Processing Results

[Figure 14: Queries with more cliques]

As aforementioned, SEED supports cliques as decomposition units and Crystal indexes the cliques in the graph storage. Both methods should have advantages when processing queries with more cliques. Note that most of the queries in Figure 7 do not contain any clique. For fairness, we also tested some queries from [18] for SEED, Crystal and RADS. The queries are shown in Figure 14, all of which have cliques. In contrast to the experiment in Section 7.1, for SEED, here we also used the program implemented by its original authors. This guarantees that both SEED and Crystal have their maximum optimized performance when processing those queries.

The results are shown in Figure 15. We omit the results of SEED for UK2002 since its time cost is much higher compared with the other two methods. Consistent with the result in Section 7.1, RADS performs consistently faster than SEED and Crystal when running on RoadNet (more than 1 order of magnitude) and on DBLP. For the other datasets, RADS is still better than SEED for all queries, while worse than Crystal for the queries $q_1$, $q_2$ and $q_4$. This is reasonable because of the heavy clique index of Crystal. However, RADS has a noticeable improvement over Crystal when processing $q_3$, where the verification edges helped RADS filter out a lot of unpromising candidates.

[Figure 15: Results of queries with more cliques. Panels: (a) Roadnet, (b) DBLP, (c) LiveJournal, (d) UK2002. Horizontal axis: queries q1 to q4. Vertical axis: time elapsed (s). Legend entries: Seed, Crystal, Pads.]


More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION

Overview. Basic Setup [9] Motivation and Tasks. Modularization 2008/2/20 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Overvew 2 IMPROVED COVERAGE CONTROL USING ONLY LOCAL INFORMATION Introducton Mult- Smulator MASIM Theoretcal Work and Smulaton Results Concluson Jay Wagenpfel, Adran Trachte Motvaton and Tasks Basc Setup

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

Sorting. Sorting. Why Sort? Consistent Ordering

Sorting. Sorting. Why Sort? Consistent Ordering Sortng CSE 6 Data Structures Unt 15 Readng: Sectons.1-. Bubble and Insert sort,.5 Heap sort, Secton..6 Radx sort, Secton.6 Mergesort, Secton. Qucksort, Secton.8 Lower bound Sortng Input an array A of data

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

1 Introducton Gven a graph G = (V; E), a non-negatve cost on each edge n E, and a set of vertces Z V, the mnmum Stener problem s to nd a mnmum cost su

1 Introducton Gven a graph G = (V; E), a non-negatve cost on each edge n E, and a set of vertces Z V, the mnmum Stener problem s to nd a mnmum cost su Stener Problems on Drected Acyclc Graphs Tsan-sheng Hsu y, Kuo-Hu Tsa yz, Da-We Wang yz and D. T. Lee? September 1, 1995 Abstract In ths paper, we consder two varatons of the mnmum-cost Stener problem

More information

Report on On-line Graph Coloring

Report on On-line Graph Coloring 2003 Fall Semester Comp 670K Onlne Algorthm Report on LO Yuet Me (00086365) cndylo@ust.hk Abstract Onlne algorthm deals wth data that has no future nformaton. Lots of examples demonstrate that onlne algorthm

More information

SAO: A Stream Index for Answering Linear Optimization Queries

SAO: A Stream Index for Answering Linear Optimization Queries SAO: A Stream Index for Answerng near Optmzaton Queres Gang uo Kun-ung Wu Phlp S. Yu IBM T.J. Watson Research Center {luog, klwu, psyu}@us.bm.com Abstract near optmzaton queres retreve the top-k tuples

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution

Dynamic Voltage Scaling of Supply and Body Bias Exploiting Software Runtime Distribution Dynamc Voltage Scalng of Supply and Body Bas Explotng Software Runtme Dstrbuton Sungpack Hong EE Department Stanford Unversty Sungjoo Yoo, Byeong Bn, Kyu-Myung Cho, Soo-Kwan Eo Samsung Electroncs Taehwan

More information

Needed Information to do Allocation

Needed Information to do Allocation Complexty n the Database Allocaton Desgn Must tae relatonshp between fragments nto account Cost of ntegrty enforcements Constrants on response-tme, storage, and processng capablty Needed Informaton to

More information

Research of Dynamic Access to Cloud Database Based on Improved Pheromone Algorithm

Research of Dynamic Access to Cloud Database Based on Improved Pheromone Algorithm , pp.197-202 http://dx.do.org/10.14257/dta.2016.9.5.20 Research of Dynamc Access to Cloud Database Based on Improved Pheromone Algorthm Yongqang L 1 and Jn Pan 2 1 (Software Technology Vocatonal College,

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING

BIN XIA et al: AN IMPROVED K-MEANS ALGORITHM BASED ON CLOUD PLATFORM FOR DATA MINING An Improved K-means Algorthm based on Cloud Platform for Data Mnng Bn Xa *, Yan Lu 2. School of nformaton and management scence, Henan Agrcultural Unversty, Zhengzhou, Henan 450002, P.R. Chna 2. College

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A Comparison of Top-k Temporal Keyword Querying over Versioned Text Collections

A Comparison of Top-k Temporal Keyword Querying over Versioned Text Collections A Comparson of Top-k Temporal Keyword Queryng over Versoned Text Collectons Wenyu Huo and Vassls J. Tsotras Department of Computer Scence and Engneerng Unversty of Calforna, Rversde Rversde, CA, USA {whuo,tsotras}@cs.ucr.edu

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems

An Efficient Garbage Collection for Flash Memory-Based Virtual Memory Systems S. J and D. Shn: An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems 2355 An Effcent Garbage Collecton for Flash Memory-Based Vrtual Memory Systems Seunggu J and Dongkun Shn, Member,

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

such that is accepted of states in , where Finite Automata Lecture 2-1: Regular Languages be an FA. A string is the transition function,

such that is accepted of states in , where Finite Automata Lecture 2-1: Regular Languages be an FA. A string is the transition function, * Lecture - Regular Languages S Lecture - Fnte Automata where A fnte automaton s a -tuple s a fnte set called the states s a fnte set called the alphabet s the transton functon s the ntal state s the set

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information