Fast and Robust Distributed Subgraph Enumeration


Xuguang Ren, Griffith University, Australia
Wook-Shin Han, POSTECH, Republic of Korea
Junhu Wang, Griffith University, Australia
Jeffrey Xu Yu, The Chinese University of Hong Kong

arXiv: v1 [cs.DB] 23 Jan 2019

ABSTRACT

We study the classic subgraph enumeration problem under distributed settings. Existing solutions either suffer from severe memory crisis or rely on large indexes, which makes them impractical for very large graphs. Most of them follow a synchronous model where the performance is often bottlenecked by the machine with the worst performance. Motivated by this, in this paper, we propose RADS, a Robust Asynchronous Distributed Subgraph enumeration system. RADS first identifies results that can be found using single-machine algorithms. This strategy not only improves the overall performance but also reduces network communication and memory cost. Moreover, RADS employs a novel region-grouped multi-round expand verify & filter framework which does not need to shuffle and exchange the intermediate results, nor does it need to replicate a large part of the data graph in each machine. This feature not only reduces network communication cost and memory usage, but also allows us to adopt simple strategies for memory control and load balancing, making it more robust. Several heuristics are also used in RADS to further improve the performance. Our experiments verified the superiority of RADS to state-of-the-art subgraph enumeration approaches.

Keywords: Distributed System, Asynchronous, Subgraph Enumeration

1. INTRODUCTION

Subgraph enumeration is the problem of finding all occurrences of a query graph in a data graph. Its solution is a basis for many other algorithms and it finds numerous applications. This problem has been well studied under single machine settings [1][19]. However, in the real world, the data graphs are often fragmented and distributed across different sites. This phenomenon highlights the importance of distributed systems of subgraph enumeration.
Also, the increasing size of modern graphs makes it hard to load the whole graph into memory, which further strengthens the requirement of distributed subgraph enumeration. In recent years, several approaches and systems have been proposed [1, 21, 13, 15, 6, 5]. However, existing systems either need to exchange large intermediate results (e.g., [13], [15] and [21]), or copy and replicate large parts of the data graph on each machine (e.g., [1] and [6, 5]), or rely on heavy indexes (e.g., [18]). Both exchanging and caching large intermediate results and exchanging and caching large parts of the data graph place a heavy burden on the network and on memory. In fact, when the graphs are large these systems tend to crash due to memory depletion. In addition, most of the current systems are synchronous, hence they suffer from synchronization delay, that is, the machines must wait for each other for the completion of certain processing tasks, making the overall performance equivalent to that of the slowest machine. More details about existing work can be found in Section 8.

It is observed in previous work [15, 18] that when the data graph is large, the number of intermediate results can be huge, making the network communication cost a bottleneck and causing memory crash. On the other hand, systems that rely on replication of large parts of the data graph or heavy indexes are impractical for large data graphs and low-end computer clusters.

In this paper, we present RADS, a Robust Asynchronous Distributed Subgraph enumeration system. Different from previous work, our system does not need to exchange intermediate results or replicate large parts of the data graph. It does not rely on heavy indexes or suffer from synchronization delay. Our system is also more robust due to our memory control strategies, and it is easy to load-balance.
To be specific, we make the following contributions:

(1) We propose a novel distributed subgraph enumeration framework, where the machines do not need to exchange intermediate results, nor do they need to replicate large parts of the data graph.

(2) We propose a method to identify embeddings that can be found on each local machine independent of other machines, and use a single-machine algorithm to find them. This strategy not only improves the overall performance, but also reduces network communication and memory cost.


(3) We propose effective memory control strategies to minimize the chance of memory crash, making our system more robust. Our strategy also facilitates workload balancing.

(4) We propose optimization strategies to further improve the performance. These include (i) a set of rules to compute an efficient execution plan, and (ii) a dynamic data structure to compactly store intermediate results.

(5) We conduct extensive experiments which demonstrate that our system is not only significantly faster than existing solutions (see footnote 1), but also more robust.

Paper Organization. In Section 2, we present the preliminaries. In Section 3, we present the architecture and framework of our system RADS. In Section 4, we present algorithms for computing the execution plan. In Section 5, we present the embedding trie data structure to compress our intermediate results. Our memory control strategy is given in Section 6. We present our experiments in Section 7, discuss related work in Section 8 and conclude the paper in Section 9. Some proofs, detailed algorithms and auxiliary experimental results are given in the appendix.

2. PRELIMINARIES

Data Graph & Query Graph. Both the data graph and query graph (a.k.a. query pattern) are assumed to be unlabeled, undirected, and connected graphs. We use $G = (V_G, E_G)$ and $P = (V_P, E_P)$ to denote the data graph and query graph respectively, where $V_G$ and $V_P$ are the vertex sets, and $E_G$ and $E_P$ are the edge sets. We will use data (resp. query) vertex to refer to vertices in the data (resp. query) graph. Generally, for any graph g, we use $V_g$ and $E_g$ to denote its vertex set and edge set respectively, and for any vertex v in g, we use $adj(v)$ to denote v's neighbour set in g and use $deg(v)$ to denote the degree of v.

Subgraph Isomorphism. Given a data graph G and a query pattern P, P is subgraph isomorphic to G if there exists an injective function $f: V_P \to V_G$ such that for any edge $(u_1, u_2) \in E_P$, there exists an edge $(f(u_1), f(u_2)) \in E_G$.
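As a minimal illustration of the definition above, the following Python sketch checks whether a mapping is injective and edge-preserving, and enumerates all such mappings by brute force. The function names and the toy graph are hypothetical and chosen only for illustration; this is not the enumeration algorithm used by RADS and is only practical for tiny inputs.

```python
from itertools import permutations

def is_embedding(f, pattern_edges, data_adj):
    """Return True if f (dict: query vertex -> data vertex) is injective
    and maps every query edge onto a data edge."""
    if len(set(f.values())) != len(f):
        return False
    return all(f[b] in data_adj[f[a]] for a, b in pattern_edges)

def enumerate_embeddings(pattern_vertices, pattern_edges, data_adj):
    """Brute force: try every injective assignment of data vertices."""
    results = []
    for image in permutations(data_adj, len(pattern_vertices)):
        f = dict(zip(pattern_vertices, image))
        if is_embedding(f, pattern_edges, data_adj):
            results.append(f)
    return results

# Toy data graph (assumed): a triangle 0-1-2 plus a pendant vertex 3 attached to 2.
data_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle_vertices = ["u0", "u1", "u2"]
triangle_edges = [("u0", "u1"), ("u1", "u2"), ("u0", "u2")]
embeddings = enumerate_embeddings(triangle_vertices, triangle_edges, data_adj)
```

On this toy input the triangle query is matched by every ordering of the data vertices 0, 1 and 2, and by no mapping that uses vertex 3.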
The injective function is also known as an embedding of P in G (or, from P to G), and it can be represented as a set of vertex pairs (u, v) where $u \in V_P$ is mapped to $v \in V_G$. We will use $R_G(P)$ to denote the set of all embeddings of P in G. The problem of subgraph enumeration is to find the set $R_G(P)$. In the literature, subgraph enumeration is also referred to as subgraph isomorphism search [16][1][19] and subgraph listing [12][21].

Partial Embedding. A partial embedding of graph P in graph G is an embedding in G of a vertex-induced subgraph of P. A partial embedding is a full embedding if the vertex-induced subgraph is P itself.

Symmetry Breaking. A symmetry breaking technique based on automorphism is conventionally used to reduce duplicate embeddings [8]. As a result the data vertices in the final embeddings should follow a preserved order of the query vertices. We apply this technique in this paper by default and we will specify the preserved order when necessary.

Footnote 1. Except for some queries using [18], which relies on heavy indexes.

Graph Partition & Storage. Given a data graph G and m machines $\{M_1, \ldots, M_m\}$ in a distributed environment, a partition of G is denoted $\{G_1, G_2, \ldots, G_m\}$ where $G_t$ is the partition located in the t-th machine $M_t$. In this paper, we assume each partition is stored as an adjacency-list. For any data vertex v, we assume its adjacency-list is stored in a single machine $M_t$ and we say v is owned by $M_t$ (or resides in $M_t$). We call v a foreign vertex of $M_t$ if v is not owned by $M_t$. We say a data edge e is owned by (or resides in) $M_t$ (denoted as $e \in E_{G_t}$) if either end vertex of e resides in $M_t$. Note that an edge can reside in two different machines. For any v owned by $M_t$, we call v a border vertex if any of its neighbors is owned by a machine other than $M_t$. Otherwise we call it a non-border vertex. We use $V^b_{G_t}$ to denote the set of all border vertices in $M_t$.

3. RADS ARCHITECTURE

In this section, we first present an overview of the architecture of RADS, followed by the R-Meef framework of RADS. We give a detailed implementation of R-Meef in Appendix B.
3.1 Architecture Overview

Figure 1: RADS Architecture

The architecture of RADS is shown in Figure 1. Given a query pattern P, within each machine, RADS first launches a process of single-machine enumeration (SM-E) and a daemon thread, simultaneously. After SM-E finishes, RADS launches an R-Meef thread. Note that the R-Meef threads of different machines may start at different times.

Single-Machine Enumeration. The idea of SM-E is to try to find a set of local embeddings using a single-machine algorithm, such as TurboIso [1], which does not involve any distributed processing. The subsequent distributed process only has to find the remaining embeddings. This strategy can not only boost the overall enumeration efficiency but also significantly reduce the memory cost and communication cost of the subsequent distributed process. Moreover, the local embeddings can be used to estimate the space cost of a region group, which will help to effectively control the memory usage (to be discussed in Section 6).

We first define the concepts of border distance and span, which will be used to identify embeddings that can be found by SM-E.

Definition 1 (Border Distance). Given a graph partition $G_t$ and data vertex v in $G_t$, the border distance of v w.r.t. $G_t$, denoted as $BD_{G_t}(v)$, is the minimum shortest distance between v and any border vertex of $G_t$, that is

$$BD_{G_t}(v) = \min_{v' \in V^b_{G_t}} dist(v, v') \tag{1}$$

where $dist(v, v')$ is the shortest distance between v and v'.

Definition 2 (Span). Given a query pattern P, the span of query vertex u, denoted as $Span_P(u)$, is the maximum shortest distance between u and any other vertex of P, that is

$$Span_P(u) = \max_{u' \in V_P} dist(u, u') \tag{2}$$

Proposition 1. Given a data vertex v of $G_t$ and a query vertex u of P, if $Span_P(u) \le BD_{G_t}(v)$, then there will be no embedding f of P in G such that $f(u) = v$ and $f(u')$ is not owned by $M_t$, where $u' \in V_P$, $u' \ne u$.

Proposition 1 states that if the border distance of v is not smaller than the span of query vertex u, there will be no cross-machine embeddings (i.e., embeddings where the query vertices are mapped to data vertices residing in different machines) which map u to v. The proof of Proposition 1 is in Appendix A.1.

Let $u_{start}$ be the starting query vertex (namely, the first query vertex to be mapped) and $C(u_{start})$ be the candidate vertex set of $u_{start}$ in $G_t$. Let $C_1(u_{start}) \subseteq C(u_{start})$ be the subset of candidates whose border distance is no less than the span of $u_{start}$. According to Proposition 1, any embedding that maps $u_{start}$ to a vertex in $C_1(u_{start})$ can be found using a single-machine subgraph enumeration algorithm over $G_t$, independent of other machines. In RADS, the candidates in $C_1(u_{start})$ will be processed by SM-E, and the other candidates will be processed by the subsequent distributed process.

The SM-E process is simple, and we will next focus on the distributed process. For presentation simplicity, from now on when we say a candidate vertex of $u_{start}$, we mean a candidate vertex in $C(u_{start}) \setminus C_1(u_{start})$, unless explicitly stated otherwise.
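The split of candidates between SM-E and the distributed process can be sketched as follows. This is a minimal Python illustration under stated assumptions: the partition is given as a dictionary of adjacency lists for the owned vertices, border distance is computed with a multi-source breadth-first search restricted to owned vertices, and all function names and the toy path graph are hypothetical rather than taken from the RADS implementation.

```python
from collections import deque

def bfs_distances(adj, sources):
    """Multi-source BFS: hop distance from the nearest source to each reachable vertex."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def border_distances(local_adj, owned):
    """Border distance of every owned vertex (infinity if no border vertex is reachable)."""
    border = [v for v in owned if any(w not in owned for w in local_adj[v])]
    # A shortest path to the nearest border vertex never needs to leave the partition.
    inner = {v: [w for w in local_adj[v] if w in owned] for v in owned}
    dist = bfs_distances(inner, border)
    return {v: dist.get(v, float("inf")) for v in owned}

def span(pattern_adj, u):
    """Largest shortest-path distance from query vertex u to any query vertex."""
    return max(bfs_distances(pattern_adj, [u]).values())

def split_candidates(candidates, bd, span_u):
    """Candidates with span <= border distance go to SM-E, the rest to the distributed process."""
    local = [v for v in candidates if span_u <= bd[v]]
    distributed = [v for v in candidates if span_u > bd[v]]
    return local, distributed

# Toy setting (assumed): data graph is the path 0-1-2-3-4-5, this machine owns 0..3.
owned = {0, 1, 2, 3}
local_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4]}
pattern_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # path query a-b-c
bd = border_distances(local_adj, owned)
sm_e, dist_proc = split_candidates(sorted(owned), bd, span(pattern_adj, "b"))
```

Here vertex 3 is the only border vertex, so with the middle query vertex b (span 1) as the starting vertex, only candidate 3 is left for the distributed process.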
The distributed process consists of some daemon threads and the subgraph enumeration thread:

Daemon Threads listen to requests from other machines and support four functionalities:

(1) verifyE returns the edge verification results for a given request consisting of vertex pairs. For example, given a request $\{(v_0, v_1), (v_2, v_3)\}$ posted to $M_1$, $M_1$ will return {true, false} if $(v_0, v_1)$ is an edge in $G_1$ while $(v_2, v_3)$ is not.

(2) fetchV returns the adjacency-lists of the requested vertices of the data graph. The requested vertices sent to machine M must reside in M.

(3) checkR returns the number of unprocessed region groups of the local machine (i.e., the machine on which the thread is running). A region group is a group of candidate data vertices of the starting query vertex, see Section 3.2.

(4) shareR returns an unprocessed region group of the local machine to the requester machine. shareR will also mark the region group sent out as processed.

R-Meef Thread is the core subgraph enumeration thread. When necessary, the local R-Meef thread sends verifyE requests and fetchV requests to the daemon threads located in other machines, and the other machines respond to these requests accordingly. Once a local machine finishes processing its own region groups, it will broadcast a checkR request to the other machines. Upon receiving the numbers of unfinished region groups from other machines, it will send a shareR request to the machine with the maximum number of unprocessed region groups. Once it receives a region group, it will process it on the local machine. checkR and shareR are for load balancing purposes only, and they will not be discussed further in this paper.

3.2 The R-Meef Framework

Before presenting the details of the R-Meef framework, we need the following definitions.

Definition 3 (embedding candidate).
Given a partition $G_t$ of data graph G located in machine $M_t$ and a query pattern P, an injective function $f_{G_t}: V_P \to V_G$ is called an embedding candidate (EC) of P w.r.t. $G_t$ if for any edge $(u, u') \in E_P$, there exists an edge $(f_{G_t}(u), f_{G_t}(u')) \in E_{G_t}$ provided either $f_{G_t}(u) \in V_{G_t}$ or $f_{G_t}(u') \in V_{G_t}$. We use $R_{G_t}(P)$ to denote the set of ECs of P w.r.t. $G_t$.

Note that for an EC $f_{G_t}$ and a query vertex u, $f_{G_t}(u)$ is not necessarily owned by $G_t$. That is, the adjacency-list of $f_{G_t}(u)$ may be stored in other machines. For any query edge $(u, u')$, an EC only requires that the corresponding data edge $(f_{G_t}(u), f_{G_t}(u'))$ exists if at least one of $f_{G_t}(u)$ and $f_{G_t}(u')$ resides in $G_t$. Therefore, an EC may not be an embedding. Intuitively, the existence of the edge $(f_{G_t}(u), f_{G_t}(u'))$ can only be verified in $G_t$ if one of its end vertices resides in $G_t$. Otherwise the existence of the edge cannot be verified in $M_t$, and we call such edges undetermined edges.

Definition 4. Given an EC $f_{G_t}$ of query pattern P, for any edge $(u, u') \in E_P$, we say $(f_{G_t}(u), f_{G_t}(u'))$ is an undetermined edge of $f_{G_t}$ if neither $f_{G_t}(u)$ nor $f_{G_t}(u')$ is in $G_t$.

Example 1. Consider a partition $G_t$ of a data graph G and a triangle query pattern P where $V_P = \{u_0, u_1, u_2\}$. The mapping $f_{G_t} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2)\}$ is an EC of P in G w.r.t. $G_t$ if $v_0 \in V_{G_t}$, $v_1 \in adj(v_0)$ and $v_2 \in adj(v_0)$ and neither $v_1$ nor $v_2$ resides in $G_t$. $(v_1, v_2)$ is an undetermined edge of $f_{G_t}$.

Obviously, if we want to determine whether $f_{G_t}$ is actually an embedding of the query pattern, we have to verify its undetermined edges in other machines. For any undetermined edge e, if its two end vertices reside in two different machines, we can use either of them to verify whether $e \in E_G$ or not. To do that, we need to send a verifyE request to one of the machines. Note that it is possible that an undetermined edge is shared by multiple ECs. To reduce network traffic, we do not send verifyE requests once for each individual EC. Instead, we build an edge verification index (EVI) and use it to identify ECs that share undetermined edges. We assume each EC is assigned an ID (we will discuss how to assign such IDs and how to build the EVI in Section 5).

Definition 5 (edge verification index). Given a set $R_{G_t}(P)$ of ECs, the edge verification index (EVI) of $R_{G_t}(P)$ is a key-value map I where

(1) for any tuple $(e, IDs) \in I$, the key e is a vertex pair $(v, v')$, and the value IDs is the set of IDs of the ECs in $R_{G_t}(P)$ of which e is an undetermined edge.

(2) for any undetermined edge e of $f_{G_t} \in R_{G_t}(P)$, there exists a unique tuple in I with e as the key and the ID of $f_{G_t}$ in the value.

Intuitively, the EVI groups together the ECs that share each undetermined edge. It is straightforward to see:

Proposition 2. Given data graph G, query pattern P and an edge verification index I, for any $(e, IDs) \in I$, if $e \notin E_G$, then none of the ECs corresponding to IDs can be an embedding of P in G.

Example 2. Consider two embedding candidates $f_{G_t} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2)\}$ and $f'_{G_t} = \{(u_0, v_3), (u_1, v_1), (u_2, v_2)\}$ of a triangle pattern P in a data graph G, where $V_P = \{u_0, u_1, u_2\}$. Assuming $(v_1, v_2)$ is an undetermined edge, we can have an edge verification index $I = \{(v_1, v_2) \mapsto \langle f_{G_t}, f'_{G_t} \rangle\}$ where $f_{G_t}$ and $f'_{G_t}$ are represented by their IDs in I. If $(v_1, v_2)$ is verified non-existing, both $f_{G_t}$ and $f'_{G_t}$ can be filtered out.

Like SEED and TwinTwig, we decompose the pattern graph into small decomposition units.

Definition 6 (decomposition). A decomposition of query pattern P is a sequence of decomposition units $DE = (dp_0, \ldots, dp_l)$ where every $dp_i \in DE$ is a subgraph of P such that

(1) The vertex set of $dp_i$ consists of a pivot vertex piv and a non-empty set LF of leaf (see footnote 2) vertices, all of which are vertices in $V_P$; and for every $u' \in LF$, $(piv, u') \in E_P$.

(2) The edge set of $dp_i$ consists of two parts, $E^{star}_{dp_i}$ and $E^{sib}_{dp_i}$, where $E^{star}_{dp_i} = \bigcup_{u' \in dp_i.LF} \{(dp_i.piv, u')\}$ is the set of edges between the pivot vertex and the leaf vertices, and $E^{sib}_{dp_i} = \bigcup_{u, u' \in dp_i.LF} \{(u, u') \in E_P\}$ is the set of edges between the leaf vertices.

(3) $\bigcup_{dp_i \in DE} V_{dp_i} = V_P$, and for $i < j$, $V_{dp_i} \cap dp_j.LF = \emptyset$.

Note that condition (3) in the above definition says the leaf vertices of each decomposition unit do not appear in the previous units. Unlike the decompositions in SEED [15] and TwinTwig [13], our decomposition unit is not restricted to stars and cliques, and $\bigcup_{dp_i \in DE} E_{dp_i}$ may be a proper subset of $E_P$.

Footnote 2. In an abuse of the word leaf.

Figure 2: Running Example

Example 3. Consider the query pattern in Figure 2(a). We may have a decomposition $(dp_0, dp_1, dp_2, dp_3)$ where $dp_0.piv = u_0$, $dp_0.LF = \{u_1, u_2, u_7\}$, $dp_1.piv = u_1$, $dp_1.LF = \{u_3, u_4\}$, $dp_2.piv = u_2$, $dp_2.LF = \{u_5, u_6\}$, and $dp_3.piv = u_0$, $dp_3.LF = \{u_8, u_9\}$. Note that the edge $(u_4, u_5)$ is not in any decomposition unit.

Given a decomposition $DE = (dp_0, \ldots, dp_l)$ of pattern P, we define a sequence of sub-query patterns $P_0, \ldots, P_l$, where $P_0 = dp_0$, and for $i > 0$, $P_i$ consists of the union of $P_{i-1}$ and $dp_i$ together with the edges across the vertices of $P_{i-1}$ and $dp_i$, that is, $V_{P_i} = \bigcup_{j \le i} V_{dp_j}$ and $E_{P_i} = E_{P_{i-1}} \cup E_{dp_i} \cup \{(u, u') \in E_P \mid u \in V_{P_{i-1}}, u' \in dp_i.LF\}$. Note that (a) none of the leaf vertices of $dp_i$ can be in $P_{i-1}$; and (b) $P_i$ is the subgraph of P induced by the vertex set $V_{P_i}$, and $P_l = P$.

We say DE forms an execution plan if for every $i \in [1, l]$, the pivot vertex of $dp_i$ is in $P_{i-1}$. Formally, we have

Definition 7 (execution plan). A decomposition $DE = (dp_0, \ldots, dp_l)$ of P is an execution plan (PL) if $dp_i.piv \in V_{P_{i-1}}$ for all $i \in [1, l]$.

For example, the decomposition in Example 3 is an execution plan.

Let $PL = (dp_0, \ldots, dp_l)$ be an execution plan. For each $dp_i$, we define $E^{cro}_{dp_i} = \{(u, u') \in E_P \mid u \in V_{P_{i-1}}, u' \in dp_i.LF\}$ (for $i > 0$). We call the edges in $E^{star}_{dp_i}$, $E^{sib}_{dp_i}$ and $E^{cro}_{dp_i}$ the expansion edges, sibling edges, and cross-unit edges respectively. The sibling edges and cross-unit edges are both called verification edges.

Consider $dp_0$ in Example 3: we have $E^{sib}_{dp_0} = \{(u_1, u_2)\}$ and $E^{cro}_{dp_0} = \emptyset$. For $dp_2$, we have $E^{sib}_{dp_2} = \{(u_5, u_6)\}$ and $E^{cro}_{dp_2} = \{(u_4, u_5)\}$. Note that the expansion edges of all the units form a spanning tree of P, and the verification edges are the edges not in the spanning tree.

With the above concepts, we are ready to present the R-Meef framework. Given query pattern P, data graph G and its partition $G_t$ on machine $M_t$, R-Meef finds a set of embeddings of P in $G_t$ according to an execution plan PL, which provides a processing order for the query pattern P. In our approach, each machine $M_t$ will evaluate $P_0$ in the first round, and based on the results in round i, it will evaluate the next pattern $P_{i+1}$ in the next round. The final results will be obtained when $P_l$ is evaluated in all machines (each machine computes a subset of the final embeddings, the union of which is the final set of embeddings of P in G).

Moreover, in our approach, each machine $M_t$ starts by mapping $dp_0.piv$ (which is the $u_{start}$ in Section 3.1) to a candidate vertex of $dp_0.piv$ that resides in $M_t$. When the number of such candidate vertices is large, there is a possibility of generating too many intermediate results (i.e., ECs and embeddings of $P_0, \ldots, P_l$). To prevent memory crash, we divide the candidate vertex set of $dp_0.piv$ into disjoint region groups $RG = \{rg_0, \ldots, rg_h\}$, and process each group separately.

The workflow of R-Meef is as follows:

(1) From the vertices residing in $M_t$, R-Meef divides the candidate vertices of $dp_0.piv$ into different region groups. Then it processes each group sequentially and separately.

(2) For each region group, R-Meef processes one unit per round based on the execution plan PL. The workflow of the i-th round is illustrated in Figure 3.

Figure 3: R-Meef workflow

In Figure 3, $R_{G_t}(P_{i-1})$ represents the set of embeddings of $P_{i-1}$ generated and cached from the last round. For the first round (i.e., round 0), $R_{G_t}(P_{i-1})$ will be initialized as $\{(dp_0.piv, v)\}$ where v is a candidate vertex of $dp_0.piv$. By expanding $R_{G_t}(P_{i-1})$, we get all the ECs of $P_i$ w.r.t. $M_t$, i.e., $R_{G_t}(P_i)$. After verification and filtering, we get all the embeddings of $P_i$ for this region group of $M_t$. In each round, the expand and verify & filter processes work as follows:

Expand. Given an embedding f of $P_{i-1}$ obtained from the previous round, $dp_i.piv$ has already been matched to a data vertex v by f since $dp_i.piv \in P_{i-1}$. By searching the neighborhood of v, we expand f to find the ECs of $P_i$ containing $(dp_i.piv, v)$ w.r.t. $M_t$. It is worth noting that if v does not reside in $M_t$, we have to fetch its adjacency-list from other machines. Different embeddings from the previous round may share some common foreign vertices that need to be fetched in order to expand. To reduce network traffic, for all the embeddings from the last round, we gather all the vertices that need to be fetched and then fetch their adjacency-lists together by sending a single fetchV request. One important assumption here is that each machine has a record of the ownership information (i.e., which machine a data vertex resides in) of all the vertices.
This record can be constructed offline as a map whose size is $|V|$, which can be saved together with the adjacency-list and takes one extra byte of space for each vertex.

Verify & Filter. Upon having a set of ECs (i.e., $R_{G_t}(P_i)$), we store them compactly in an embedding trie and build an EVI from them (the embedding trie and EVI will be further discussed in Section 5). Then we send a verifyE request consisting of the keys of the EVI, i.e., undetermined data edges, to other machines to verify their existence. After we get the verification results, each failed key indicates that the corresponding ECs can be filtered out. The output of the final round is the set of embeddings of query pattern P found by $M_t$ for this region group.

Note that a detailed implementation and example of R-Meef is given in Appendix B. Although the idea of our framework is straightforward, each critical component must be carefully designed in order to achieve the best performance. In the following sections, we tackle the challenges one by one.

4. COMPUTING EXECUTION PLAN

It is obvious that we may have multiple valid execution plans for a query pattern and different execution plans may have different performance. The challenge is how to find the most efficient one among them. In this section, we present some heuristics to find a good execution plan.

4.1 Minimizing Number of Rounds

Given query pattern P and an execution plan PL, we have $|PL| + 1$ rounds for each region group, and once all the rounds are processed we will get the set of final embeddings. Also, within each round, the workload can be shared. To be specific, a single undetermined edge e may be shared by multiple ECs. If these embedding candidates are generated in the same round, the verification of e can be shared by all of them. The same applies to the foreign vertices, where the cost of fetching and memory space can be shared among multiple embedding candidates if they happen to be in the same round. Therefore, our first heuristic is to minimize the number of rounds (namely, the number of decomposition units) so as to maximize the workload sharing.
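The sharing of edge verification among ECs of the same round can be illustrated with a small Python sketch of an edge verification index. It is only a sketch under stated assumptions: ECs are plain dictionaries keyed by hypothetical integer IDs (the paper instead derives IDs from the embedding trie), the `edge_exists` callback stands in for the reply to a verifyE request, and all names are illustrative.

```python
from collections import defaultdict

def undetermined_edges(ec, pattern_edges, owned):
    """Data edges of an EC whose two endpoints are both foreign to this machine."""
    edges = set()
    for a, b in pattern_edges:
        x, y = ec[a], ec[b]
        if x not in owned and y not in owned:
            edges.add((min(x, y), max(x, y)))  # canonical key for an undirected edge
    return edges

def build_evi(ecs, pattern_edges, owned):
    """Map each undetermined edge to the IDs of the ECs that contain it."""
    evi = defaultdict(set)
    for ec_id, ec in ecs.items():
        for e in undetermined_edges(ec, pattern_edges, owned):
            evi[e].add(ec_id)
    return evi

def verify_and_filter(ecs, evi, edge_exists):
    """Drop every EC that depends on an edge reported as missing."""
    failed = set()
    for e, ids in evi.items():
        if not edge_exists(e):  # one lookup per distinct edge, shared by all its ECs
            failed |= ids
    return {i: ec for i, ec in ecs.items() if i not in failed}

# The two ECs of Example 2; v0 and v3 are assumed to be owned by the local machine.
triangle = [("u0", "u1"), ("u1", "u2"), ("u0", "u2")]
ecs = {0: {"u0": "v0", "u1": "v1", "u2": "v2"},
       1: {"u0": "v3", "u1": "v1", "u2": "v2"}}
evi = build_evi(ecs, triangle, owned={"v0", "v3"})
survivors_if_missing = verify_and_filter(ecs, evi, lambda e: False)
survivors_if_present = verify_and_filter(ecs, evi, lambda e: True)
```

Both ECs share the single key (v1, v2), so one verification result decides the fate of both, as in Example 2.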
Here we present a technique to compute a query execution plan which guarantees a minimum number of rounds. Our technique is based on the concept of maximum leaf spanning tree [7].

Definition 8. A maximum leaf spanning tree (MLST) of pattern P is a spanning tree of P with the maximum number of leaves (a leaf is a vertex with degree 1). The number of leaves in an MLST of P is called the maximum leaf number of P, denoted $l_P$.

A closely related concept is minimum connected dominating set.

Definition 9. A connected dominating set (CDS) of P is a subset D of $V_P$ such that (1) D is a dominating set of P, that is, any vertex of P is either in D or adjacent to a vertex in D, and (2) the subgraph of P induced by D is connected. A minimum connected dominating set (MCDS) is a CDS with the smallest cardinality among all CDSs. The number of vertices in an MCDS is called the connected domination number, denoted $c_P$.

It is shown in [4] that $|V_P| = c_P + l_P$.

Theorem 1. Given a pattern P, any execution plan of P has at least $c_P$ decomposition units, and there exists an execution plan with exactly $c_P$ decomposition units.

The proof of Theorem 1 is in Appendix A.1. Theorem 1 indicates that $c_P$ is the minimum number of rounds of any execution plan. The proof provides a method to construct an execution plan with $c_P$ rounds from an MLST. It is worth noting that the decomposition units in the query plan constructed as in the proof have distinct pivot vertices.
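One plausible way to turn a rooted spanning tree into decomposition units is sketched below in Python. The proof itself is in the appendix and is not reproduced here, so this breadth-first reading is an assumption rather than the authors' construction. The tree edges are also an assumption: they are read off the running pattern used in this section's examples (vertices $u_0$ to $u_9$ written as integers), and the function name is hypothetical.

```python
from collections import deque

def plan_from_spanning_tree(tree_adj, root):
    """Turn a rooted spanning tree into decomposition units (pivot, leaves).

    Vertices are visited in BFS order; every tree node that has children
    becomes the pivot of one unit whose leaf set is its children. The root
    should be a non-leaf node of the tree, otherwise one extra unit appears.
    """
    units, seen, queue = [], {root}, deque([root])
    while queue:
        piv = queue.popleft()
        children = sorted(w for w in tree_adj[piv] if w not in seen)
        if children:
            units.append((piv, children))
            seen.update(children)
            queue.extend(children)
    return units

# Spanning tree of the running pattern (assumed from the examples of this section).
TREE_EDGES = [(0, 1), (0, 2), (0, 7), (0, 8), (0, 9), (1, 3), (1, 4), (2, 5), (2, 6)]
tree_adj = {v: set() for v in range(10)}
for a, b in TREE_EDGES:
    tree_adj[a].add(b)
    tree_adj[b].add(a)

plan_root_u0 = plan_from_spanning_tree(tree_adj, 0)
plan_root_u1 = plan_from_spanning_tree(tree_adj, 1)
```

Because a vertex is only queued after its parent has served as a pivot, each pivot already belongs to an earlier unit, which is the condition required of an execution plan. Both roots yield three units, one per internal node of the tree.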

Example 4. Consider the pattern P. It can be easily verified that the tree obtained by erasing the edges $(u_1, u_2)$, $(u_3, u_4)$, $(u_4, u_5)$, $(u_5, u_6)$ and $(u_8, u_9)$ is an MLST of P. Choosing $u_0$ as the root, we will get a minimum-round execution plan $PL_1 = \{dp_0, dp_1, dp_2\}$ where $dp_0.piv = u_0$, $dp_0.LF = \{u_1, u_2, u_7, u_8, u_9\}$, $dp_1.piv = u_1$, $dp_1.LF = \{u_3, u_4\}$ and $dp_2.piv = u_2$, $dp_2.LF = \{u_5, u_6\}$. If we choose $u_1$ as the root, we will get a different minimum-round execution plan $PL_2 = \{dp_0, dp_1, dp_2\}$, where $dp_0.piv = u_1$, $dp_0.LF = \{u_0, u_3, u_4\}$, $dp_1.piv = u_0$, $dp_1.LF = \{u_2, u_7, u_8, u_9\}$, $dp_2.piv = u_2$, $dp_2.LF = \{u_5, u_6\}$.

4.2 Minimizing the span of $dp_0.piv$

Given a pattern P, multiple execution plans may exist with the minimum number of rounds, while their $dp_0.piv$ can be different. In this case, our second heuristic is to choose the plan(s) whose $dp_0.piv$ has the smallest span. This strategy will maximize the number of embeddings that can be found using SM-E. Recall from the RADS architecture that $dp_0.piv$ is the starting query vertex $u_{start}$. Based on Proposition 1, we know that the more candidate vertices of $dp_0.piv$ can be processed in SM-E, the more workload can be separated from the distributed processing, and therefore the more communication cost and memory usage can be reduced.

Figure 4: A Query Pattern

Consider the pattern in Figure 4. The bold edges show an MLST based on which both $u_3$ and $u_4$ can be chosen as $dp_0.piv$, and the execution plans from them have the same number of rounds. However, $Span_P(u_3) = 2$ while $Span_P(u_4) = 3$. Therefore we choose the plan with $u_3$ as the $dp_0.piv$.

4.3 Maximizing Filtering Power

Given a pattern P, multiple execution plans may exist with the minimum number of rounds and their $dp_0.piv$ may have the same smallest span. Here we use the third heuristic, which is to choose plans with more verification edges in the earlier rounds. The intuition is to maximize the filtering power of the verification edges as early as possible.
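To make the notion of verification edges in earlier rounds concrete, the following Python sketch counts sibling and cross-unit edges per round for the two plans of Example 4 and combines them with a weight of $1/(i+1)^{\rho}$ for round i, which is the weighting the score function of this section adopts. The pattern edge list is an assumption reconstructed from the examples (Figure 2 is not reproduced here), star edges are excluded from the cross-unit count so that the numbers agree with the examples, and the function names are hypothetical.

```python
def verification_edge_counts(plan, pattern_edges):
    """Number of sibling plus cross-unit edges introduced by each unit of a plan."""
    edges = {frozenset(e) for e in pattern_edges}
    matched, counts = set(), []
    for piv, leaves in plan:
        earlier = matched - {piv}  # pivot-to-leaf edges are expansion edges, not counted
        sib = sum(1 for i, a in enumerate(leaves) for b in leaves[i + 1:]
                  if frozenset((a, b)) in edges)
        cro = sum(1 for a in earlier for b in leaves if frozenset((a, b)) in edges)
        counts.append(sib + cro)
        matched |= {piv, *leaves}
    return counts

def score(plan, pattern_edges, rho=1.0):
    """Weighted sum of verification edges, with weight 1 / (i + 1) ** rho for round i."""
    counts = verification_edge_counts(plan, pattern_edges)
    return sum(c / (i + 1) ** rho for i, c in enumerate(counts))

# Running pattern (assumed): spanning-tree edges plus the five erased edges of Example 4.
PATTERN_EDGES = [(0, 1), (0, 2), (0, 7), (0, 8), (0, 9), (1, 3), (1, 4), (2, 5), (2, 6),
                 (1, 2), (3, 4), (4, 5), (5, 6), (8, 9)]
PL1 = [(0, [1, 2, 7, 8, 9]), (1, [3, 4]), (2, [5, 6])]
PL2 = [(1, [0, 3, 4]), (0, [2, 7, 8, 9]), (2, [5, 6])]

counts_pl1 = verification_edge_counts(PL1, PATTERN_EDGES)
counts_pl2 = verification_edge_counts(PL2, PATTERN_EDGES)
best = max([PL1, PL2], key=lambda pl: score(pl, PATTERN_EDGES))
```

Under these assumptions the plan rooted at $u_0$ places two verification edges in the first round and is preferred over the plan rooted at $u_1$, which places only one there.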
To this end, we propose the following score function $SC(PL)$ for an execution plan $PL = \{dp_0, \ldots, dp_l\}$:

$$SC(PL) = \sum_{dp_i \in PL} \frac{1}{(i+1)^{\rho}} \left( |E^{sib}_{dp_i}| + |E^{cro}_{dp_i}| \right) \tag{3}$$

where $|E^{sib}_{dp_i}| + |E^{cro}_{dp_i}|$ is the number of verification edges in round i, and $\rho$ is a positive parameter used to tune the score function. In our experiments we use $\rho = 1$. The function $SC(PL)$ calculates a score by assigning larger weights to the verification edges in earlier rounds (since $\frac{1}{(i+1)^{\rho}} > \frac{1}{(j+1)^{\rho}}$ if $i < j$).

Example 5. Consider the query plans $PL_1$ and $PL_2$ in Example 4. The total number of verification edges in these plans is the same. In $PL_1$, the number of verification edges for the first, second and third round is 2, 1, 2 respectively. In $PL_2$, the number of verification edges for the three rounds is 1, 2, and 2 respectively. Therefore, we prefer $PL_1$. Using $\rho = 1$, we can calculate the scores of the two plans as follows:

$$SC(PL_1) = 2/1 + 1/2 + 2/3 \approx 3.2$$
$$SC(PL_2) = 1/1 + 2/2 + 2/3 \approx 2.7$$

When several minimum-round execution plans have the same score, we use another heuristic rule to choose the best one from them: the larger the degree of the pivot vertex, the earlier we process the unit. A pivot vertex with a larger degree has a stronger power to filter unpromising candidates. To accommodate this rule, we can modify the score function in (3) by adding another component as follows:

$$SC(PL) = \sum_{dp_i \in PL} \left[ \frac{|E^{sib}_{dp_i}| + |E^{cro}_{dp_i}|}{(i+1)^{\rho}} + \frac{deg(dp_i.piv)}{(i+1)} \right] \tag{4}$$

We now have a set of rules to follow when computing the execution plan. Since the number of query vertices is normally very small, we can simply enumerate all the possible execution plans and choose the best according to those rules.

5. EMBEDDING TRIE

As stated before, to save memory, the intermediate results (which include embeddings and embedding candidates generated in each round) are stored in a compact data structure called an embedding trie. Besides the compression, the challenges here are how to ensure that each intermediate result has a unique ID in the embedding trie and that the embedding trie can be easily maintained.
Before we give our solution, we first define a matching order, which is the order in which the query vertices are matched in R-Meef. It is also the order in which the nodes in the embedding trie are organized.

Definition 10 (Matching Order). Given a query execution plan $PL = \{dp_0, \ldots, dp_l\}$ of pattern P, the matching order w.r.t. PL is a relation $\prec$ defined over the vertices of P that satisfies the following conditions:

(1) $dp_i.piv \prec dp_j.piv$ if $i < j$;

(2) For any two vertices $u_1 \in dp_i.LF$ and $u_2 \in dp_j.LF$, $u_1 \prec u_2$ if $i < j$.

(3) For $i \in [0, l]$: (i) $dp_i.piv \prec u$ for all $u \in dp_i.LF$; (ii) for any vertices $u_1, u_2 \in dp_i.LF$ that are not the pivot vertices of other units, $u_1 \prec u_2$ if $deg(u_1) > deg(u_2)$, or $deg(u_1) = deg(u_2)$ and the vertex ID of $u_1$ is less than that of $u_2$; (iii) if $u_1 \in dp_i.LF$ is a pivot vertex of another unit, and $u_2 \in dp_i.LF$ is not a pivot vertex of another unit, then $u_1 \prec u_2$.

Intuitively the above relation orders the vertices of P as follows:

(a) Generally a vertex $u_1$ in $dp_i$ is before a vertex $u_2$ in $dp_j$ if $i < j$, except for the special case where $u_1 \in dp_i.LF$ and $u_2 = dp_j.piv$. In this special case, $u_2$ may appear in the leaf of some previous unit $dp_k$ ($k \le i$), and it may be arranged before $u_1$ according to Condition (2) or Condition (3)(iii).

(b) Starting from $dp_0$, the vertex $dp_0.piv$ is arranged before all other vertices. For the leaf vertices of $dp_0$, it arranges those that are pivot vertices of other units before those that are not (Condition (3)(iii)), and for the former, it arranges them according to the ID of the units for which they are the pivot vertex (see footnote 3) (Condition (1)); for the latter, it arranges them in descending order of their degree in the original pattern P, and if they have the same degree it arranges them in the order of vertex ID (Condition (3)(ii)). For each subsequent $dp_i$, the pivot vertex must appear in the leaf of some previous unit, hence its position has been fixed; and the leaf vertices of $dp_i$ are arranged in the same way as the leaf vertices of $dp_0$.

Footnote 3. Note that no two units share the same pivot vertex.

It is easy to verify that $\prec$ is a strict total order over $V_P$. Following the matching order, the vertices of P can be arranged into an ordered list. Consider the execution plan $PL_1$ in Example 4. The vertices in the query can be arranged as $(u_0, u_1, u_2, u_7, u_8, u_9, u_3, u_4, u_5, u_6)$ according to the matching order.

Let $PL = \{dp_0, \ldots, dp_l\}$ be an execution plan, $P_i$ be the subgraph of P induced from the vertices in $dp_0 \cup \cdots \cup dp_i$ (as defined in Section 3.2), and $R_i$ be a set of results (i.e., embeddings or embedding candidates) of $P_i$. For easy presentation, we assume the vertices in $P_i$ have been arranged into the list $u_0, u_1, \ldots, u_n$ by the matching order, that is, the query vertex at position j is $u_j$. Then each result of $P_i$ can be represented as a list of corresponding data vertices. These lists can be merged into a collection of trees as follows:

(1) Initially, each result f is treated as a tree $T_f$, where the node at level j stores the data vertex $f(u_j)$ for $j \in [0, n]$, and the root is the node at level 0.

(2) If multiple results map $u_0$ to the same data vertex, merge the root nodes of their trees. This partitions the results in $R_i$ into different groups, and each group will be stored in a distinct tree.

(3) For each newly merged node N, if multiple children of N correspond to the same vertex, merge these children into a single child of N.

(4) Repeat step (3) until no nodes can be merged.

The collection of trees obtained above is a compact representation of the results in $R_i$. Each leaf node in the tree uniquely identifies a result.

The embedding trie is a collection of similar trees. However, since the purpose of the embedding trie is to save space, we cannot get it by merging the result lists. Instead, we will have to construct it by inserting nodes one by one when results are generated, and removing nodes when results are eliminated. Next we formally define the embedding trie and present the algorithms for its maintenance.

5.1 Structure of the Embedding Trie

Definition 11 (Embedding Trie). Given a set $R_i$ of results of $P_i$, the embedding trie of $R_i$ is a collection of trees used to store the results in $R_i$ such that:

(1) Each tree represents a set of results that map $u_0$ to the same data vertex.

(2) Each tree node N has
v: a data vertex;
parentN: a pointer to its parent node (the pointer of the root node is null);
childCount: the number of child nodes of N.

(3) If two nodes have the same parent, then they store different data vertices.

(4) Every leaf-to-root path represents a result in $R_i$, and every result in $R_i$ is represented as a unique leaf-to-root path.

(5) If we divide the tree nodes into different levels such that the root nodes are at level 0, the children of the root nodes are at level 1 and so on, then the tree nodes at level j ($j \in [0, n]$) store the set of values $\{f(u_j) \mid f \in R_i\}$.

Figure 5: Example of Embedding Trie

Example 6. Consider $P_0$ in Example 7, where the vertices are ordered as $u_0, u_1, u_2$ according to the matching order. There are three ECs of $P_0$: $(v_0, v_1, v_2)$, $(v_0, v_1, v_9)$ and $(v_0, v_9, v_{11})$. These results can be stored in a tree shown in Figure 5(a). When the second EC is filtered out, we have $R_{G_t}(P_0)$ compressed in a tree as shown in Figure 5(b). The first EC can be expanded to an EC of $P_1$ (where the list of vertices of $P_1$ is $u_0, u_1, u_2, u_3, u_4$), which is shown in Figure 5(c).
Although the structure of the embedding trie is simple, it has some nice properties:

Compression: Storing the results in the embedding trie takes less space than storing them as a collection of lists.
Unique ID: For each result in the embedding trie, the address of its leaf node in memory can be used as the unique ID.
Retrieval: Given a particular ID represented by a leaf node, we can easily follow its pointer parentN step by step to retrieve the corresponding result.
Removal: To remove a result with a particular ID, we can remove its corresponding leaf node and decrease the childCount of its parent node by 1. If the childCount of this parent node reaches 0, we remove this parent node as well. This process recursively affects the ancestors of the leaf node.
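The properties above can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the class and method names (`TrieNode`, `EmbeddingTrie`, `add`, `retrieve`, `remove`) and the `num_nodes` counter are assumptions made for illustration, and the integers stand for data vertex IDs loosely following Example 6.

```python
class TrieNode:
    """One trie node: data vertex v, pointer parentN, counter childCount."""

    def __init__(self, v, parent=None):
        self.v = v                # data vertex stored in this node
        self.parent = parent      # parentN; None for a root node
        self.child_count = 0      # childCount


class EmbeddingTrie:
    """Parent-pointer trie; a leaf node object serves as the result ID."""

    def __init__(self):
        self.num_nodes = 0        # hypothetical counter, only to show compression

    def add(self, v, parent=None):
        """Create a node for v under parent (or a new root) and return it."""
        node = TrieNode(v, parent)
        if parent is not None:
            parent.child_count += 1
        self.num_nodes += 1
        return node

    @staticmethod
    def retrieve(leaf):
        """Follow parent pointers from the leaf and return the result list."""
        path = []
        node = leaf
        while node is not None:
            path.append(node.v)
            node = node.parent
        path.reverse()
        return path

    def remove(self, leaf):
        """Remove a result; ancestors left without children are removed too."""
        node = leaf
        while node is not None and node.child_count == 0:
            parent = node.parent
            node.parent = None
            self.num_nodes -= 1
            if parent is not None:
                parent.child_count -= 1
            node = parent


trie = EmbeddingTrie()
root = trie.add(0)
n1 = trie.add(1, root)
n9 = trie.add(9, root)
ec1 = trie.add(2, n1)     # result (0, 1, 2)
ec2 = trie.add(9, n1)     # result (0, 1, 9)
ec3 = trie.add(11, n9)    # result (0, 9, 11)
trie.remove(ec2)          # the second result is filtered out
print(EmbeddingTrie.retrieve(ec1))  # [0, 1, 2]
print(trie.num_nodes)               # 5
```

Three results of length three would need nine entries as plain lists, while the trie starts with six nodes and holds five after the removal. Because nodes keep only a parent pointer and a child counter, removal never has to search a child list.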

5.2 Maintaining the embedding trie

Recall that in Algorithm 4, given an embedding $f$ of $P_{i-1}$, the function expandEmbedTrie is used to search for the ECs of $dp_i$ within the neighbourhood of the mapped data vertex $v_{piv}$, where $v_{piv} = f(dp_i.piv)$. Moreover, the expandEmbedTrie function handles the task of expanding the embedding trie $ET$ by concatenating $f$ with each newly found EC of $dp_i$. If an EC is filtered out or if an embedding cannot be expanded to a final result, the function must remove it from $ET$. We now present the details of the expandEmbedTrie function in Algorithm 1.

When $dp_i.piv$ is mapped to the data vertex $v = f(dp_i.piv)$ by an embedding $f$ of $P_{i-1}$, Algorithm 1 uses a backtracking approach to find the ECs of $P_i$ within the neighbourhood of $v$. The recursive procedure is given in the subroutine adjEnum. In each round of the recursive call, adjEnum tries to match $u$ to a candidate vertex $v$ and add $(u, v)$ to $f$, where $u$ is a query vertex in $dp_i.LF$. When $f$ is expanded to an EC of $P_i$, which means an EC of $dp_i$ is concatenated to the original $f$, we add it into $ET$ by chaining up the corresponding embedding trie nodes. If $f$ cannot be expanded into an EC of $P_i$, we remove it from $ET$.

Algorithm 1: expandEmbedTrie
Input: an embedding $f$ of $P_{i-1}$, local machine $M_t$, unit $dp_i$, embedding trie $ET$
Output: expanded $ET$ and an edge verification index $I$
 1  $v \leftarrow f(dp_i.piv)$
 2  for each $u \in dp_i.LF$ do
 3      $C(u) \leftarrow adj(v)$
 4      for each $(u, u') \in E^{cro}_{dp_i}$ do
 5          if $f(u')$ resides in $M_t$ then
 6              $C(u) \leftarrow adj(f(u')) \cap C(u)$
 7      if $C(u) = \emptyset$ then
 8          remove $f$ from $ET$
 9          return
10  $u \leftarrow$ next vertex in query vertex list
11  get $N$ corresponding to $f$
12  adjEnum($N$, $u$)

Lines 1 to 9 of Algorithm 1 compute the candidate set for each $u \in dp_i.LF$ as the intersection of the neighbor set of $v = f(dp_i.piv)$ and the neighbor set of each $f(u')$, where $(u, u')$ is a cross-unit edge and $f(u')$ is in $M_t$. If any of the candidate sets is empty, it removes $f$ from $ET$. Otherwise it passes the next query vertex $u$ and the ID of $f$ (which is a node in $ET$) to the recursive subroutine adjEnum. The subroutine adjEnum is given in Algorithm 2.
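The candidate-set computation in Lines 1 to 9 of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the authors' code: the function name `candidate_sets`, the dictionary-based graph representation, and the convention that each cross-unit edge is given as a pair (leaf vertex, already mapped vertex) are all hypothetical.

```python
def candidate_sets(f, pivot, leaves, cross_edges, adj, local):
    """Sketch of Lines 1-9 of Algorithm 1.

    f           -- dict: query vertex -> data vertex (an embedding of P_{i-1})
    pivot       -- the pivot query vertex dp_i.piv (already mapped by f)
    leaves      -- the leaf query vertices dp_i.LF
    cross_edges -- pairs (u, u2): u is a leaf of dp_i, u2 is already mapped by f
    adj         -- dict: data vertex -> set of neighbours (known adjacency lists)
    local       -- set of data vertices residing on the local machine
    Returns a dict leaf -> candidate set, or None if some candidate set is
    empty (the caller would then remove f from the embedding trie).
    """
    v = f[pivot]
    cands = {}
    for u in leaves:
        c = set(adj[v])                      # start from the neighbours of v
        for (a, b) in cross_edges:
            # only neighbours of locally residing vertices can be intersected
            if a == u and f[b] in local:
                c &= adj[f[b]]
        if not c:
            return None
        cands[u] = c
    return cands


adj = {11: {10, 12, 13, 14}, 10: {11, 12, 13}}
f = {"a": 10, "b": 11}
print(candidate_sets(f, "b", ["c", "d"], [("c", "a")], adj, {10, 11}))
```

In this toy input the leaf "c" must be adjacent to both 11 and 10, so its candidates shrink to the common neighbours, while "d" keeps all neighbours of 11. When the other endpoint is a foreign vertex, no intersection is possible and the edge is left for later verification.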
The subroutine adjEnum plays the same role as the SubgraphSearch procedure in the backtracking framework [16]. In Line 1, adjEnum creates a local variable $F_{current}$ with default value false. The value indicates whether $f$ can be extended to an EC of $P_i$. For the leaf vertex $u$, adjEnum first creates a copy $C_r(u)$ of $C(u)$, and then refines the candidate vertex set $C_r(u)$ by considering every sibling edge $(u, u')$ where $u'$ has already been mapped by $f$ to $f(u')$. If $f(u')$ resides in $M_t$, $C_r(u)$ is shrunk by an intersection with $adj(f(u'))$ (Lines 2 to 5).

Then, for each vertex $v$ in the refined set $C_r(u)$, it first initializes a flag $E$ with the value true (Line 7); this value indicates whether $u$ can potentially be mapped to $v$. If $v$ resides in $M_t$, it checks every verification edge $(u, u')$ where $u'$ has been mapped to see whether $(v, f(u'))$ exists. If one such edge does not exist, it sets $E$ to false (Lines 8 to 11), meaning $u$ cannot be mapped to $v$. This part (Lines 7 to 11) is like the IsJoinable function in the backtracking framework [16].

If $E$ is still true after the local verification, we add $(u, v)$ to $f$ (Line 13). Then we create a new trie node $N'$ for $v$ with $N$ as its parentN (Lines 14, 15). After that, if $f$ grows to an EC of $P_i$, then for each undetermined edge $e$ of $f$ (both end vertices are not in the local machine), we add $N'$ to $I[e]$ (Lines 17, 18). We also set $F_{current}$ to true (Line 19). If $f$ is not an EC of $P_i$, which means there are still leaf vertices of $dp_i$ not matched, we get the next leaf vertex $u'$ (Line 21), and launch a recursive call of adjEnum by passing it $N'$ and $u'$ (Line 22). We record the return value from the deeper adjEnum as $F_{deeper}$. If $F_{deeper}$ is true after all the recursive calls, which means there are ECs with $u$ mapped to $v$ in $f$, we increase the childCount of the parent node $N$ and add the newly created $N'$ as a child of $N$ in $ET$ (Lines 23 to 25). Then we backtrack by removing $(u, v)$ from $f$, so that we can try to map $u$ to another candidate vertex in $C_r(u)$. After we have tried all the candidate vertices of $C_r(u)$, we return the value of $F_{current}$ (Line 27).
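The local verification step and the notion of an undetermined edge described above can be sketched as follows. This is a simplified illustration rather than the authors' procedure: the function `check_candidate` is hypothetical, it folds the candidate refinement and the IsJoinable-like check into a single test, and it ignores cached foreign vertices.

```python
def check_candidate(u, v, f, verify_edges, adj, local):
    """Decide whether query vertex u may be mapped to data vertex v.

    f            -- dict: query vertex -> data vertex (partial embedding)
    verify_edges -- query edges given as pairs (u, u2); only those whose
                    second endpoint u2 is already mapped by f are checked
    adj          -- dict: locally residing data vertex -> set of neighbours
    local        -- set of data vertices residing on the local machine
    Returns (ok, undetermined). ok is False when a required data edge is
    known to be missing. undetermined lists the data edges whose two end
    vertices are both foreign, so they must be verified by another machine.
    """
    undetermined = []
    for (a, b) in verify_edges:
        if a != u or b not in f:
            continue
        w = f[b]
        if v in local:                 # adjacency list of v is available
            if w not in adj[v]:
                return False, []
        elif w in local:               # adjacency list of f(u2) is available
            if v not in adj[w]:
                return False, []
        else:                          # neither end vertex is local
            undetermined.append((v, w))
    return True, undetermined


adj = {1: {2, 3}, 2: {1}}
local = {1, 2}
print(check_candidate("b", 2, {"a": 1}, [("b", "a")], adj, local))
print(check_candidate("b", 8, {"a": 7}, [("b", "a")], adj, local))
```

The second call returns the pair (8, 7) as undetermined; in the text such edges are the ones recorded in the edge verification index $I$ together with the trie node of the candidate.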
Algorithm 2: adjEnum
Input: trie node $N$ representing embedding $f$ of $P_{i-1}$, leaf vertex $u$ of $dp_i$
Output: expanded $ET$ and an edge verification index $I$
 1  $F_{current} \leftarrow$ false
 2  $C_r(u) \leftarrow C(u)$
 3  for each $u'$ mapped in $f$ and $(u, u') \in E^{sib}_{dp_i}$ do
 4      if $f(u')$ resides in $M_t$ then
 5          $C_r(u) \leftarrow adj(f(u')) \cap C_r(u)$
 6  for each $v \in C_r(u)$ do
 7      $E \leftarrow$ true
 8      if $v$ resides in $M_t$ then
 9          for each $(u, u') \in (E^{sib}_{dp_i} \cup E^{cro}_{dp_i})$ and $u'$ mapped in $f$ do
10              if $(v, f(u'))$ does not exist then
11                  $E \leftarrow$ false
12      if $E$ is true then
13          add $(u, v)$ to $f$
14          create a trie node $N'$
15          $N'.v \leftarrow v$, $N'.parentN \leftarrow N$
16          if $|f| = |V_{P_i}|$ then
17              for each undetermined edge $e$ of $f$ do
18                  add $N'$ to $I[e]$
19              $F_{current} \leftarrow$ true
20          else
21              $u' \leftarrow$ next vertex in $dp_i.LF$
22              $F_{deeper} \leftarrow$ adjEnum($N'$, $u'$)
23              if $F_{deeper}$ is true then
24                  $N.childCount{+}{+}$
25                  add $N'$ as a child node of $N$ in $ET$
26          remove $(u, v)$ from $f$
27  return $F_{current}$

Note that the edge verification index $I$ is maintained during the expansion process.

6. MEMORY CONTROL STRATEGIES

This section focuses on the challenge of robustness of R-Meef. Since R-Meef still caches fetched foreign vertices and intermediate results in memory, memory consumption is still a critical issue when the data graph is large. We propose a grouping strategy to keep the peak memory usage under the memory capacity of the local machine. Our idea is to divide the candidate vertices of the first query vertex $dp_0.piv$ into disjoint groups and process each group independently. In this way, the overall cached data on each machine is divided into several parts, where each part is no larger than the available memory $\Phi$.

Figure 6: Grouping Example

A naive way of grouping the candidate vertices is to divide them randomly. However, random grouping may put vertices that are dissimilar to each other into the same group, potentially resulting in more network communication cost. Consider the data graph in Figure 6. Suppose the candidate vertex set is $\{v_0, v_1, v_2, v_3\}$. If we divide it into two groups $\{v_0, v_1\}$ and $\{v_2, v_3\}$, then because $v_0$ and $v_1$ share most neighbours, there is a good chance for the ECs of $dp_0$ generated from $v_0$ and $v_1$ to share common verification edges, and to share common foreign vertices that need to be fetched (e.g., if $dp_1.piv$ is mapped to $v_5$ by ECs originated from $v_0$ and $v_1$, and $v_5$ is not on the local machine). However, if we partition the candidate set into $\{v_0, v_2\}$ and $\{v_1, v_3\}$, then there is little chance for such sharing.

Our goal is to find a way to partition the candidate vertices into groups so that the chance of edge verification sharing and foreign vertex sharing by the results in each group is maximized. Let $C = C(dp_0.piv)$ be the candidate set of $dp_0.piv$, and $\Phi$ be the available memory. Our method is to generate the groups one by one as follows. First we pick a random vertex $v \in C$ and let $rg = \{v\}$ be the initial group.
If the estimated memory requirement of the results originated from $rg$, denoted $\phi(rg)$ (we will discuss memory estimation shortly), is less than $\Phi$, we choose another candidate vertex in $C \setminus rg$ that has the greatest proximity to $rg$ and add it to $rg$; if $\phi(rg) > \Phi$ we remove the last added vertex from $rg$. This generates the first group. For the remaining candidate vertices we repeat the process, until all candidate vertices are divided into groups. The detailed algorithm is given in Algorithm 3. Here an important concept is the proximity of a vertex $v$ to a group of vertices, and we define it as the percentage of $v$'s neighbors that are also neighbors of some vertex in $rg$, that is,

$$proximity(v, rg) = \frac{\left| adj(v) \cap \bigcup_{v' \in rg} adj(v') \right|}{|adj(v)|} \tag{5}$$

Intuitively the vertices put into the same group are within a region: each time we choose a new vertex that has a distance of at most 2 from one of the vertices already in the group (unless there are no such vertices). Therefore we call the group a region group.

Estimating memory usage. In our system, the main memory consumption comes from the intermediate results and the fetched foreign vertices. The space cost of other data structures is trivial. Consider the set of intermediate results $R$ originated from the group $rg \subseteq C$. Recall that all results originated from the same candidate vertex of $dp_0.piv$ are stored in the same tree, while any results originated from different candidate vertices are stored in different trees. Therefore, if we know the space cost of the results originated from every candidate vertex, we can add them together to obtain the space cost of all results originated from $rg$. To estimate the space cost of the results originated from a single vertex, we use the average space cost of local embeddings of a candidate vertex $v \in C_1(u_{start})$ in embedding trie format, which can be obtained when we conduct SM-E. Recall that for each $v$ of $C_1(u_{start})$ in SM-E, we find the local embeddings originated from $v$ following a backtracking approach.
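The proximity measure and the greedy construction of one region group can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the paper: the names `proximity` and `find_region_group`, the `first` parameter (added so the example is deterministic), the use of a per-vertex cost dictionary with $\phi(rg)$ computed as a plain sum, and the rule that a group always keeps at least one vertex.

```python
import random


def proximity(v, rg, adj):
    """Fraction of v's neighbours that are neighbours of some vertex in rg."""
    if not adj[v]:
        return 0.0
    covered = set().union(*(adj[w] for w in rg))
    return len(adj[v] & covered) / len(adj[v])


def find_region_group(cands, adj, cost, budget, first=None):
    """Greedily build one region group; cands is updated in place.

    cands  -- set of candidate data vertices not yet assigned to a group
    adj    -- dict: data vertex -> set of neighbours
    cost   -- dict: data vertex -> estimated space cost of its results
              (assumption: phi(rg) is the sum of these estimates)
    budget -- available memory Phi, in the same unit as cost
    """
    v = first if first is not None else random.choice(sorted(cands))
    cands.discard(v)
    rg = [v]
    while cands and sum(cost[w] for w in rg) < budget:
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(cands), key=lambda x: proximity(x, rg, adj))
        rg.append(best)
        cands.discard(best)
    # undo the last step if it exceeded the budget; a single-vertex group
    # is kept even if it is too large (assumption, to guarantee progress)
    if sum(cost[w] for w in rg) > budget and len(rg) > 1:
        cands.add(rg.pop())
    return rg


adj = {0: {4, 5, 6}, 1: {4, 5, 7}, 2: {8, 9}, 3: {8, 9, 10}}
cost = {0: 1, 1: 1, 2: 1, 3: 1}
remaining = {0, 1, 2, 3}
print(find_region_group(remaining, adj, cost, budget=2, first=0))  # [0, 1]
print(find_region_group(remaining, adj, cost, budget=2, first=2))  # [2, 3]
```

In this toy graph vertices 0 and 1 share two of their three neighbours, so they end up in the same group, mirroring the intuition that vertices with common neighbours are likely to share verification edges and fetched foreign vertices.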
In each recursive step of the backtracking approach, we may record the number of candidate vertices that are matched to the corresponding query vertex. The sum over all steps is the number of trie nodes if we group those local embeddings into an embedding trie. Based on the sum, we know the space cost of the local embeddings originating from $v$ in the format of the embedding trie.

Next, we consider the space cost of the fetched foreign vertices in each round. Recall that when expanding the embeddings of $P_{i-1}$ to ECs of $P_i$, we only need to fetch vertex $v$ if there exists $f \in R_{G_t}(P_{i-1})$ such that $f(dp_i.piv) = v$. In the worst case, for every candidate vertex $v$ of $dp_i.piv$, there exists some $f \in R_{G_t}(P_{i-1})$ which maps $dp_i.piv$ to $v$, and none of these candidate vertices of $dp_i.piv$ resides locally. Therefore the number of data vertices that need to be fetched equals $|C(dp_i.piv)|$ in the worst case. In practice, the space cost of $C(dp_i.piv)$ is usually small compared with that of the intermediate results, and we can allocate a certain amount of memory for caching the fetched data vertices. Note that when more data vertices need to be fetched, we may release some previously cached data vertices if necessary. Therefore we can ignore the space cost of the fetched data vertices when we estimate the memory cost of each region group.

Algorithm 3: FindRegionGroups
Input: the candidate vertex set $C = C(dp_0.piv)$ on $M_t$
Output: a region group $rg$
 1  pick a random vertex $v \in C$
 2  $rg \leftarrow \{v\}$
 3  $C \leftarrow C \setminus \{v\}$
 4  while $C \neq \emptyset$ and $\phi(rg) < \Phi$ do
 5      $v' \leftarrow \arg\max_{v' \in C} proximity(v', rg)$
 6      $rg \leftarrow rg \cup \{v'\}$
 7      $C \leftarrow C \setminus \{v'\}$
 8  if $\phi(rg) > \Phi$ then
 9      remove the last added vertex from $rg$ and put it back into $C$
10  return $rg$

7. EXPERIMENT

In this section, we present our experimental results.

Environment. We conducted our experiments on a cluster platform where each machine is equipped with an Intel CPU with 16 cores and 16G memory. The operating system of the cluster is Red Hat Enterprise Linux 6.5.

Algorithms. We compared our system with four state-of-the-art distributed subgraph enumeration approaches:
PSgL [21], the algorithm using graph exploration, originally based on Pregel.
TwinTwig [13], the algorithm using a joining approach, originally based on MapReduce.
SEED [15], an upgraded version of TwinTwig that also supports clique decomposition units.
Crystal [18], the algorithm relying on clique-index and compression, originally using MapReduce.

We implemented our approach in C++ with the help of MPICH2 [9] and the Boost library [20]. We used Boost.Asio to achieve asynchronous message listening and passing. We used TurboIso [10] as our SM-E processing algorithm. The performance of distributed graph algorithms varies a lot depending on the programming language and on the underlying distributed engines and file systems [3]. It is therefore not fair to simply compare our approach with the Pregel-based PSgL or other Hadoop-based approaches. To achieve a fair comparison, we implemented PSgL, TwinTwig and SEED using C++ with the MPI library. For Crystal, we chose to use the original program provided by its authors, because our experiments with TwinTwig and SEED indicate that our implementation and the original implementation over Hadoop showed no significant difference in terms of performance.

In memory, we loaded the data graph in each node in adjacency-list format for RADS, PSgL and TwinTwig. In order to support the clique decomposition unit of SEED, we also loaded in memory the edges between the neighbours of a vertex along with the adjacency-list of the vertex.

Dataset & Queries. We used four real datasets in our experiments: DBLP, RoadNet, LiveJournal and UK2002. The profiles of these datasets are given in Table 1. The diameter in Table 1 is the longest shortest path between any two data vertices. We partitioned each data graph using the multilevel k-way partition algorithm provided by Metis [11].
DBLP is a relatively small data graph which can be loaded into memory without partitioning; however, we still partition it here. One may argue that when the data graph is small, we can use single-machine enumeration algorithms. However, our purpose of using DBLP here is not to test which algorithm is better when the graph can be loaded as a whole, but to test whether the distributed approaches can fully utilize the memory when there is enough space available. RoadNet is a larger but much sparser data graph than the others, and consequently the number of embeddings of each query is smaller. Therefore it can be used to illustrate whether a subgraph enumeration solution has good filtering power to filter out false embeddings early. In contrast, the two denser data graphs, LiveJournal and UK2002, are used to test the algorithms' ability to handle denser graphs with huge numbers of embeddings.

On disk, our data graphs are stored in plain text format where each line represents the adjacency-list of a vertex. The approach of Crystal relies on the clique-index of the data graph, which should be pre-constructed and stored on disk.

Table 1: Profiles of datasets
Dataset(G)   |V|   |E|   Avg. degree   Diameter
RoadNet 56M 717M K
DBLP 0.3M 1.0M
LiveJournal 4.8M 42.9M
UK2002 M 298.1M

In Table 2, we present the disk space cost of the index files generated by the program of Crystal (M for megabytes, G for gigabytes).

Table 2: Illustration of the Size of Index Files of Crystal
Dataset(G)    Data Graph File Size    Index File Size
DBLP          13M                     21M
RoadNet       2.3G                    16.9G
LiveJournal   51M                     6.5G
UK            4.1G                    6G

The queries we used are given in Figure 7.

Figure 7: Query Set

We evaluate the performance, measured by time elapsed and communication cost, of the five approaches in Section 7.1. The cluster we used for this experiment consists of 10 nodes. Due to the space limit, more experimental results, including execution plan evaluation and scalability tests, are presented in Appendix C.
7.1 Performance Comparison

We compare the performance of the five subgraph enumeration approaches by measuring the time elapsed (in seconds) and the volume of exchanged data when processing each query pattern. The results for RoadNet, DBLP, LiveJournal and UK2002 are given in Figure 8, Figure 9, Figure 10 and Figure 11, respectively. We mark the result as empty when the test fails due to out-of-memory errors. When a bar reaches the upper bound, the corresponding value is beyond the upper bound value shown in the chart.

Exp-1: RoadNet. The results over the RoadNet dataset are given in Figure 8. As can be seen from the figure, RADS and PSgL are significantly faster than the other three methods (by more than 1 order of magnitude). RADS and PSgL use graph exploration while the others use join-based methods; both RADS and PSgL therefore demonstrated efficient filtering power. Since join-based methods need to group the intermediate results based on keys so as to join them together, their performance was significantly dragged down when dealing with sparse graphs compared with RADS and PSgL.

Figure 8: Performance over RoadNet. (a) Time cost: time elapsed (s) for queries q1 to q8; (b) Communication cost (MB) for queries q1 to q8. Methods shown: SEED, TwinTwig, Crystal, RADS, PSgL.

Figure 9: Performance over DBLP. (a) Time cost: time elapsed (s) for queries q1 to q8; (b) Communication cost (MB) for queries q1 to q8. Methods shown: SEED, TwinTwig, Crystal, RADS, PSgL.

It is worth noting that PSgL was verified to be slower than TwinTwig and SEED in [13][15]. This may be because the datasets used in TwinTwig and SEED are much denser than RoadNet, hence a huge number of embeddings will be generated. The grouped intermediate results of TwinTwig and SEED significantly reduced the cost of network traffic. Another interesting observation is that although Crystal has heavy indexes, its performance is much worse than that of PSgL and RADS. The reason is that the number of cliques in RoadNet is relatively small considering the graph size. Moreover, there are no cliques with more than two vertices in queries $q_1$, $q_3$, $q_6$, $q_7$ and $q_8$. In such cases, the clique index cannot help to improve the performance. As shown in Figure 8(b), the communication cost is not large for any of the approaches (less than 5M for most queries). In particular, for RADS, the communication cost is almost 0, mainly because most data vertices can be processed by SM-E, so no network communication is required.

Exp-2: DBLP. The result over DBLP is shown in Figure 9. As aforementioned, DBLP is smaller but much denser than RoadNet. The number of intermediate results generated in DBLP is much larger than that in RoadNet, as implied by the data communication cost shown in Figure 9(b). Since PSgL does not consider any compression or grouping over intermediate results, the communication cost of PSgL is much higher than that of the other approaches (more than 2M for queries after $q_4$). Consequently, the time delay due to shuffling the intermediate results caused bad performance for PSgL. However, PSgL is still faster than SEED and TwinTwig.
This may be because the time cost of grouping intermediate results in TwinTwig and SEED is heavy as well. It is worth noting that the communication cost of RADS is quite small (less than 5M). This is because of the caching strategy of RADS, where most foreign vertices are fetched only once and cached in the local machine. If most vertices are cached, there will be no further communication cost. The time efficiency of RADS is better than that of Crystal even for queries $q_2$, $q_4$ and $q_5$, where the triangle crystal can be directly loaded from the index without any computation.

Exp-3: LiveJournal. As shown in Figure 10, for LiveJournal, SEED, TwinTwig and PSgL start becoming impractical for queries from $q_3$ to $q_8$. It took them more than 1 thousand seconds to process each of those queries. Due to the huge number of intermediate results generated, the communication cost increased significantly as well, especially for PSgL, whose communication cost was beyond control when the query vertices reach 6. The method of Crystal achieved good performance for queries $q_2$, $q_4$ and $q_5$. This is mainly because Crystal simply retrieved the cached embeddings of the triangle to match the vertices $(u_0, u_1, u_2)$ of those 3 queries. However, when dealing with the queries with no good crystals ($q_6$, $q_7$ and $q_8$), our method significantly outperformed Crystal.

One important thing to note is that the other three methods (SEED, TwinTwig and PSgL) are sensitive to the end vertices, such as $u_5$ in $q_5$. Both time cost and communication cost increased significantly from $q_4$ to $q_5$. RADS processes those end vertices last by simply enumerating the combinations without caching any results related to them. The end vertices within Crystal are bud vertices, which only require simple combinations. As indicated by query $q_5$, where their processing time increased only slightly from that of $q_4$, RADS and Crystal are nicely tuned to handle end vertices.

Exp-4: UK2002. As shown in Figure 11, TwinTwig, SEED and PSgL failed the tests of queries after $q_3$ due to memory failure caused by the huge number of intermediate results.
The communication cost of all other methods is significantly larger than that of RADS (by more than 2 orders of magnitude), so we omit the chart for communication cost here. Similar to that of LiveJournal, the processing time of Crystal is better than that of RADS for the queries with cliques. This is because Crystal directly retrieves the embeddings of the cliques from the index. However, for queries without good crystals, our approach demonstrates better performance. As shown in Table 2, the index files of Crystal are more than 1 times larger than the original data graph. Another advantage of RADS over Crystal is that our memory control strategies make it more robust: when we set a memory upper bound of 8G and tested query $q_6$, Crystal started crashing due to memory leaks, while RADS successfully finished the query.

Figure 10: Performance over LiveJournal. (a) Time cost: time elapsed (s) for queries q1 to q8; (b) Communication cost (MB) for queries q1 to q8. Methods shown: SEED, TwinTwig, Crystal, RADS, PSgL.

Figure 11: Time cost over UK2002: time elapsed (s) for queries q1 to q8. Methods shown: SEED, TwinTwig, Crystal, RADS, PSgL.

8. RELATED WORK

The works most closely related to ours are TwinTwig [13], SEED [15] and PSgL [21]. Both [13] and [15] use multi-round two-way joins. [13] uses the same data partitioning as in our work, and it decomposes the query graph $P$ into a set of small trees $dp_0, \ldots, dp_k$ such that the union of these trees is equal to $P$. Since the decomposition units are trees, a set of embeddings of $dp_i$ can be obtained on each machine without consulting other machines, and the union of the embeddings on all machines is the set of all embeddings of $dp_i$ over $G$. In the first round, the embeddings of $dp_0$ and $dp_1$ are joined to obtain those of $P_1$; in each subsequent round, the embeddings of $P_{i-1}$ and $dp_i$ are joined to obtain those of $P_i$. Since the embeddings of $P_{i-1}$ on one machine must be joined with the embeddings of $dp_i$ on every machine, all the intermediate results (i.e., embeddings of $P_{i-1}$ and $dp_i$) must be cached and then shuffled based on the join key and re-distributed to the machines. Synchronization is necessary since shuffling and re-distribution can only start when all machines have the intermediate results ready.
[15] is similar to [13], except that it allows decomposition units to be cliques as well as trees, and it uses bushy joins rather than left-deep joins (see footnote 4). To compute the intermediate results for these units, it adopts a slightly different data partition strategy: it uses star-clique-preserved partitions. Both TwinTwig and SEED may generate huge intermediate results, and shuffling, re-distribution and synchronization cost a lot of time. Our approach is different in that we do not use joins; instead we use expand-verify-filter on each machine. As such we generate fewer intermediate results, and we do not need to re-distribute them to different machines.

PSgL [21] is based on Pregel [17]. It maps the query vertices one at a time following breadth-first traversal, so that partial matches are expanded repeatedly until the final results are obtained. In this way it avoids explicit joins, similar to our approach. However, there are important differences between PSgL and our system (RADS). (1) In each step of expansion, PSgL needs to shuffle and send the partial matches (intermediate results) to other machines, while RADS does not need to do so. (2) PSgL stores each (partial) match as a node of a static result tree, while RADS stores the results in a dynamic and compact data structure. (3) There is no memory control in PSgL.

Also closely related to our work are [6] and [5], which introduce systems for parallelizing serial graph algorithms, including (but not limited to) subgraph isomorphism search algorithms. These systems partition the data graph across different machines, but do not partition the query graph. Each machine evaluates the query pattern on its own machine using a serial algorithm (e.g., VF2) independently of others, but before that it must copy parts of the data graph from other machines. These parts of the graph are determined as follows. For each boundary vertex $v$ on the current machine, it copies the nodes and edges within a distance $d$ from $v$, where $d$ is the diameter of the query graph. The final results are obtained by collecting the final results from all machines.
Obviously, if the query graph diameter is large and the data graph diameter is small (e.g., those of social network graphs), or there are many boundary vertices involved, then the entire partition of the neighboring machine may have to be fetched. This will generate heavy network traffic as well as a burden on the memory of the local machine.

The work [1] treats the query pattern as a conjunctive query, where each predicate represents an edge, and computes the results as a multi-way join in a single round of map and reduce. As observed in [14], the problem with this approach is that most edges have to be duplicated over several machines in the map phase, hence there is a scalability problem when the query pattern is complex.

Footnote 4: There are independent optimization strategies in each paper, of course.

Qiao et al. [18] represent the set $I_P$ of all embeddings of pattern $P$ in a compressed form, $code(I_P)$, based on a minimum vertex cover of $P$. It decomposes the query graph $P$ into a core $core(P)$ and a set of so-called crystals $\{p_1, \ldots, p_k\}$, such that $code(I_P)$ can be obtained by joining the compressed results of $core(P)$ and $\{p_1, \ldots, p_k\}$. This join process can be parallelized in map-reduce. The compressed results of $core(P)$ and the crystals can be obtained from the compressed results of components of $P$. To expedite query processing, it builds an index of all cliques of the data graph, as shown in Table 2. Although no shuffling of intermediate results is required, the indexes of [18] can be many times larger than the data graph, and computing and maintaining such big indexes can be very expensive, making it less practical.

BigJoin, one of the algorithms proposed in [2], treats a subgraph query as a join of $|E_P|$ binary relations where each relation represents an edge in $P$. Similar to RADS and PSgL, it generates results by expanding partial results a vertex at a time, assuming a fixed order of the query vertices. BigJoin targets achieving worst-case optimality. Different from our work, it still needs to shuffle and exchange intermediate results, and therefore needs synchronization before that.

9. CONCLUSION

We presented a practical asynchronous subgraph enumeration system, RADS, whose core is based on a new framework, R-Meef (region-grouped multi-round expand verify & filter). By processing the data vertices far away from the border using single-machine algorithms, we isolated a large portion of the vertices, which do not have to be involved in the distributed process. By passing verification results of foreign edges and adjacency-lists of foreign vertices, RADS significantly reduced the network communication cost. We also proposed a compact format to store the generated intermediate results.
Our query execution plan and several memory control strategies, including foreign vertex caching and region groups, are designed to improve the efficiency and robustness of RADS. Our experimental results have verified the superiority of RADS compared with state-of-the-art subgraph enumeration approaches.

10. ADDITIONAL AUTHORS

11. REFERENCES

[1] F. N. Afrati, D. Fotakis, and J. D. Ullman. Enumerating subgraph instances using map-reduce. In ICDE, pages 62-73, 2013.
[2] K. Ammar, F. McSherry, S. Salihoglu, and M. Joglekar. Distributed evaluation of subgraph queries using worst-case optimal and low-memory dataflows. PVLDB, 11(6):691-704, 2018.
[3] K. Ammar and M. T. Özsu. Experimental analysis of distributed graph systems. PVLDB, 11(10), 2018.
[4] R. J. Douglas. NP-completeness and degree restricted spanning trees. Discrete Mathematics, 105(1-3):41-47.
[5] W. Fan, P. Lu, X. Luo, J. Xu, Q. Yin, W. Yu, and R. Xu. Adaptive asynchronous parallelization of graph algorithms. In SIGMOD, 2018.
[6] W. Fan, J. Xu, Y. Wu, W. Yu, J. Jiang, Z. Zheng, B. Zhang, Y. Cao, and C. Tian. Parallelizing sequential graph computations. In SIGMOD, 2017.
[7] H. Fernau, J. Kneis, D. Kratsch, A. Langer, M. Liedloff, D. Raible, and P. Rossmanith. An exact algorithm for the maximum leaf spanning tree problem. Theoretical Computer Science, 412(45), 2011.
[8] J. A. Grochow and M. Kellis. Network motif discovery using subgraph enumeration and symmetry-breaking. In RECOMB, volume 4453, pages 92-106, 2007.
[9] W. Gropp. MPICH2: A new start for MPI implementations. In PVM/MPI, pages 7-7, 2002.
[10] W.-S. Han, J. Lee, and J.-H. Lee. Turbo iso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In SIGMOD, 2013.
[11] G. Karypis and V. Kumar. Metis: unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report.
[12] H. Kim, J. Lee, S. S. Bhowmick, W. Han, J. Lee, S. Ko, and M. H. A. Jarrah. DUALSIM: parallel subgraph enumeration in a massive graph on a single machine. In SIGMOD, 2016.
[13] L. Lai, L. Qin, X. Lin, and L. Chang. Scalable subgraph enumeration in MapReduce. PVLDB, 8(10), 2015.
[14] L. Lai, L. Qin, X. Lin, and L. Chang. Scalable subgraph enumeration in MapReduce: a cost-oriented approach. VLDB J., 26(3), 2017.
[15] L. Lai, L. Qin, X. Lin, Y. Zhang, and L. Chang. Scalable distributed subgraph enumeration. PVLDB, 10(3), 2016.
[16] J. Lee, W. Han, R. Kasperovics, and J. Lee. An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB, 6(2), 2012.
[17] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In PODS, page 6, 2009.
[18] M. Qiao, H. Zhang, and H. Cheng. Subgraph matching: on compression and computation. PVLDB, 11(2), 2017.
[19] X. Ren and J. Wang. Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. PVLDB, 8(5), 2015.
[20] B. Schäling. The Boost C++ Libraries. XML Press, 2011.
[21] Y. Shao, B. Cui, L. Chen, L. Ma, J. Yao, and N. Xu. Parallel subgraph listing in a large-scale graph. In SIGMOD, 2014.

APPENDIX

A. PROOFS

A.1 Proof of Proposition 1

Proof. Suppose there is an embedding $f$ such that $f(u) = v$ and $f(u') = v'$. We show that $dist(v, v') \le BD_{G_t}(v)$, and therefore $v'$ must be on $M_t$. Any shortest path from $u$ to $u'$ is mapped by $f$ to a path in $G$, therefore $dist(v, v') \le dist(u, u') \le Span_P(u)$. By assumption, $Span_P(u) \le BD_{G_t}(v)$, therefore $dist(v, v') \le BD_{G_t}(v)$.

A.2 Proof of Theorem 1

Proof. Suppose $\{dp_0, \ldots, dp_k\}$ is an execution plan. The plan has $k + 1$ decomposition units. Clearly the pivot vertices of the decomposition units form a connected dominating set of $P$. Therefore, $k + 1 \ge c_P$. This proves that any execution plan has at least $c_P$ decomposition units.

Now suppose $T$ is an MLST of $P$. From $|V_P| = c_P + l_P$ we know that the number of non-leaf vertices in $T$ is $c_P$. We can construct an execution plan by choosing one of the non-leaf vertices $v$ as $dp_0.piv$, and all neighbors of $v$ in $T$ as the vertices in $dp_0.lf$. Regarding $v$ as the root of the spanning tree $T$, we then choose each of the non-leaf children $v'$ of $v$ in $T$ as the pivot vertex $dp_i.piv$ of the next decomposition unit, and all children of $v'$ as the vertices in $dp_i.lf$. We repeat this process until every non-leaf vertex of $T$ becomes the pivot vertex of a decomposition unit. This decomposition has exactly $c_P$ units, and it forms an execution plan. This shows that there exists an execution plan with $c_P$ decomposition units.

B. IMPLEMENTATION OF R-Meef

We present the implementation of R-Meef as shown in Algorithm 4.

Algorithm 4: R-Meef Framework
Input: Query pattern $P$, partition $G_t$ on machine $M_t$, execution plan $PL$
Output: $R_{G_t}(P)$
 1  $RG = \{rg_0, \ldots, rg_k\} \leftarrow regionGroups(C(dp_0.piv, M_t))$
 2  for each region group $rg \in RG$ do
 3      init embedding trie $ET$ with size $|V_P|$
 4      init edge verification index $I$
 5      for each data vertex $v \in rg$ do
 6          $f \leftarrow (dp_0.piv, v)$
 7          $I \leftarrow expandEmbedTrie(f, M_t, dp_0, ET)$
 8      $R \leftarrow verifyForeignE(I)$
 9      $filterFailedEmbed(R, I, ET)$
10      for Round $i = 1$ to $|PL|$ do
11          clear $I$
12          $fetchForeignV(i)$
13          for each $f \in I$ do
14              $I \leftarrow expandEmbedTrie(f, M_t, dp_i, ET)$
15          $R \leftarrow verifyForeignE(I)$
16          $filterFailedEmbed(R, I, ET)$
17      $R_{G_t}(P) \leftarrow R_{G_t}(P) \cup ET$
18      clear $ET$

Within each machine, we group the candidate data vertices of $dp_0.piv$ within $M_t$ into region groups (Line 1). For each region group $rg$, a multi-round mapping process is conducted (Lines 2 to 18). Within each round, we use a data structure ET (embedding trie) to save the generated intermediate results, i.e., embeddings and embedding candidates (Line 3). The edge verification index $I$ is initialized in Line 4 and is reset for each round of processing (Line 11).

(1) First Round (round 0). Starting from each candidate $v$ of $rg$, we match $v$ to $dp_0.piv$ in the execution plan. After the pivot vertex is matched, we find all the ECs of $dp_0$ with respect to $M_t$ and compress them into ET. We use a function expandEmbedTrie to represent this process (Line 7). For each EC compressed in ET, its undetermined edges need to be verified in order to determine whether this EC is an embedding of $dp_0$. We record this information in the edge verification index $I$, which is constructed in the expandEmbedTrie function. After we have the EVI $I$ in $M_t$, we send verifyE requests for the undetermined edges in $I$ to the machines that are able to verify them (function verifyForeignE in Line 8). After the edges in $I$ are all verified, we remove the failed ECs from ET (Line 9).

(2) Other Rounds. For each of the remaining rounds of the execution plan, we first clear the EVI $I$ from the previous round (Line 11). In the $i$th round, we want to find all the ECs of $P_i$ based on the embeddings in $R_{G_t}(P_{i-1})$ (where $dp_i.piv$ has been matched). The process is to expand every embedding $f$ of $R_{G_t}(P_{i-1})$ with each embedding candidate of $dp_i$ within the neighbourhood of $f(dp_i.piv)$.
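The expansion step just described can be illustrated with a minimal Python sketch. It assumes a star-shaped decomposition unit and adjacency lists held in a plain dictionary; the names `expand_unit`, `adj` and `leaves`, as well as the toy adjacency data, are hypothetical and not taken from the paper. Sibling edges, cross-unit edges, symmetry-breaking orders and degree filters are deliberately left out, so the output corresponds to unverified embedding candidates rather than final embeddings.

```python
from itertools import permutations


def expand_unit(adj, f, pivot, leaves):
    """Extend a partial embedding by one star-shaped decomposition unit.

    adj    : dict mapping a data vertex to the set of its neighbours
             (locally stored or previously fetched adjacency lists)
    f      : dict mapping already matched query vertices to data vertices
    pivot  : query vertex of the unit that is already matched in f
    leaves : query vertices of the unit that still need to be matched

    Yields one dict per candidate. Only injectivity is enforced here;
    no sibling or cross-unit edge is checked.
    """
    used = set(f.values())
    # Unmatched neighbours of the data vertex that the pivot is mapped to.
    free = sorted(n for n in adj[f[pivot]] if n not in used)
    for combo in permutations(free, len(leaves)):
        g = dict(f)
        g.update(zip(leaves, combo))
        yield g


# Toy data (hypothetical): v2 has three neighbours that are still unmatched.
adj = {"v2": {"v0", "v5", "v6", "v10"}}
f = {"u0": "v0", "u2": "v2"}
cands = list(expand_unit(adj, f, "u2", ["u5", "u6"]))
```

With three free neighbours and two leaf vertices the sketch produces six ordered assignments. In a full system, most of them would then be discarded by symmetry-breaking orders and edge verification.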
If not all the data vertices matched to $dp_i.piv$ by the ECs in $R_{G_t}(P_{i-1})$ reside in $M_t$, we will have to fetch the adjacency lists of those foreign vertices from other machines in order to expand from them. A sub-procedure fetchForeignV is used to represent this process (Line 12). After fetching, for each embedding $f$ of $R_{G_t}(P_{i-1})$, we find all the ECs of $P_i$ by expanding from $f(dp_i.piv)$ (Line 14). The found ECs are compressed into ET. Then verifyForeignE and filterFailedEmbed are called to make sure that the failed ECs are filtered out of the embedding trie, which will then contain only the actual embeddings of $P_i$, i.e., $R_{G_t}(P_i)$ (Lines 15 and 16).

After all the rounds of this region group have finished, we have a set of embeddings of $P$ compressed into ET. The results obtained from all the region groups are put together to obtain the embeddings found by $M_t$.

One important thing to note is that if a foreign vertex is already cached in the local machine, we can verify the undetermined edges attached to this vertex locally, without sending requests to other machines. We also do not re-fetch any foreign vertex that has already been cached.

Example 7. Consider the data graph $G$ in Figure 2, where the vertices marked with dashed border lines reside in $M_1$ and the other vertices reside in $M_2$. Consider the pattern $P$ and execution plan $PL$ given in Example 3. We assume the preserved orders due to symmetry breaking are: $u_1 < u_2$, $u_3 < u_6$, $u_4 < u_5$ and $u_8 < u_9$. There are two vertices $\{v_0, v_2\}$ in $M_1$ and two vertices $\{v_1, v_{10}\}$ in $M_2$ with a degree not smaller than that of $dp_0.piv$. Therefore in $M_1$ we have $C(dp_0.piv) = \{v_0, v_2\}$, and in $M_2$ we have $C(dp_0.piv) = \{v_1, v_{10}\}$. After grouping, assume we have $RG = \{rg_0, rg_1\}$ where $rg_0 = \{v_0\}$ and $rg_1 = \{v_2\}$ in $M_1$, and $RG = \{rg_0\}$ where $rg_0 = \{v_1, v_{10}\}$ in $M_2$.

Consider the region group $rg_0$ in $M_1$. In round 0, we first match $v_0$ to $dp_0.piv$. Expanding from $v_0$, we may have ECs including but not limited to the following (we lock $u_7$ to $v_7$ for ease of demonstration):

$f_{G_1} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_7, v_7)\}$
$f'_{G_1} = \{(u_0, v_0), (u_1, v_1), (u_2, v_9), (u_7, v_7)\}$
$f''_{G_1} = \{(u_0, v_0), (u_1, v_9), (u_2, v_{11}), (u_7, v_7)\}$

We compress these ECs into ET. Note that a mapping such as $\{(u_0, v_0), (u_1, v_1), (u_2, v_{11}), (u_7, v_7)\}$ is not an EC of $dp_0$ w.r.t. $M_1$, since $(v_1, v_{11})$ can be locally verified to be nonexistent. Since the undetermined edge $(v_1, v_9)$ of $f'_{G_1}$ cannot be determined in $M_1$, we put $\{(v_1, v_9), \langle f'_{G_1} \rangle\}$ into the EVI $I$. We then ask $M_2$ to verify the existence of the edge. $M_2$ returns false, therefore $f'_{G_1}$ is removed from ET.

In round 1, we have two embeddings $R_{G_t}(P_0) = \{f_{G_1}, f''_{G_1}\}$ to start with. To extend $f_{G_1}$ and $f''_{G_1}$, we need to fetch the adjacency lists of $v_1$ and $v_9$, respectively. We send a single fetchV request to fetch the adjacency lists of $v_1$ and $v_9$ from $M_2$. After expansion from $v_1$, we get a single embedding $\{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_3, v_3), (u_4, v_4), (u_7, v_7)\}$ in $R_{G_t}(P_1)$. There is no embedding of $P_1$ expanded from $v_9$. Hence $f''_{G_1}$ is removed from the embedding trie.

In round 2, we expand from $v_2$ to get the ECs of $P_2$. $dp_2.piv$ was already mapped to $v_2$ as seen above, and $v_2$ has neighbors $v_5$, $v_6$ and $v_{10}$ that are not matched to any query vertices. Since there are a sibling edge $(u_5, u_6)$ and a cross-unit edge $(u_4, u_5)$ in $P_2$, we need to verify the existence of $(v_4, v_5)$ and $(v_5, v_6)$ if we want to map $u_5$ to $v_5$ and $u_6$ to $v_6$. The existence of both $(v_4, v_5)$ and $(v_5, v_6)$ can be verified locally. Similarly, if we want to map $u_5$ to $v_5$ and $u_6$ to $v_{10}$, we will have to verify the existence of $(v_5, v_{10})$, and so on. It can be locally verified that $(v_5, v_{10})$ does not exist, and remotely verified that $(v_6, v_{10})$ does not exist. Therefore, at the end of this round, we get a single embedding for $P_2$, which extends the embedding for $P_1$ by mapping $u_5$ and $u_6$ to $v_5$ and $v_6$, respectively. We expand the embedding trie accordingly.
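The edge bookkeeping of round 0 in this example can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `split_edges` and `filter_failed` are hypothetical, the adjacency data is a toy stand-in rather than the graph of Figure 2, and an edge is treated as locally decidable whenever the adjacency list of at least one endpoint is available on the local machine.

```python
from collections import defaultdict


def split_edges(required, local_adj):
    """Check the required data edges of each candidate against local data.

    required  : dict candidate id -> list of data edges (a, b) it needs
    local_adj : dict data vertex -> set of neighbours, for vertices whose
                adjacency list is stored or cached locally
    Returns (alive, evi): the candidates not refuted locally, and an index
    from each undetermined edge to the candidates depending on it.
    """
    alive, evi = set(), defaultdict(set)
    for cid, edges in required.items():
        ok, pending = True, []
        for a, b in edges:
            if a in local_adj:
                ok = ok and b in local_adj[a]
            elif b in local_adj:
                ok = ok and a in local_adj[b]
            else:
                # Neither endpoint is known locally: ask another machine.
                pending.append(frozenset((a, b)))
        if ok:
            alive.add(cid)
            for e in pending:
                evi[e].add(cid)
    return alive, evi


def filter_failed(alive, evi, answers):
    """Drop every candidate that depends on an edge reported as absent."""
    failed = set()
    for e, cids in evi.items():
        if not answers[e]:
            failed |= cids
    return alive - failed


# Toy local view of M1 (hypothetical adjacency lists).
local_adj = {"v0": {"v1", "v2", "v7", "v9", "v11"},
             "v2": {"v0", "v1"},
             "v11": {"v0", "v9"}}
required = {"f": [("v1", "v2")], "f1": [("v1", "v9")],
            "f2": [("v9", "v11")], "bad": [("v1", "v11")]}
alive, evi = split_edges(required, local_adj)
remote_answers = {frozenset(("v1", "v9")): False}  # reply from the remote machine
survivors = filter_failed(alive, evi, remote_answers)
```

In this toy run the candidate named `bad` is rejected locally, `f1` is the only candidate waiting on a remote answer, and `f` and `f2` survive the round, which mirrors the outcome described for round 0.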
Following the above process, after the last round is processed, an embedding of $P$ starting from region group $rg_0$ in machine $M_1$ is saved in ET:

$f_{G_1} = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_3, v_3), (u_4, v_4), (u_5, v_5), (u_6, v_6), (u_7, v_7), (u_8, v_9), (u_9, v_{11})\}$

C. MORE EXPERIMENTAL RESULTS

C.1 Scalability Test

We compare the scalability of the five approaches by varying the number of nodes in the cluster (5, 10 and 15), 3 cases in total. The queries we processed are shown in Figure 7. Instead of reporting the processing time, here we report the ratio between the total processing time of all queries using 5 nodes and that of the other two cases, which we call the scalability ratio. The results are shown in Figure 12.

The most important thing to observe is that our approach demonstrates linear speed-up when the number of nodes is increased for RoadNet and DBLP. For RoadNet, this is because most vertices of each partition are far away from the border, so the majority of embeddings can be found by SM-E. The machines in our approach are almost independent of each other, except for some workload sharing. As for DBLP, which is a small graph, almost all vertices can be cached in memory, and RADS takes full advantage of this. Because TwinTwig, SEED and PSgL failed some queries for LiveJournal and UK2002, we omit their scalability results on those two datasets. There, the difference between Crystal and RADS is small, although RADS is better on both.

Figure 12: Scalability Test. Panels: (a) RoadNet, (b) DBLP, (c) LiveJournal, (d) UK2002. Vertical axis: scalability ratio. Series: SEED, TwinTwig, Crystal, RADS and PSgL in (a) and (b); Crystal and RADS in (c) and (d).

C.2 Effectiveness of Query Execution Plan

To validate the effectiveness of our strategy for choosing the query execution plan, we compare the processing time of RADS with two baselines, which are obtained by replacing the execution plan of RADS with the execution plans RanS and RanM, respectively.
RanS represents a plan consisting of random star decomposition units (with no limit on the size of the star), and RanM represents a plan with the minimum number of rounds that does not consider the strategies in Section 4.3. The cluster we used for this test consists of 10 nodes. In order to cover more random query plans, we run each test 5 times and report the average. The queries are shown in Figure 7. For queries $q_1$ to $q_3$, the query plans generated in the above three implementations are almost the same. Therefore, we omit the data for those three queries.

Figure 13: Effectiveness of Execution Plan. Panels: (a) RoadNet, (b) DBLP, (c) LiveJournal, (d) UK2002. Vertical axis: time elapsed (s). Horizontal axis: queries $q_4$ to $q_8$. Series: RanS, RanM, RADS.

The results for RoadNet, DBLP, LiveJournal and UK2002 are shown in Figure 13. For RoadNet, it is not surprising that the processing times are almost the same for the 3 execution plans. This is because most vertices of each RoadNet partition can be processed by SM-E, so different distributed query execution plans have little effect on the total processing time. For the other three datasets, our fully optimized execution plan clearly plays an important role in improving the query processing time, especially when dealing with large graphs such as LiveJournal and UK2002, where large volumes of network communication are generated and can be shared.

C.3 Effectiveness of Compression

To show the effectiveness of our compression strategy, we conducted an experiment to compare the space cost of the simple embedding list (EL) with that of our embedding trie (ET). We use the RoadNet and DBLP datasets for this test. The queries are shown in Figure 7. We omit the test over the other two datasets because the uncompressed volume of the results is too large.

The results are shown in Table 3 and Table 4, respectively. For RoadNet, the intermediate results generated by Queries 7 and 8 are negligible, therefore they are not listed. The results for both datasets demonstrate a good compression ratio. It is worth noting that the compression ratios of all queries over RoadNet are smaller than those over DBLP. This is because the embeddings of RoadNet are very diverse and do not share many common vertices.

Table 3: Compression on RoadNet (Mb). Rows: EL, ET. Columns: queries.

Table 4: Compression on DBLP (Gb). Rows: EL, ET. Columns: queries.

C.4 More Query Processing Results

As mentioned above, SEED supports cliques as decomposition units and Crystal indexes the cliques in the graph storage. Both methods should have advantages when processing queries with more cliques. Note that most of the queries in Figure 7 do not contain any clique. For fairness, we also tested some queries from [18] for SEED, Crystal and RADS. The queries are shown in Figure 14, all of which have cliques. In contrast to the experiment in Section 7.1, for SEED we here also used the program implemented by its original authors. This guarantees that both SEED and Crystal run with their maximum optimized performance when processing those queries.

Figure 14: Queries with more cliques.

The results are shown in Figure 15. We omit the results of SEED for UK2002 since its time cost is much higher than that of the other two methods. Consistent with the result in Section 7.1, RADS is consistently faster than SEED and Crystal when running on RoadNet (by more than 1 order of magnitude) and on DBLP. For the other datasets, RADS is still better than SEED for all queries, while worse than Crystal for the queries $q_1$, $q_2$ and $q_4$. This is reasonable because of the heavy clique index of Crystal. However, RADS has a noticeable improvement over Crystal when processing $q_3$, where the verification edges helped RADS filter out a lot of unpromising candidates.

Figure 15: Results of queries with more cliques. Panels: (a) RoadNet, (b) DBLP, (c) LiveJournal, (d) UK2002. Vertical axis: time elapsed (s). Horizontal axis: queries $q_1$ to $q_4$. Series: SEED, Crystal, RADS.
