IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS

Finding and counting tree-like subgraphs using MapReduce

Zhao Zhao, Langshi Chen, Mihai Avram, Meng Li, Guanying Wang, Ali Butt, Maleq Khan, Madhav Marathe, Judy Qiu, Anil Vullikanti

Abstract: Several variants of the subgraph isomorphism problem, e.g., finding, counting and estimating frequencies of subgraphs in networks, arise in a number of real-world applications, such as genetic network analysis in bioinformatics, web analysis, disease diffusion prediction and social network analysis. These problems are computationally challenging to scale to very large networks with millions of nodes. In this paper, we present SAHAD, a MapReduce-based algorithm for detecting and counting trees of bounded size using the elegant color coding technique developed by N. Alon, R. Yuster and U. Zwick (Journal of the ACM). SAHAD is a randomized algorithm, and we show rigorous bounds on the approximation quality and the performance. We implement SAHAD on two different frameworks: the standard Hadoop model and Harp, which is closer to a high performance computing environment, and evaluate its performance on a variety of synthetic and real networks. SAHAD scales to very large networks and to tree-like (acyclic) templates with up to 12 nodes. Further, we extend our results by implementing our algorithm using the Harp framework. The new implementation gives two orders of magnitude improvement in performance over the standard Hadoop implementation and achieves comparable or even better performance than a state-of-the-art MPI solution.

Index Terms: subgraph isomorphism, graph partitioning, MapReduce, Hadoop, Harp

Author affiliations: Zhao Zhao, Ali Butt, Madhav Marathe and Anil Vullikanti are with the Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute & Department of Computer Science, Virginia Tech, VA 24061. E-mail: zhaozhao@vt.edu, butta@cs.vt.edu, mmarathe@vt.edu, vsakumar@vt.edu. Maleq Khan is with the Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville. E-mail: maleq.khan@tamuk.edu. Langshi Chen, Meng Li, and Mihai Avram are with the Computer Science Department, Indiana University. E-mail: lc37@indiana.edu, l56@umail.iu.edu, mavram@umail.iu.edu. Judy Qiu is with the Intelligent Systems Engineering Department, Indiana University. E-mail: xqiu@indiana.edu. Guanying Wang is with Google Inc. E-mail: wang.guanying@gmail.com.

1 INTRODUCTION

GIVEN two graphs G and H, the subgraph isomorphism problem asks if H is isomorphic to a subgraph of G. The counting problem associated with this seeks to count the number of copies of H in G. These and other variants are fundamental problems in Network Science and have a wide range of applications in areas such as bioinformatics, social networks, the semantic web, transportation and public health. Analysts in these areas tend to search for meaningful patterns in networked data, and these patterns are often specific subgraphs such as trees.

Three different variants of subgraph analysis problems have been studied extensively. The first involves counting specific subgraphs, which has applications in bioinformatics [4], [16]. The second involves finding the most frequent subgraphs, either in a single network or in a family of networks; this has been used for finding patterns in bioinformatics (e.g., []), recommendation networks [], chemical structure analysis [3], and detecting memory leaks [5]. The third involves finding subgraphs which are either over-represented or under-represented compared to random networks with similar properties; such subgraphs are referred to as motifs. Milo et al.
[6] identify motifs in many networks, such as protein-protein interaction (PPI) networks, ecosystem food webs and neuronal connectivity networks. Subgraph counts have also been used in characterizing networks [8].

The subgraph isomorphism problem and its variants are well known to be computationally challenging. In general, the decision version of the problem is NP-hard, and the counting problem is #P-hard. Extensive work has been done in theoretical computer science on this problem; we refer the reader to the recent papers [1], [1], [4] for an extensive discussion of the decision and counting complexity of the problem and of tractable results for various parameterized versions of the problem.

The primary focus of this paper is on the three variants of the subgraph isomorphism problem mentioned above when k, the number of nodes in the template H, is fixed. Letting n be the number of nodes in G, one immediately gets simple algorithms with running time O(n^k) to find and count the number of copies of template H in G. Note that in this paper we focus on non-induced subgraph matching. When the template is a tree or has bounded treewidth, Alon et al. [4] present an elegant randomized approximation algorithm, based on the color coding technique, with running time O(2^{2k} |E| e^k log(1/δ) / ε^2), where ε and δ are error and confidence parameters, respectively. Their result was significantly improved by Koutis and Williams [19], who gave an algorithm with running time O(2^k |E|). A number of practical heuristics have also been developed for various versions of these problems, especially for the frequent subgraph mining problem. An example is the Apriori method, which uses a level-wise exploration of the template [18], [] in generating candidates for subgraphs at each level.

These have been made to run faster by better pruning and exploration techniques, e.g., [], [], [4]. Other approaches in relational databases and data mining involve queries for specifically labeled subgraphs, and have combined relational database techniques with careful depth-first exploration, e.g., [8], [31], [3]. Most of these approaches are sequential, and generally scale to modest-size graphs G and templates H.

Parallelism is necessary to scale to much larger networks and templates. In general, these approaches are hard to parallelize, as it is difficult to decompose the task into independent subtasks. Furthermore, it is not clear whether candidate generation approaches [], [], [4] can be parallelized and scaled to large graphs and computing clusters. Two recent approaches for parallel algorithms related to this work are [8], [41]. The approach of Bröcheler et al. [8] requires a complex preprocessing and enumeration process, which has a high end-to-end time, while the approach of [41] involves an MPI-based implementation with a very high communication overhead for larger templates. Two other papers [7], [36] develop MapReduce-based algorithms for approximately counting the number of triangles with a work complexity bound of O(|E|). The development of parallel algorithms for subgraph analysis with rigorous polynomial work complexity, implementable on heterogeneous computing resources, remains an open problem. Due to the complexity of enumerating subgraphs, some work instead computes metrics of the subgraph that are anti-monotone in the subgraph size. The algorithm reported in [3] is capable of computing subgraph support on large networks with up to 1 billion edges. However, it requires each machine to have a copy of the graph in memory, which limits its scalability to larger graphs; additionally, computing support requires much less computational effort than counting subgraphs. Another recent work employs MapReduce to match subgraphs [35] and scales to networks with up to 3 million edges. Other approaches studied in the context of data mining and databases, e.g., [8], [31], [3], are capable of processing large networks, but are usually slow due to the limitations of database techniques for processing networks.

Our contributions. In this paper, we present SAHAD, a new algorithm for Subgraph Analysis using Hadoop, with rigorously provable polynomial work complexity for several variants of the subgraph isomorphism problem when H is a tree. SAHAD scales to very large graphs and, because of the Hadoop implementation, runs flexibly on a variety of computing resources, including the Amazon EC2 cloud. We also adapt SAHAD to the Harp [9] framework to utilize its advanced MPI-like collective communication; it scales to graphs with more than a billion edges. Our specific contributions are discussed below.

1. SAHAD is the first MapReduce-based algorithm for finding and counting labeled trees in very large networks. The only prior Hadoop-based approaches have been for triangles [7], [36], [37] on very large networks, or for more general subgraphs on relatively small networks [3]. Our main technical contribution is the development of a Hadoop version of the color coding algorithm of Alon et al. [4], [5], which is a (sequential) randomized approximation algorithm for subgraph counting: for any ε, δ, it gives a (1±ε)-approximation to the number of embeddings with probability at least 1-δ. We prove that the work complexity of SAHAD is O(k 2^{2k} |E_G| e^k log(1/δ) / ε^2), which exceeds the running time of the sequential algorithm of [4] by just a factor of k.
2. We demonstrate our results on instances generated using the Erdős-Rényi random graph model and the Chung-Lu random graph model, and on synthetic social contact graphs for the cities of Miami and Chicago (with 5.7 and 68.9 million edges, respectively), constructed using the methodology of [7]. We study the performance of counting unlabeled/labeled templates with up to 12 nodes. The total running times for 12-node templates on the Miami and Chicago networks are both under 35 minutes; note that these are total end-to-end times, which do not require any additional pre-processing (unlike, e.g., [8]).

3. We discuss how our basic algorithms for counting subgraphs can be extended to compute supervised motifs and graphlet frequency distributions. They can also be extended to count labeled subgraphs.

4. SAHAD runs easily on heterogeneous computing resources; e.g., it scales well when we request up to 16 nodes on a medium-size cluster with 32 cores per node. Our Hadoop-based implementation is also amenable to running on public clouds, e.g., Amazon EC2 [6], except for a 12-node template, which produces an extremely large amount of data and incurs an I/O bottleneck on the virtual disks of EC2. It is worth noting that the performance of SAHAD on EC2 is otherwise almost the same as on the local cluster. This would enable researchers to perform useful queries even if they do not have access to large resources, such as those required to run previously proposed querying infrastructures. We believe this aspect is unique to SAHAD and lowers the barrier to entry for scientific researchers to utilize advanced computing resources.

5. We study the performance improvement in extensions of the standard Hadoop framework; the enhanced algorithm is called EN-SAHAD. We consider techniques to explicitly control the sorting and inter-partition communication in Hadoop. We find that reducing the sorting step by pre-allocating reducers can improve the performance by about 20%, but improved partitioning does not seem to help.

6. Finally, we implement SAHAD within the Harp [9] framework; the new algorithm is called HARPSAHAD+. HARPSAHAD+ yields an order of magnitude improvement in performance, as a result of its flexibility in task scheduling, data flow control and in-memory caching. We are therefore able to scale to networks with billions of edges using HARPSAHAD+ and obtain performance comparable to a state-of-the-art MPI/C++ implementation.

Organization. Section 3 introduces the background for the subgraph counting problem and for MapReduce, its open-source implementation Hadoop, and the Harp system. In Section 4, we give a brief overview of the color coding algorithm proposed by Alon et al. in [4]. In Section 5 we present our MapReduce implementations, and in Section 6 we study the computation cost of our algorithm.

Section 7 proposes several variations of the subgraph counting problem that can be computed using our framework, while Section 8 discusses experimental results for SAHAD, EN-SAHAD and HARPSAHAD+. Finally, Section 9 concludes the paper.

Extension from the conference version. The SAHAD algorithm appeared in [4]. The results on EN-SAHAD and HARPSAHAD+ are new additions. Since the publication of [4], there has been more work on parallelizing the color coding technique, e.g., [33], [34]. However, none of these are based on MapReduce and its generalizations.

2 RELATED WORK

As mentioned earlier, the subgraph isomorphism problem and its variants have been studied extensively by theoretical computer scientists; see [1], [1], [13], [17], [4], [38] for complexity-theoretic results. Marx and Pilipczuk [4] undertake a comprehensive study of the decision problem and provide strong lower bounds, including fixed parameter intractability results. They also study the complexity of the problem as a function of structural properties of G and H.

A variety of different algorithms and heuristics have been developed for different domain-specific versions of subgraph isomorphism problems. One version involves finding frequent subgraphs, and many approaches for this problem use the Apriori method from frequent item set mining [14], [18], []. These approaches involve candidate generation during a breadth-first search on the subset lattice and a determination of the support of item sets by a subset test. A variety of optimizations have been developed, e.g., using a DFS order to avoid the cost of candidate generation [], [4] or pruning techniques, e.g., [].

A related problem is computing the graphlet frequency distribution, which generalizes the degree distribution [8]. Another class of results for frequent subgraph finding is based on the powerful technique of color coding (which also forms the basis of our paper), e.g., [4], [16], [41], which has been used for approximating the number of embeddings of templates that are trees or tree-like. In [4], Alon et al. use color coding to compute the distribution of treelets of sizes 8, 9 and 10 on the protein-protein interaction network of yeast. The color coding technique is further explored and improved in [16], in terms of worst-case performance and practical considerations. For example, by increasing the number of colors, they speed up the color coding algorithm by orders of magnitude. They also reduce the memory usage for minimum-weight path finding by carefully removing unsatisfied candidates and reducing the color set storage. A recent work by Venkatesan et al. [?] extends color coding to subgraphs with treewidth up to 2, and they scale their algorithm to graphs with up to .7 million edges. Most of these approaches in bioinformatics applications involve small templates, and have only been scaled to relatively small graphs with at most 10^4 nodes (apart from [41], which shows scaling to much larger graphs by means of a parallel implementation).

Other settings in relational databases and data mining have involved queries for specific labeled subgraphs. Some of the approaches for these problems have combined relational database techniques, based on careful indexing and translation of queries, with a depth-first exploration strategy distributed over different partitions of the graph, e.g., [8], [31], [3], and scale to very large graphs. For instance, Bröcheler et al. [8] demonstrate labeled subgraph queries with up to 7-node templates on graphs with over half a billion edges, by carefully partitioning the massive network using minimum edge cuts and distributing the partitions over computing nodes. A shared-memory parallelization with an OpenMP implementation of the color coding approach is given in [33].
This algorithm achieves a speedup of 1 on a graph with 1.5 million nodes and 31 million edges. A more recent work [34] parallelizes the dynamic processing of the color coding algorithm to enumerate subgraphs and is able to handle networks with billions of edges, with template sizes up to 12.

3 BACKGROUND

3.1 Preliminaries and problem statement

We consider labeled graphs G = (V_G, E_G, L, l_G), where V_G and E_G are the sets of nodes and edges, L is a set of labels and l_G : V_G -> L is a labeling of the nodes. A graph H = (V_H, E_H, L, l_H) is a non-induced subgraph of G if V_H is a subset of V_G and E_H is a subset of E_G. We say that a template graph T = (V_T, E_T, L, l_T) is isomorphic to a non-induced subgraph H = (V_H, E_H, L, l_H) of G if there exists a bijection f : V_T -> V_H such that: (i) for each (u, v) in E_T, we have (f(u), f(v)) in E_H, and (ii) for each v in V_T, we have l_T(v) = l_H(f(v)). In this paper, we assume T is a tree. We consider trees to be rooted, and use ρ = ρ(T) in V_T to denote the root of T, which is chosen arbitrarily. If T is isomorphic to a non-induced subgraph H under the mapping f(.), we also say that H is a non-induced embedding of T with the root ρ(T) mapped to node f(ρ(T)). Figure 1 shows an example of a non-induced embedding of a template T in a graph G. Let emb(T, G) denote the number of all embeddings of template T in graph G. Here, we focus on approximating emb(T, G).

Fig. 1: The shaded subgraph is a non-induced embedding of T; the mapping of the template to the subgraph is indicated by arrows.

An (ε, δ)-approximation to emb(T, G). We say that a randomized algorithm A produces an (ε, δ)-approximation to emb(T, G) if the estimate Z produced by A satisfies Pr[ |Z - emb(T, G)| > ε emb(T, G) ] <= δ; in other words, A is required to produce an estimate that is close to emb(T, G) with high probability.
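To make the definition of a non-induced embedding concrete, the following minimal Python sketch checks conditions (i) and (ii) for a candidate mapping f. It is an illustration of the definitions only, not part of SAHAD; the helper name and the toy graph are ours.

```python
def is_noninduced_embedding(T_edges, T_labels, G_adj, G_labels, f):
    """Check conditions (i) and (ii) from Section 3.1 for a candidate mapping f
    from template nodes to graph nodes (brute-force illustration only)."""
    if len(set(f.values())) != len(f):          # f must be injective
        return False
    for u, v in T_edges:                        # (i) every template edge maps to a graph edge
        if f[v] not in G_adj[f[u]]:
            return False
    for u, lab in T_labels.items():             # (ii) node labels must be preserved
        if G_labels[f[u]] != lab:
            return False
    return True

# Template: a 3-node labeled path u1-u2-u3; a small made-up graph stands in
# for the graph G of Figure 1, which is not reproduced here.
T_edges = [("u1", "u2"), ("u2", "u3")]
T_labels = {"u1": "adult", "u2": "kid", "u3": "adult"}
G_adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
G_labels = {1: "adult", 2: "kid", 3: "adult", 4: "senior"}
print(is_noninduced_embedding(T_edges, T_labels, G_adj, G_labels,
                              {"u1": 1, "u2": 2, "u3": 3}))   # True
print(is_noninduced_embedding(T_edges, T_labels, G_adj, G_labels,
                              {"u1": 3, "u2": 2, "u3": 1}))   # True: a second embedding
```

Enumerating all such mappings explicitly is exactly what becomes infeasible at scale, which motivates the approximation guarantee defined above.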

Problems studied. We consider the following two problems:

1) Subgraph counting: Given a template T and a graph G, compute an (ε, δ)-approximation to emb(T, G). When the labels can be disregarded, we refer to this as the Unlabeled Subgraph Counting problem; otherwise, it is referred to as the Labeled Subgraph Counting problem.

2) Graphlet Frequency Distribution (GFD) [8]: a graphlet is another name for a subgraph. We say a node touches a graphlet T if it is contained in an embedding of T in the graph G. The graphlet degree of a node v is the number of graphlets it touches. Given a size parameter k, the GFD of a graph G is the frequency distribution of the graphlet degrees of all nodes with respect to all graphlets of size up to k. The specific problem is to obtain an approximation to the GFD. In this paper, we focus on treelets, i.e., we only consider trees of size up to k.

3.2 MapReduce, Hadoop and Harp

MapReduce and its extensions have become a dominant computation model in big data analysis. It involves two stages of data processing: (a) dividing the input into distinct map tasks and distributing them to multiple computing entities, and (b) merging the results of the individual computing entities in reduce tasks to produce the final output [11]. The MapReduce model processes data in the form of key-value pairs <k, v>. An application first takes pairs of the form <k1, v1> as input to the map function, which produces one or more <k2, v2> pairs for each input pair. MapReduce then re-organizes all <k2, v2> pairs and aggregates all items v2 associated with the same key k2, which are then processed by a reduce function.

Hadoop [39] is an open-source implementation of MapReduce. By defining application-specific map and reduce functions, the user can employ Hadoop to manage and allocate appropriate resources to perform the tasks, without dealing with the complexity of load balancing, communication and task scheduling. Due to its reliability and scalability in handling vast amounts of computation in parallel, Hadoop has become a de facto solution for large parallel computing tasks. Hadoop falls short in two respects, though: (i) the high I/O cost within the mapper, shuffling and reducer stages, since data is always read from and written to disk in every stage of a Hadoop job; and (ii) the global synchronization of mappers and reducers, i.e., reducers can start only when all mappers have completed their tasks and vice versa, which reduces the efficient usage of computing resources.

To overcome the problems Hadoop faces, we further extend our work to use the Harp platform [9]. Harp introduces full collective communication (broadcast, reduce, allgather, allreduce, rotation, regroup or push & pull), adding a separate communication abstraction. The advantage of using in-memory collective communication in place of the shuffling phase is that fine-grained data alignment and data transfer for many synchronization patterns can be optimized. Harp categorizes four types of computation models (Locking, Rotation, Allreduce, Asynchronous), based on the synchronization patterns and the effectiveness of the model parameter update. They provide the basis for a systematic approach to parallelizing iterative algorithms. Figure 2 shows the four categories of the computing model.

Fig. 2: Harp has 4 computation models: (A) Locking, (B) Rotation, (C) AllReduce, (D) Asynchronous.

The Harp framework has been used by 35 students at Indiana University for their course projects. It has now been released as an open source project available in the public GitHub domain [1]. Harp provides a collection of iterative machine learning and data analysis algorithms
(e.g., K-means, Multi-class Logistic Regression, Random Forests, Support Vector Machines, Neural Networks, Latent Dirichlet Allocation, Matrix Factorization, Multi-Dimensional Scaling) that have been tested and benchmarked on the OpenStack Cloud and on HPC platforms including the Haswell and Knights Landing architectures. It has also been used for subgraph mining, force-directed graph drawing, and image classification applications.

4 THE SEQUENTIAL ALGORITHM: COLOR CODING

TABLE 1: Notation
G: the graph | T, T1, T2: template and sub-templates
n, m: number of nodes, number of edges | k: number of nodes in T
ρ: root of T | S, s_i: color set, the i-th color
d(v): degree of node v | N(v): neighbors of node v

We briefly introduce the color coding algorithm for subgraph counting [5], which gives a randomized approximation scheme for counting trees in a graph. Some of the notation used in the paper is listed in Table 1.

High level description. There are two main ideas underlying the color coding algorithm of [5].

1) Colorful embeddings: Color the nodes of the graph with k colors, where k >= |V_T|, and only count colorful embeddings: an embedding H of the template T is colorful if each node in H has a distinct color. The advantage is that the number of colorful embeddings can be counted by a simple and natural dynamic program.

a) In particular, let C(v, T(ρ), S) be the number of colorful embeddings of T with node v in V_G mapped to the root ρ, using the color set S, where |V_T| = |S|.

b) Suppose (ρ = u1, u2) is an edge incident on the root node ρ in T. Let the tree T be partitioned into trees T1 and T2 when the edge (u1, u2) is removed, with roots ρ1 = u1 and ρ2 = u2 of the trees T1 and T2, respectively.

c) Suppose S1 and S2 are disjoint subsets of colors such that |S1| = |V_{T1}| and |S2| = |V_{T2}|. Let H1 and H2 be two colorful embeddings of T1 and T2 using color sets S1 and S2, respectively, with ρ1 and ρ2 mapped to neighboring nodes v1 in V_G and v2 in V_G, respectively. Then H1 and H2 must be non-overlapping, because they have distinct colors.

d) Therefore, C(v1, T, S) = Σ_{S = S1 ∪ S2} Σ_{v2 in N(v1)} C(v1, T1(v1), S1) · C(v2, T2(v2), S2), where the first summation is over all partitions S1 ∪ S2 of S and the second summation is over all neighbors v2 of v1.

2) Random colorings: If the coloring is done uniformly at random with k = |V_T| colors, there is a reasonable probability, k!/k^k, that an embedding is colorful; this allows us to get a good approximation of the number of embeddings.

Algorithm 1 The sequential color coding algorithm.
1: Input: Graph G = (V, E) and template T = (V_T, E_T)
2: Output: Approximation to emb(T, G)
3:
4: For each v in V_G, pick a color c(v) in S = {1,..., k} uniformly at random, where k = |V_T|.
5: Partition the tree T into subtrees recursively to form a set T using algorithm PARTITION(T(ρ)). Each tree T' in T has a root ρ'. Furthermore, if |V_{T'}| > 1, T' is partitioned into two trees T'1, T'2 with roots ρ'1 = ρ' and τ', respectively, which are referred to as the active and passive children of T'.
6: For each v in V_G, T' in T with root ρ', and subset S' of S with |S'| = |T'|, compute C(v, T'(ρ'), S') using recurrence (1) below:

   C(v, T'(ρ'), S') = (1/d) Σ_{u in N(v)} Σ_{S' = S'1 ∪ S'2} C(v, T'1(ρ'), S'1) · C(u, T'2(τ'), S'2),   (1)

   where d equals one plus the number of siblings of τ' that are roots of subtrees isomorphic to T'2(τ').
7: For the j-th random coloring, let

   C^(j) = (1/q) (k^k / k!) Σ_{v in V_G} C(v, T(ρ), S),   (2)

   where q denotes the number of nodes ρ' in V_T such that T is isomorphic to itself when ρ is mapped to ρ'.
8: Repeat the above steps N = O(e^k log(1/δ) / ε^2) times, and partition the N estimates C^(1),..., C^(N) into t = O(log(1/δ)) sets. Let Z_j be the average of set j. Output the median of Z_1,..., Z_t.

Algorithm 2 PARTITION(T(ρ))
1: if T not in T then
2:   if |V_T| = 1 then
3:     Add T to T
4:   else
5:     Add T to T
6:     Pick τ in N(ρ), the set of neighbors of ρ, and partition T into two sub-templates by cutting the edge (ρ, τ)
7:     Let T1 be the sub-template containing ρ (named the active child) and T2 the other (named the passive child)
8:     PARTITION(T1(ρ))
9:     PARTITION(T2(τ))

Algorithm 1 describes the sequential color coding algorithm, and Figure 3 gives an example of computing Eq. 1.

Fig. 3: The example shows one step of the dynamic programming in color coding. T in Figure 1 is split into T1 and T2. To count C(w1, T(v1), S), i.e., the number of embeddings of T(v1) rooted at w1 using the color set S = {red, yellow, blue, purple, green}, we first obtain C(w1, T1(v1), {r, y, b}) = 2 and C(w5, T2(v3), {p, g}) = 1. Then C(w1, T(v1), S) = C(w1, T1(v1), {r, y, b}) · C(w5, T2(v3), {p, g}) = 2. The embeddings of T are the subgraphs with nodes {w3, w4, w1, w5, w6} and {w3, w2, w1, w5, w6}. Here s, c, b represent the labels of the nodes. Details of labeled subgraph counting can be found in [4].
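To make the two ideas above concrete before turning to the parallel algorithms, the following self-contained Python sketch (ours, not the paper's code) specializes the dynamic program to an unlabeled k-node path rooted at one endpoint, so that the correction factor d in Eq. (1) is always 1 and the rooted-automorphism count q is 2. It compares the color-coding estimate against brute force on a tiny random graph.

```python
import math
import random
from itertools import combinations

def colorful_path_counts(adj, color, k):
    """C[v][S] = number of colorful embeddings of a k-node path whose root
    endpoint is mapped to v and whose nodes use exactly the color set S
    (the path specialization of the recurrence in Eq. (1), with d = 1)."""
    C = {v: {frozenset([color[v]]): 1} for v in adj}        # 1-node paths
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for v in adj:
            for u in adj[v]:                                # passive child rooted at a neighbor u
                for S, cnt in C[u].items():
                    if color[v] in S:
                        continue                            # color reused: not colorful
                    S2 = S | {color[v]}
                    nxt[v][S2] = nxt[v].get(S2, 0) + cnt
        C = nxt
    return C

def estimate_path_embeddings(adj, k, iterations=300, seed=0):
    """Color-coding estimate of the number of k-node simple paths in the graph."""
    rng = random.Random(seed)
    scale = k ** k / math.factorial(k)                      # 1 / Pr[a fixed embedding is colorful]
    total = 0.0
    for _ in range(iterations):
        color = {v: rng.randrange(k) for v in adj}          # random coloring (Algorithm 1, line 4)
        C = colorful_path_counts(adj, color, k)
        colorful = sum(sum(table.values()) for table in C.values())
        total += colorful * scale / 2                       # q = 2: a path may be rooted at either end
    return total / iterations

def brute_force_paths(adj, k):
    """Exact count of k-node simple paths, for checking the estimate."""
    def extend(path):
        if len(path) == k:
            return 1
        return sum(extend(path + [u]) for u in adj[path[-1]] if u not in path)
    return sum(extend([v]) for v in adj) // 2               # each path is found from both ends

if __name__ == "__main__":
    rng = random.Random(1)
    adj = {v: set() for v in range(12)}
    for u, v in combinations(range(12), 2):                 # small random test graph
        if rng.random() < 0.3:
            adj[u].add(v)
            adj[v].add(u)
    print("exact:", brute_force_paths(adj, 4),
          "color-coding estimate:", round(estimate_path_embeddings(adj, 4)))
```

SAHAD distributes exactly this kind of join over MapReduce by keying the partial counts on the root vertex, as described next.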
5 PARALLEL ALGORITHMS

In this section, we present a parallelization of the color coding approach using the MapReduce framework. We first describe SAHAD [4], followed by EN-SAHAD and HARPSAHAD+, respectively.

5.1 SAHAD

SAHAD takes as input the family of sub-templates T = {T'1, T'2, ...} generated by partitioning T using Algorithm 2, and performs a MapReduce variation of Algorithm 1 to compute the number of embeddings of T.

As shown in Equation 1, the count of all colorful embeddings isomorphic to T' rooted at a single node v is computed by aggregating the same quantity for the two sub-templates, with T'1 rooted at v and T'2 rooted at u in N(v). We can parallelize the color coding algorithm by distributing the computation among multiple machines and sending the data related to v and N(v) to a single computation unit for the aggregation. In our MapReduce algorithm, we manage this by assigning v as the key both for the counts of T'1 rooted at v and for the counts of T'2 rooted at v's neighbors, such that all data required for computing the counts of T' rooted at v has the same key and is handled by a single reduce function.

Let X_{T',v} be a sequence of color-count pairs (S_0, c_0), (S_1, c_1), ..., where each S_i = {s_i1, s_i2, ..., s_ik'} is a color set containing k' colors and c_i is the count of the subgraphs isomorphic to T' that are rooted at v and colored exactly by S_i. Here k' = |V(T')|, and each such subgraph is a colorful match.

There are three types of Hadoop jobs in SAHAD: 1) the colorer (Algorithm 3), which performs line 4 of Algorithm 1; 2) the counter (Algorithms 4, 5), which performs line 6 of Algorithm 1; and 3) the finalizer (Algorithms 6, 7), which performs line 7 of Algorithm 1. The first step is to randomly color the network G with k colors. The map function is described in Algorithm 3:

Algorithm 3 mapper(v, N(v))
1: Pick s_i in {s_1,..., s_k} uniformly at random
2: Color v with s_i
3: Let T0 be the single-node template
4: Let c(v, T0, {s_i}) = 1, since v is the only colorful matching
5: X_{T0,v} <- {({s_i}, 1)}
6: Collect(key v, value <X_{T0,v}, N(v)>)

Here Collect is a standard MapReduce operation that emits key-value pairs to the global space for further processing such as shuffling, sorting or I/O, and N(v) represents the neighbors of v. Note that template T0 is a single node, so X_{T0,v} contains only a single color-count pair ({s_v}, 1).

According to Equation 1, to compute X_{T',v} we need X_{T'1,v} for sub-template T'1 and X_{T'2,u} for all u in N(v) for sub-template T'2. We implement this with a mapper and a reducer function, shown in Algorithms 4 and 5, respectively.

Algorithm 4 mapper(v, X_{t,v}, N(v))
1: if t is T'1 then
2:   Collect(key v, value <X_{t,v}, flag 1>)
3: else
4:   for u in N(v) do
5:     Collect(key u, value <X_{t,v}, flag 2>)

Note that in Algorithm 4, the second Collect emits X_{T'2,v} to all of v's neighbors. Therefore, as shown in Algorithm 5, X_{T'1,v} and X_{T'2,u} from all u in N(v) are handled by the same reducer, which is sufficient for computing Eq. 1. Also note that, for a given node v, the number of entries with flag 1 is 1, and the number of entries with flag 2 equals |N(v)|.

Algorithm 5 reducer(v, (X, flag), (X, flag), ...)
1: Pick X_1 where flag = 1
2: for all color sets S from X_1 do
3:   for each X other than X_1 do
4:     for all color sets S' from X do
5:       if S ∩ S' = ∅ then
6:         c(v, T', S ∪ S') += c(S) · c(S')
7: Collect(key v, value <X_{T',v}, N(v)>)

The last step is to compute the total count described in Eq. 2, shown in Algorithms 6 and 7.

Algorithm 6 mapper(v, X_{T,v}, N(v))
1: Collect(key "sum", value X_{T,v})

Algorithm 7 reducer("sum", X_{T,v1}, X_{T,v2}, ...)
1: Y = (1/q) (k^k / k!) Σ_{v in V_G} X_{T,v}
2: Collect(key "sum", value Y)

Note that in Algorithm 6, X_{T,v} contains only one element, which is the count corresponding to the entire color set. In the reducer shown in Algorithm 7, all the counts are added together and properly scaled to obtain the final count. For a comprehensive description of the MapReduce version of color coding, please refer to [4]. (An illustrative sketch of the color-set join performed by Algorithm 5 is given after Section 5.2 below.)

5.2 EN-SAHAD

For a general MapReduce problem, the set of keys processed in the Mapper and Reducer varies among different jobs. Therefore, MapReduce uses external shuffling and sorting between Mappers and Reducers to deploy the keys to computing nodes. In our algorithm, however, the dynamic program aggregates counts based on the root node of the subtree, and therefore the key is always the node index v. In EN-SAHAD, we use this prior knowledge to predefine a reducer that corresponds to a set of nodes. We also assign the predefined reducers to computing nodes before the dynamic program begins. Therefore, a data entry with key v is sent directly to the corresponding computing node and processed by the designated Reducer. Using this mechanism, we can reduce the cost of shuffling and sorting in the intermediate stage of the Hadoop jobs.
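As the sketch referenced above, the following minimal Python fragment illustrates the color-set join of Algorithm 5: it combines the color-count table of the active child rooted at v with the tables of the passive child received from v's neighbors. It is an illustration under our own naming, not SAHAD's implementation, and it omits the division by d from Eq. (1), which only matters when the passive child is isomorphic to sibling subtrees.

```python
from itertools import product

def join_color_counts(active_table, passive_tables):
    """Combine X_{T'1,v} (active child rooted at v) with the X_{T'2,u} tables
    received from v's neighbors u, mirroring lines 1-6 of Algorithm 5.
    Tables map frozenset-of-colors -> count of colorful embeddings using
    exactly that color set. Returns the table X_{T',v} for the joined T'."""
    joined = {}
    for (S1, c1), table in product(active_table.items(), passive_tables):
        for S2, c2 in table.items():
            if S1 & S2:
                continue                        # overlapping color sets: not colorful
            S = S1 | S2
            joined[S] = joined.get(S, 0) + c1 * c2
    return joined

# Example: v has two neighbors; the active child T'1 is a single node, the
# passive child T'2 is an edge, so T' is a 3-node path rooted at v.
X_active = {frozenset({"red"}): 1}                    # X_{T'1,v}
X_from_u1 = {frozenset({"blue", "green"}): 2}         # X_{T'2,u1}
X_from_u2 = {frozenset({"red", "green"}): 1}          # overlaps with "red": contributes nothing
print(join_color_counts(X_active, [X_from_u1, X_from_u2]))
# -> {frozenset({'red', 'blue', 'green'}): 2}
```

In SAHAD this join runs inside a reducer keyed on v; EN-SAHAD keeps the same join but routes the inputs to pre-allocated reducers to avoid the shuffle and sort.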
5.3 HARPSAHAD+

HARPSAHAD+ is built upon the Harp framework [?], [?], which adopts a variety of advanced technologies from research on high performance Java. HARPSAHAD+ has the following optimizations over the MapReduce (SAHAD) version: 1) It uses a two-level parallel programming model: at the inter-node level, the workload is distributed by Harp mappers; at the intra-node level, the local workload is divided and assigned to multiple Java threads. 2) For inter-node communication, it utilizes an MPI-Alltoall-like regroup operation provided by Harp. 3) For intra-node computation, it utilizes the Habanero Java thread library from Rice University [?] and adopts a Long-Running-Thread programming style [?] to unleash the potential performance of the Java language.

5.3.1 Inter-Node Communication

In SAHAD, the template counts of a vertex v and of all of its neighbours N(v) are assigned the same key value v; therefore, they are shuffled into the same reducer to complete the counting process. In HARPSAHAD+, we remove the reducer module and replace it with a user-defined mapper function. The whole set of vertices V is distributed and cached in the memory space of p Harp mappers. Each mapper i holds a subset of vertices V_i, with s_i = |V_i|.

In the mapper function, we create a table LTable with s_i entries, and each entry j < s_i serves as a reducer for vertex v_j. HARPSAHAD+ then uses a regroup operation to shuffle the data within memory, but in a collective way. Each mapper function creates another Harp Table object, RTable, containing multiple partitions, to transfer the data. A preprocessing function is fired to record re-usable information required by the regroup operations in each iteration. In the preprocessing stage, each mapper i obtains, via an allgather communication operation, a copy of all the vertex IDs v together with the mapper ID j such that v in V_j. The mapper then parses the neighbour lists N(v) of all the local vertices V_i and labels each vertex u, with u in N(v) but u not in V_i, with the mapper ID j such that u in V_j. Therefore, each mapper i keeps a queue Q_{i,j} of vertex IDs for each mapper j, with v in Q_{i,j} implying v in V_j. By sending Q_{i,j} to mapper j, each mapper j finally obtains a sending queue Q_{j,i} of vertices.

In each iteration of HARPSAHAD+, the regroup operation fired by mapper i has three steps: 1) For each sending queue Q_{i,j}, load the sub-template counts of the vertices v in Q_{i,j} into a partition Par_{i,j} of RTable. 2) The sender and receiver mapper identities, i and j, are coded into a single partition ID for Par_{i,j}; during the collective regrouping, a designed Harp partitioner decodes the partition ID and delivers the partition Par_{i,j} to the receiver mapper j. 3) After the regroup operation, the Harp table RTable of each mapper i contains the counts of the vertices u in N(v) needed to update the sub-template counts of the local vertices v in LTable.

5.3.2 Intra-Node Computation

HARPSAHAD+ extends the MapReduce framework by taking advantage of the multi-threading programming model within a shared-memory node. We favor Habanero Java threads over the java.lang.Thread implementation because they allow users to set thread affinity on multicore/many-core processors. We also embrace the so-called Long-Running-Thread programming style, where we create the threads in the outermost loop and keep them running until the end of the program. This approach avoids the overhead of frequently creating and destroying threads; instead, it uses a java.util.concurrent.CyclicBarrier object to synchronize threads when required.

6 PERFORMANCE ANALYSIS

In this section, we discuss the performance of SAHAD in terms of the overall work and time complexity. Throughout this section, we denote the number of nodes and edges in the network by n and m, respectively, and we use k to denote the number of nodes in the template. We write C(k, k_i) for the binomial coefficient "k choose k_i".

Lemma 6.1. For a template T', suppose the sizes of the two sub-templates T'1 and T'2 are k_1 and k_2, respectively. Then the sizes of the input and output of Algorithm 4 for a node v are O(C(k,k_1) + C(k,k_2) + d(v)) and O(C(k,k_2) d(v)), respectively, and the size of the input to Algorithm 5 is O(C(k,k_2) d(v)).

Proof. For a node v, the input to Algorithm 4 involves the corresponding X_{T'1,v} and X_{T'2,v} for T'1 and T'2, as well as N(v), which together have size O(C(k,k_1) + C(k,k_2) + d(v)). If the input is for T'2, Algorithm 4 generates multiple key-value pairs for node v, one for each node u in N(v); therefore, the output has size O(C(k,k_2) d(v)). For a given v, the input to Algorithm 5 is the combination of the above and therefore has size O(C(k,k_2) d(v)).

Lemma 6.2. The total work complexity is O(k 2^{2k} |E_G| e^k log(1/δ) / ε^2).

Proof. For a node v and each neighbor u in N(v), Algorithm 5 aggregates every pair of the form (S_a, C_a) in X_{T'1,v} and (S_b, C_b) in X_{T'2,u}, which leads to a work complexity of O(C(k,k_1) C(k,k_2) d(v)).
Since |T| <= k, the total work over all nodes and sub-templates is at most

O( Σ_{v,T'} C(k,k_1) C(k,k_2) d(v) ) = O( Σ_v k 2^{2k} d(v) ) = O(k 2^{2k} |E_G|).   (3)

Since O(e^k log(1/δ) / ε^2) iterations are performed in order to get the (ε, δ)-approximation, the lemma follows.

Time complexity. We use P to denote the number of machines. We assume each machine is configured to run a maximum of M Mappers and R Reducers simultaneously. Finally, we assume a uniform partitioning, so that each machine processes n/P nodes.

Lemma 6.3. The time complexity of Algorithms 3 and 4 is O(n/(P M)) and O(m/(P M)), respectively.

Proof. We first consider Algorithm 3, which takes as input an entry of the form (v, N(v)) for some node v and performs constant work. There are n/P entries processed by each machine. Since M Mappers run simultaneously, this gives a running time of O(n/(P M)). Next, we consider Algorithm 4. Each Mapper outputs (v, X) for input T'1, and d entries, one for each u in N(v), for input T'2, where d is the degree of v. Therefore, each computing node performs O(Σ_{i=1}^{n/P} d_i) = O(m/P) steps, where d_i is the degree of node v_i. Again, since M Mappers run simultaneously, the total running time is O(m/(P M)).

Lemma 6.4. The time complexity of Algorithm 5 is O(m 2^{2k} / (P R)).

Proof. Suppose |S| = k_1 and |S'| = k_2. The numbers of possible color sets S and S' are C(k,k_1) and C(k,k_2), respectively. Line 2 of Algorithm 5 involves O(C(k,k_1)) = O(2^k) steps; similarly, line 4 also involves O(2^k) steps, and line 3 involves O(d) steps. Therefore, the total running time is O(d 2^{2k}). Each machine processes n/P entries corresponding to different nodes, leading to a total of O(Σ_i d_i 2^{2k} / P) = O(m 2^{2k} / P) steps. Since R reducers run in parallel on each machine, this leads to a total time of O(m 2^{2k} / (P R)).

Lemma 6.5. The time complexity of Algorithms 6 and 7 is O(n/(P M)) and O(n), respectively.

Proof. Algorithm 6 emits a single entry for each input. Following the same outline as the proof of Lemma 6.3, its running time is O(n/(P M)).

Algorithm 7 takes O(n) time, since there is only one key, "sum", and only one Reducer is assigned the summation over all v in V(G), which takes O(n) time.

Lemma 6.6. The overall running time of SAHAD is bounded by

O( (k 2^{2k} m / P) (1/M + 1/R) e^k log(1/δ) / ε^2 ).   (4)

Proof. Algorithm 3 takes O(n/(P M)) time. Algorithms 4 and 5 run for each step of the dynamic programming, i.e., joining two sub-templates into a larger template as shown in Figure 3. Since the total number of sub-templates is O(k) when T is a tree, Algorithms 4 and 5 run O(k) times. Therefore, the total time is O(k (m/(P M) + m 2^{2k}/(P R))) = O(k 2^{2k} (m/P)(1/M + 1/R)). Finally, the entire algorithm has to be repeated O(e^k log(1/δ)/ε^2) times in order to get the (ε, δ)-approximation, and the lemma follows.

6.1 Performance Analysis of the Intermediate Stage

With SAHAD, a major bottleneck of a Hadoop job in terms of running time is the shuffling and sorting cost in the intermediate stage between Mapper and Reducer, due to the high I/O and synchronization costs, as shown by the black bar in Figure 4.

Fig. 4: Time spent in each stage of a Hadoop job that produces the color-counts for a 5-node template by aggregating the 2-node and 3-node sub-trees. The black bar is the time for the intermediate stage, which performs shuffling and sorting.

We observe that the external shuffling and sorting stage takes roughly twice the time of the reducing stage, which dramatically increases the overall running time. Given that the keys in the Mappers and Reducers are always the indices of the nodes v in V(G), we can enhance SAHAD by removing the shuffling and sorting in the intermediate stages; instead, we can designate Reducers and send the data directly to the corresponding Reducers.

7 VARIATIONS OF SUBGRAPH ISOMORPHISM PROBLEMS

So far we have discussed the basic framework of the algorithm, including how to compute the total number of subgraph embeddings in Algorithm 7. We now discuss a set of problems closely related to the subgraph isomorphism problem, including finding supervised motifs and computing the graphlet frequency distribution, which can be computed using our framework. Note that our algorithm is particularly suitable for computing over multiple templates when they share common sub-templates, since those common sub-templates only need to be computed once. This is the case in many problems, where common sub-templates such as single nodes, edges, or simple paths are shared.

7.1 Supervised Motif Finding

Motifs of a real-world network are specific templates whose embeddings occur with much higher frequencies than in random networks; they are referred to as the building blocks of networks and have been found in many real-world networks [6]. Our algorithm reduces the computational cost for a group of templates since the common sub-templates are computed only once; therefore, this approach is amenable to supervised motif finding.

7.2 Graphlet Frequency Distribution

The graphlet frequency distribution has been proposed as a way of measuring the similarity of protein-protein interaction networks [8], where common properties such as degree distribution, diameter, etc., may not suffice. Unlike motifs, the graphlet frequency distribution is computed over all selected small subgraphs, regardless of whether they appear frequently or not. The graphlet frequency distribution D(i, T) measures the number of nodes that touch i graphlets isomorphic to T. The number of graphlets touching a single node v can be computed from the counts of the same template T with the root placed at different nodes of T.
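Before turning to the experiments, it helps to see the magnitudes behind the bounds above. The short sketch below (ours, for illustration; the hidden constants in the O(.) terms are taken as 1) evaluates the colorful-embedding probability k!/k^k from Section 4 and the nominal iteration count N = e^k ln(1/δ)/ε^2 from Algorithm 1, line 8, for a few template sizes.

```python
import math

def colorful_probability(k):
    """Probability that a fixed k-node embedding becomes colorful under a
    uniform random k-coloring (Section 4): k!/k^k."""
    return math.factorial(k) / k ** k

def nominal_iterations(k, eps, delta):
    """Worst-case iteration count N = e^k * ln(1/delta) / eps^2 from Algorithm 1,
    line 8, with the hidden constant taken as 1 for illustration."""
    return math.ceil(math.exp(k) * math.log(1 / delta) / eps ** 2)

for k in (5, 7, 10, 12):
    p = colorful_probability(k)
    print(f"k={k:2d}  P[colorful]={p:.4f}  scale k^k/k!={1/p:10.1f}  "
          f"N(eps=0.1, delta=0.05)={nominal_iterations(k, 0.1, 0.05):,}")
```

As Section 8.2.1 shows, in practice a single coloring already gives sub-percent error on the networks studied, far below what this worst-case bound suggests.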
8 EXPERIMENTAL ANALYSIS OF SAHAD, EN-SAHAD & HARPSAHAD+

We carry out a detailed experimental analysis of SAHAD, EN-SAHAD and HARPSAHAD+, focusing on four aspects:

(i) Quality of the solution: We compare the color coding results with exact counts on small graphs in order to measure the empirical approximation error of our algorithms, and show that the error is very small (less than 0.5% with one iteration, as shown in Figure 7); in the following experiments we therefore run the program for a single iteration.

(ii) Scalability of the algorithms as a function of template size, graph size and computing resources: We carried out experiments using templates with sizes ranging from 3 to 12 nodes, including both labeled and unlabeled templates. The graphs we use range from several hundred thousand nodes to tens of millions. We also study how our algorithm scales with computing resources, including the number of threads per node, the number of computing nodes, and different settings of mappers and reducers.

(iii) Variations of the problem: Our framework extends to a variety of measurements related to the subgraph counting problem. In the experiments, we show unlabeled/labeled subgraph counting and graphlet distribution results.

(iv) Enhancing overall performance by system tuning.

We also investigate the different components of the system and their impact on the overall performance. For example, EN-SAHAD addresses the communication and sorting cost in the intermediate stage of the system and gives approaches for improvement. We also propose a degree-based graph partitioning scheme that can improve the performance of Harp by imposing better load balancing of the computation within each partition. Table 2 highlights the main results we obtained with the various methods.

TABLE 2: Comparison of SAHAD, EN-SAHAD and HARPSAHAD+
Method | Networks | Templates | Performance
SAHAD | 68M edges | 12 nodes | tens of minutes for a 7-node template on Chicago
EN-SAHAD | 1M edges | 5 nodes | 20% improvement over SAHAD
HARPSAHAD+ | more than a billion edges | up to 12 nodes | 10-20 times faster than SAHAD

Fig. 5: Templates used in the experiments. The unlabeled templates are U5-1, U5-2, U5-3, U7-1 and U1-1; the labeled templates are L7-1, L1-1 and L1-1, with node labels combining gender (m, f) and age group (k, y, a, s).

8.1 Experiment Design

Datasets. For our experiments, we use synthetic social contact networks of the following cities and regions: Miami, Chicago, New River Valley (NRV), and New York City (NYC) (see [7] for details). We consider demographic labels {kid, youth, adult, senior} based on the age, together with the gender, of individuals. We also run experiments on a G(n, p) graph (denoted GNP1) with n nodes, where each pair of nodes is connected with probability p and nodes are randomly assigned labels. We additionally experiment on a few other networks: Web-Google [], RoadNet (rnet) [], Twitter [1] and Chung-Lu random graphs [9]. Table 3 summarizes the characteristics of the networks.

TABLE 3: Networks used in the experiments (numbers of nodes and edges, in millions, for Twitter, Miami, Chicago, NYC, NRV, rnet, GNP1 and Web-Google).

Templates. The templates we use in the experiments are shown in Figure 5. The templates vary in size from 5 to 12 nodes; U5-1, ..., U1-1 are the unlabeled templates and L7-1 and L1-1 as well as L1-1 are the labeled templates. In the labels, m, f, k, y, a and s stand for male, female, kid, youth, adult and senior, respectively.

Computing Environment. For experiments with SAHAD, we use the computing cluster Athena, which has 42 computing nodes and a large RAM footprint.
Each node has a quad-socket AMD 2.3 GHz Magny-Cours 8-core processor, i.e., 32 cores per node or 1344 cores in total, and 64 GB of RAM (1.4 TFLOP peak). The local disk available on each node is 750 GB; therefore, we have a maximum of 31.5 TB of storage for HDFS. In most of our experiments we use up to 16 nodes, which give up to 12 TB of capacity for the computation. Although the number of cores and the RAM capacity of each node can support a large number of mappers/reducers, the availability of only a single disk on each node limits the aggregate I/O bandwidth of all parallel processes on the node. To make matters worse, the aggregate I/O of parallel processes performing sequential I/O can result in many extra disk seeks and hurt overall performance. Therefore, disk bandwidth is the bottleneck limiting further parallelism within each node; this limitation is discussed further in Section 8.2.2. We also use the public Amazon Elastic Compute Cloud (EC2) for some of our experiments. EC2 enables customers to instantly obtain cheap yet powerful computing resources and start the computing process with no upfront hardware cost. We allocated 4 High-CPU Extra-Large instances from EC2; each instance has 8 cores, 7 GB of RAM, and two 5 GB virtual disks (Elastic Block Store volumes). For experiments with HARPSAHAD+, we use the Juliet cluster (Intel Haswell architecture) with 1, 2, 4, 8 and 16 nodes. The Juliet cluster contains 32 nodes, each with two 18-core, 36-thread Intel Xeon E5-2699 processors, and 96 nodes, each with two 12-core, 24-thread Intel Xeon E5-2670 processors. All the nodes used in the experiments have Intel Xeon E5-2670 processors and 128 GB of memory, and all the experiments are performed over an InfiniBand FDR interconnect.

Performance metrics. We carry out experiments on SAHAD, EN-SAHAD and HARPSAHAD+. For SAHAD, we measure the approximation bounds, the impact of the Hadoop configuration (including the number of Mappers/Reducers), and the performance of queries for various templates and graphs. For EN-SAHAD, we measure the performance improvement gained by eliminating the sorting in the intermediate stage, and the impact of different partitioning schemes. For Harp, similarly to SAHAD, we measure the performance impact of various templates and graphs, as well as the system performance with respect to the number of computing nodes. We also compare HARPSAHAD+ with SAHAD to study the improvement Harp brings.

8.2 Performance of SAHAD

In this section, we evaluate various aspects of the performance. Our main conclusions are summarized below, and Table 4 summarizes the different experiments we perform, which are discussed in greater detail later.

1. Approximation bounds: While the worst-case bounds for the algorithm imply O(e^k log(1/δ)/ε^2) rounds to get an (ε, δ)-approximation (see Lemma 6.2), in practice we find that far fewer iterations are needed.

2. System performance: We run our algorithm on a diverse set of computing resources, including the publicly available Amazon EC2 cloud. We find that our algorithm scales well with the number of nodes, and that disk I/O is one of the main bottlenecks. We posit that employing multiple disks per node (a rising trend in Hadoop) or using I/O caching would help mitigate this bottleneck and boost performance even further.

3. Performance on various queries: We evaluate the performance on templates with sizes ranging from 5 to 12. We find that labeled queries are significantly faster than unlabeled ones, and the overall running time is under 35 minutes for these queries on our computing cluster (described above). We also get comparable performance on EC2.

8.2.1 Approximation bounds

As discussed in Section 3, the color coding algorithm averages the estimates over multiple iterations. Figure 6 shows the error at each iteration when counting U5-1 on Miami and Web-Google, respectively. We observe that the standard deviation of the error is 0.2% and 0.4% for Miami and Web-Google, which is very small.

Fig. 6: Error in counting U5-1 across iterations on (a) Miami (standard deviation 0.2%) and (b) Web-Google (standard deviation 0.4%).

In Figure 7, we show that the approximation error is below 0.5% for template U7-1 on the GNP1 graph, even with one iteration. The figure also plots results based on using more than 7 colors, which can sometimes improve the running time, as discussed in [16]. In the rest of the experiments, we only use the estimate from one iteration, because of the small error shown in this section. The error after i iterations is computed as |Z_i - emb(T, G)| / emb(T, G).

8.2.2 Performance Analysis

We now study how the running time is affected by the total number of computing nodes and by the number of reducers/mappers per node. We carry out two sets of experiments: (i) how the total running time scales with the number of computing nodes; (ii) how the running time is affected by varying the assignment of mappers/reducers per node.

1. Varying the number of computing nodes. Figure 8 shows that the running time for Miami drops from over four hours to less than 30 minutes when the number of computing nodes increases from 3 to 13. However, the curve for GNP1 does not show good scaling. The reason is that the actual computation for GNP1 consumes only a small portion of the running time, and there is overhead from managing the mappers/reducers. In other words, the curve for GNP1 shows a lower bound on the running time of our algorithm.

2. Varying the number of mappers/reducers per node. Here we consider two cases.

2.a. Varying the number of reducers per node. Figure 9 shows the running time on Athena when we vary the number of reducers per node; here we fix the number of nodes at 16 and the number of mappers per node at 4. We find that running 3 reducers concurrently on each node minimizes the total running time. In addition, we find that although increasing the number of reducers per node can reduce the time of the Reduce stage for a single job, the running time increases sharply in the Map and Shuffle stages. As a result, the total running time increases with the number of reducers. This can be explained by the visible I/O bottleneck for concurrent accesses on Athena, since Athena has only 1 disk per node.
This phenomenon is not present on EC2, as seen in Figure 11b, indicating that EC2 is better optimized for concurrent disk access in cloud usage.

Fig. 9: Running time vs. the number of reducers per node: (a) total running time; (b) running time of the job stages (map, shuffle and sort, reduce).

Fig. 7: Approximation error in counting U7-1 on GNP1 vs. the number of iterations, for color-set sizes of 7 and larger.

Fig. 8: Running time for counting U1-1 vs. the number of computing nodes, for Miami and GNP1.

2.b. Varying the number of mappers per node. Figure 10 shows the running time on Athena when we vary the number of mappers per node while fixing the number of reducers at 7 per node. We find that varying the number of mappers per node does not affect the performance. This is also validated on EC2, as shown in Figure 11.

2.c. Distribution of reducer running times. Figure 12 shows the distribution of the reducers' running times on Athena. We observe that as we increase the number of reducers per node, the distribution becomes more volatile; for example, at the largest setting of concurrent reducers per node, the reducers' completion times vary from a few minutes to more than 10 minutes.

TABLE 4: Summary of the experimental results (refer to Section 8.1 for the terminology used in the table)
Experiment | Computing resource | Template & Network | Key Observations
Approximation bounds | Athena | U7-1 & GNP1 | error well below 0.5%
Impact of the number of data nodes | Athena | U1-1 & Miami, GNP1 | scales from 4 hours to 30 minutes as data nodes go from 3 to 13
Impact of the number of concurrent reducers | Athena & EC2 | U1-1 & Miami | performance worsens on Athena
Impact of the number of concurrent mappers | Athena & EC2 | U1-1 & Miami | no apparent performance change
Unlabeled/labeled template counting | Athena & EC2 | templates from Figure 5 and networks from Table 3 | all tasks complete in less than 35 minutes
Graphlet frequency distribution | Athena | U5-1 & Miami, Chicago | completes in less than 35 minutes

Fig. 10: Running time vs. the number of mappers per node: (a) total running time; (b) running time of the job stages.

Fig. 11: Running time with respect to the number of (a) mappers and (b) reducers per node on EC2.

This also indicates the poor I/O performance of Athena under concurrent access.

8.2.3 Illustrative applications

In this section, we illustrate the performance on three different kinds of queries. We use Athena and assign 16 nodes as data nodes; for each node, we assign a maximum of 4 mappers and 3 reducers. Our experiments on EC2 for some of these queries are discussed later, in Section 8.2.4.

1. Unlabeled subgraph queries: Here we compute the counts of templates U5-1, U7-1 and U1-1 on GNP1 and Miami, as well as the running time, as shown in Figure 13. We observe that for unlabeled templates with up to 12 nodes on the Miami graph, the algorithm runs in less than 25 minutes.

2. Labeled subgraph queries: Here we count the total number of embeddings of templates L7-1, L1-1 and L1-1 on Miami and Chicago. Figure 14b shows that counting templates with up to 12 nodes on Miami takes less time than the 35 minutes needed for Chicago. The running time is much lower for the labeled subgraph queries than for the unlabeled subgraph queries. This is due to the fact that labeled templates have far fewer embeddings because of the label constraints.

Fig. 12: Distribution of reducer completion times with (a) 3, (b) 7, (c) 11 and (d) more reducers per computing node.

Fig. 13: Querying unlabeled subgraphs on GNP1 and Miami: (a) counts of the unlabeled subgraphs; (b) running time for counting unlabeled subgraphs.

3. Computing the graphlet frequency distribution: Figure 15 shows the graphlet frequency distribution for the Miami and Chicago networks, respectively. Using template U5-1 for this experiment, we observe that it takes less than 35 minutes to compute the graphlet frequency distribution on both Miami and Chicago.

8.2.4 Performance Study with Amazon EC2

On EC2, we run unlabeled and labeled subgraph queries on Miami and GNP1 for templates U5-1, U7-1, U1-1, L7-1, L1-1 and L1-1.

We use the same 4 EC2 instances discussed previously, and each node runs up to 8 reducers (and a fixed maximum number of mappers) concurrently. As shown in Figure 16, the running time on EC2 is comparable to that on Athena, except for U1-1 on Miami, which takes roughly 2.5 hours to finish on EC2 but only 25 minutes on Athena. This is because, for large templates and graphs as large as Miami, the input/output data as well as the I/O pressure on the disks is tremendous. EC2 uses virtual disks as local storage, which hurts overall performance when dealing with such a large amount of data.

Fig. 14: Querying labeled subgraphs on Miami and Chicago: (a) counts of the labeled subgraphs; (b) running time for counting labeled subgraphs.

Fig. 15: Graphlet distribution on (a) Miami and (b) Chicago (number of nodes vs. number of graphlets adjacent to a node).

Fig. 16: Running time for various unlabeled and labeled templates on EC2: (a) GNP1; (b) Miami.

8.3 Performance of EN-SAHAD

In this section, we test our algorithms on two real-world networks, NRV and RoadNet, and a number of their shuffled versions. We generate shuffled networks with 20, 40, 60, 80 and 100 percent shuffling ratios, and name them nrv20 to nrv100 and rnet20 to rnet100.

As discussed in Section 5.2, a major factor that affects the overall performance is the heavy shuffling and sorting cost in the intermediate stage of a Hadoop job. We mitigate this by designating node indices v to Reducers and pre-allocating the Reducers among the computing nodes. In this way, the key-value pairs from the Mappers can be sent directly to the corresponding Reducers without being shuffled and sorted. Figure 17 shows the overall running time of our algorithm on NRV, RoadNet and their variations. Here we generate the variations of a graph by shuffling a proportion of its edges; e.g., nrv40 is NRV with 40% of its edges shuffled. We observe that pre-allocating Reducers delivers roughly a 20% performance improvement.

Fig. 17: SAHAD vs. EN-SAHAD on (a) NRV and its variations and (b) RoadNet and its variations.

8.4 Performance of HARPSAHAD+

In the following experiments, we evaluate the performance of HARPSAHAD+ by comparing it with a state-of-the-art MPI subgraph counting program called MPI-Fascia. MPI-Fascia was developed by Slota et al. [34] and implements the same color coding algorithm as SAHAD and HARPSAHAD+. MPI-Fascia uses an MPI+OpenMP programming model; in our tests, it is compiled with g++ and the -O3 compiler option, together with OpenMPI. Also, we choose InfiniBand instead of Ethernet as the interconnect for testing MPI-Fascia and HARPSAHAD+, which makes the comparison more challenging for the Java-based communication operations of HARPSAHAD+.

8.4.1 Execution Time

In Figure 18a, we observe that HARPSAHAD+ has a 10x to 20x speedup over SAHAD on a single Haswell node. This tremendous improvement comes from two sources: 1) HARPSAHAD+ makes better use of the hardware resources (logical cores) through Habanero Java threads and affinity binding; 2) compared to the disk-based shuffle process of SAHAD, HARPSAHAD+ caches all of the data in main memory, which significantly reduces the overhead of data access.
8.4 Performance of HARPSAHAD+
In the following experiments, we evaluate the performance of HARPSAHAD+ by comparing it with a state-of-the-art MPI subgraph counting program called MPI-Fascia. MPI-Fascia was developed by Slota et al. [34] and implements the same color coding algorithm as SAHAD and HARPSAHAD+. MPI-Fascia uses an MPI+OpenMP programming model. In our tests, it is compiled with g++ using the -O3 compiler option, against OpenMPI. Also, we choose InfiniBand instead of Ethernet as the interconnect for testing MPI-Fascia and HARPSAHAD+, which poses more of a challenge to the Java based communication operations of HARPSAHAD+.

8.4.1 Execution Time
In Figure 18a, we observe that HARPSAHAD+ achieves a speedup of up to two orders of magnitude over SAHAD on a single Haswell node. This tremendous improvement comes from two sources: 1) HARPSAHAD+ makes better use of the hardware resources (logical cores) by using Habanero Java threads and affinity binding. 2) Compared to the disk based shuffle process of SAHAD, HARPSAHAD+ caches all of the data in main memory, which significantly reduces the overhead of data access.
In Figure 18b, we compare HARPSAHAD+ with MPI-Fascia on a Twitter dataset with large templates in a distributed environment of 16 Haswell nodes. HARPSAHAD+ achieves comparable or even slightly better performance than MPI-Fascia, which comes from its optimized communication operations. Figure 19 illustrates a breakdown of the execution time into computation and communication for one of the large Twitter runs. Because of the highly intensive computation workload, MPI-Fascia consumes less time in computation thanks to the compiler-level O3 optimization. However, HARPSAHAD+, as a pure Java implementation, still achieves almost the same total counting time with the help of optimized collective communication operations.

Fig. 18: (a) SAHAD vs. HARPSAHAD+ on a single Haswell node (templates U5-1 and U7-1 on Miami and Web-Google); (b) HARPSAHAD+ vs. MPI-Fascia on 16 Haswell nodes (large templates on NYC and Twitter).

Fig. 19: Breakdown of the counting time into computation and communication for a large template on Twitter, on 16 nodes.
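A breakdown like the one in Figure 19 is typically obtained by timing the two phases separately in each iteration and accumulating the results. The sketch below is a minimal way to do that; the phase callbacks computeLocalCounts and exchangeCountTables are placeholders, not HARPSAHAD+ APIs.

// Accumulates per-phase wall-clock time so that the total counting time can be
// split into computation and communication, as in the Figure 19 breakdown.
public class PhaseTimer {
    private long computeNanos = 0;
    private long commNanos = 0;

    public void timeIteration(Runnable computeLocalCounts, Runnable exchangeCountTables) {
        long t0 = System.nanoTime();
        computeLocalCounts.run();          // local color-coding table updates
        long t1 = System.nanoTime();
        exchangeCountTables.run();         // collective exchange of partial counts
        long t2 = System.nanoTime();
        computeNanos += t1 - t0;
        commNanos += t2 - t1;
    }

    public double computeSeconds()       { return computeNanos / 1e9; }
    public double communicationSeconds() { return commNanos / 1e9; }
}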

8.4.2 Problem Size Scaling
Next we study the performance of HARPSAHAD+ by controlling the number of nodes in a graph while increasing the number of edges. In this experiment, we use the Chung-Lu model [9] to generate a series of random graphs given the degree sequences of Miami and NYC and their variations. The average degree of the generated random graphs ranges from 5 to 20 for Miami and from 10 to 100 for NYC. In Figure 20, the running time generally increases with the number of edges, which agrees with the time complexity we derive in Section 6. For Miami, when the average degree increases from 5 to 20, the running time increases by only 1.7x. Also, a tenfold (10x) increase in average degree for the NYC graph accounts for less than a 2x increase in running time. This indicates that our HARPSAHAD+ implementation maintains good performance when computing the neighbours of vertices in parallel, which is due to the high efficiency of Java threads.

Fig. 20: Problem size scaling on the Chung-Lu graphs CL0 through CL9 generated from the degree sequences of (a) the Miami dataset and (b) the NYC dataset, with template U10-1 on Haswell nodes.
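For reference, a minimal (and deliberately unoptimized) Chung-Lu generator is sketched below: each edge (u, v) is kept independently with probability proportional to the product of the target degrees, and a single scale factor raises or lowers the resulting average degree. This is a sketch of the model under those assumptions, not the generator used in our experiments.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal Chung-Lu generator: edge (u, v) is included with probability
// min(1, scale * w[u] * w[v] / W), where w is the target degree sequence
// and W is its sum. Increasing 'scale' increases the average degree.
public class ChungLu {
    public static List<int[]> generate(double[] w, double scale, long seed) {
        Random rng = new Random(seed);
        double totalWeight = 0;
        for (double wi : w) totalWeight += wi;

        List<int[]> edges = new ArrayList<>();
        for (int u = 0; u < w.length; u++) {
            for (int v = u + 1; v < w.length; v++) {
                double p = Math.min(1.0, scale * w[u] * w[v] / totalWeight);
                if (rng.nextDouble() < p) {
                    edges.add(new int[] {u, v});   // undirected edge u-v
                }
            }
        }
        return edges;
    }
}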
Varying number of computing nodes
In this section, we study the performance of HARPSAHAD+ as a function of the computing resources, i.e., computing nodes and threads per node. In Figure 21, we compare the inter-node strong scaling results of HARPSAHAD+ and MPI-Fascia. For the NYC dataset, we ran strong scaling tests on three templates; the value on the y-axis is the speedup on N nodes, obtained by dividing the time on a single node by the time on N nodes. Since the NYC dataset is relatively small for HARPSAHAD+ and MPI-Fascia, neither implementation is bound by the computation overhead, which prevents them from achieving linear speedup. However, HARPSAHAD+ (solid lines) still obtains better strong scalability than MPI-Fascia (dashed lines). Furthermore, MPI-Fascia could not run on two nodes due to a memory capacity bottleneck, and it shows no scalability beyond 4 nodes. For the Twitter dataset, HARPSAHAD+ again outperforms MPI-Fascia after 4 nodes. The speedup is also improved, because Twitter gives a much larger workload than NYC and HARPSAHAD+ becomes more bound by the computation overhead.

Fig. 21: Strong scaling of HARPSAHAD+ (solid lines) vs. MPI-Fascia (dashed lines). (a) NYC dataset with templates U5-1, U7-1 and U10-1; (b) Twitter dataset with templates U3-1, U5-1 and U7-1.

Degree based partitioning schemes
In the above experiments, we partition the graphs evenly, without considering the nature of the problem or the structure of the graphs; in that naive approach, each partition has the same number of vertices. In this section, we experiment with a new partitioning scheme based on a degree-related metric D_p, shown in Equation 5. Given a vertex with degree d, there are in total \binom{d}{2} different pairs of edges on which the sub-templates τ1 and τ2 can reside, i.e., O(d^2) ways to join the sub-templates at that vertex. Hence, in order to induce a roughly equal computational cost within each partition p, we partition the graph such that each partition has a similar value of

D_p = \sum_{v \in p} d_v^2,    (5)

where d_v is the degree of node v. We expect the computation in each partition to be roughly the same under this partitioning scheme, which reduces the overhead due to synchronization and unbalanced loads.
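A simple way to realize this scheme is a greedy load-balancing pass: process vertices in decreasing order of their cost d_v^2 and assign each one to the partition whose accumulated D_p is currently smallest. The sketch below is a minimal illustration of that idea under these assumptions, not the partitioner used in HARPSAHAD+, which must also keep each partition's adjacency data co-located with the worker that owns it.

import java.util.Arrays;
import java.util.Comparator;

// Greedy degree-based partitioning: vertices are assigned so that the
// per-partition metric D_p = sum of d_v^2 stays roughly balanced.
public class DegreePartitioner {
    // degrees[v] is the degree of vertex v; returns the partition id of each vertex.
    public static int[] partition(int[] degrees, int numPartitions) {
        Integer[] order = new Integer[degrees.length];
        for (int v = 0; v < degrees.length; v++) order[v] = v;
        // Placing the largest joining costs first makes the greedy balance tighter.
        Arrays.sort(order, Comparator.comparingLong((Integer v) -> -(long) degrees[v] * degrees[v]));

        long[] load = new long[numPartitions];      // current D_p of each partition
        int[] assignment = new int[degrees.length];
        for (int v : order) {
            int best = 0;
            for (int p = 1; p < numPartitions; p++) {
                if (load[p] < load[best]) best = p;
            }
            assignment[v] = best;
            load[best] += (long) degrees[v] * degrees[v];  // add this vertex's d_v^2
        }
        return assignment;
    }
}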

Fig. 22: Even partitioning vs. degree-based partitioning on the Chung-Lu graphs CL0 through CL9 derived from (a) the Miami dataset and (b) the NYC dataset.

In Figure 22(a), the benefit of using the degree-partitioned Miami dataset is merely around 5% on average, which is largely due to the relatively small size of the graph and of its computational cost. In contrast, the degree partition on the NYC dataset yields a substantially larger improvement on average, and an even larger one in the best case. This shows that for larger graphs, which incur high computational costs, the partitioning scheme plays a major role by reducing load imbalance.

9 CONCLUSION
In this paper we described an efficient parallel algorithm for computing the number of isomorphic embeddings of a subgraph in very large networks using MapReduce and the color coding technique. We first develop SAHAD, a Hadoop based implementation, and provide a performance analysis in terms of work and time complexity. After observing large sorting and communication costs in SAHAD, we explore two approaches to remedy these problems. The first approach, called EN-SAHAD, entails tightly coupling the graph vertices to the mappers and reducers, so as to reduce the sorting and shuffling phases of the MapReduce jobs. The second approach is an implementation of the color coding algorithm using the Harp framework, called HARPSAHAD+, which employs collective communication and shared memory to better facilitate computation and communication. Our experiments show that HARPSAHAD+ improves performance over SAHAD by almost two orders of magnitude, and simultaneously achieves comparable or even better execution time and scalability than a state-of-the-art MPI solution. HARPSAHAD+ can process networks with more than a billion edges and templates with up to 12 nodes. We also explore the performance of these implementations on different cluster architectures such as EC2 on-demand nodes and Intel Haswell nodes. Finally, we introduce a novel graph-load partitioning scheme which improves the performance on large graphs and templates. As directions for future research, it would be interesting to devise new algorithms that scale to larger instances. Additionally, it would be interesting to implement variants of these algorithms for restricted classes of networks.

REFERENCES
[1] Harp.
[2] SNAP: Stanford Network Analysis Project.
[3] E. Abdelhamid, I. Abdelaziz, P. Kalnis, Z. Khayyat, and F. Jamour. ScaleMine: scalable parallel frequent subgraph mining in a single large graph. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, page 61. IEEE Press, 2016.
[4] N. Alon, P. Dao, I. Hajirasouliha, F. Hormozdiari, and S. Sahinalp. Biomolecular network motif counting and discovery by color coding. Bioinformatics, 24(13):i241–i249, 2008.
[5] N. Alon, R. Yuster, and U. Zwick. Color-coding. Journal of the ACM (JACM), 42(4):844–856, 1995.
[6] Amazon. Elastic Compute Cloud (EC2). aws.amazon.com/ec2.
[7] C. Barrett, R. Beckman, M. Khan, V. Kumar, M. Marathe, P. Stretz, T. Dutta, and B. Lewis. Generation and analysis of large synthetic social contact networks. In Winter Simulation Conference, 2009.
[8] M. Bröcheler, A. Pugliese, and V. Subrahmanian. COSI: Cloud oriented subgraph identification in massive social networks. In 2010 International Conference on Advances in Social Networks Analysis and Mining. IEEE, 2010.
[9] F. Chung and L. Lu. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 6(2):125–145, 2002.
[10] R. Curticapean and D. Marx. Complexity of counting subgraphs: Only the boundedness of the vertex-cover number counts. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on. IEEE, 2014.
[11] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[12] J. Flum and M. Grohe. The parameterized complexity of counting problems. SIAM Journal on Computing, 33(4):892–922, 2004.
[13] F. V. Fomin, D. Lokshtanov, V. Raman, S. Saurabh, and B. R. Rao. Faster algorithms for finding and counting subgraphs. Journal of Computer and System Sciences, 78(3):698–706, 2012.
[14] L. Getoor and C. Diehl. Link mining: a survey. ACM SIGKDD Explorations Newsletter, 7(2):3–12, 2005.
[15] J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: mining maximal frequent subgraphs from graph databases. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2004.
[16] F. Hüffner, S. Wernicke, and T. Zichner. Algorithm engineering for color-coding with applications to signaling pathway detection. Algorithmica, 52(2):114–132, 2008.
[17] H. B. Hunt III, M. V. Marathe, V. Radhakrishnan, and R. E. Stearns. The complexity of planar counting problems. SIAM Journal on Computing, 27(4), 1998.
[18] A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. Principles of Data Mining and Knowledge Discovery, pages 13–23, 2000.
[19] I. Koutis and R. Williams. Limits and applications of group algebras for parameterized problems. In Proc. ICALP, 2009.
[20] M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243–271, 2005.
[21] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10: Proceedings of the 19th international conference on World Wide Web, pages 591–600, New York, NY, USA, 2010. ACM.
[22] J. Leskovec, A. Singh, and J. Kleinberg. Patterns of influence in a recommendation network. Advances in Knowledge Discovery and Data Mining, 2006.
[23] Y. Liu, X. Jiang, H. Chen, J. Ma, and X. Zhang. MapReduce-based pattern finding algorithm applied in motif detection for prescription compatibility network. Advanced Parallel Processing Technologies, 2009.
[24] D. Marx and M. Pilipczuk. Everything you always wanted to know about the parameterized complexity of subgraph isomorphism (but were afraid to ask). In 31st International Symposium on Theoretical Aspects of Computer Science, 2014.
[25] E. Maxwell, G. Back, and N. Ramakrishnan. Diagnosing memory leaks using graph mining on heap dumps. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2010.
[26] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
[27] R. Pagh and C. Tsourakakis. Colorful triangle counting and a MapReduce implementation. arXiv preprint, 2011.
[28] N. Pržulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177–e183, 2007.
[29] J. Qiu, S. Jha, A. Luckow, and G. C. Fox. Towards HPC-ABDS: an initial high-performance big data stack. Building Robust Big Data Ecosystem, ISO/IEC JTC 1 Study Group on Big Data, pages 18–21, 2014.

[30] J. Raymond and P. Willett. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. Journal of Computer-Aided Molecular Design, 16(7):521–533, 2002.
[31] R. Ronen and O. Shmueli. Evaluating very large datalog queries on social networks. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, 2009.
[32] S. Sakr. GraphREL: A decomposition-based and selectivity-aware relational framework for processing sub-graph queries. In Database Systems for Advanced Applications. Springer, 2009.
[33] G. M. Slota and K. Madduri. Fast approximate subgraph counting and enumeration. In Parallel Processing (ICPP), 2013 42nd International Conference on. IEEE, 2013.
[34] G. M. Slota and K. Madduri. Parallel color-coding. Parallel Computing, 47:51–69, 2015.
[35] B. Suo, Z. Li, Q. Chen, and W. Pan. Towards scalable subgraph pattern matching over big graphs on MapReduce. In Parallel and Distributed Systems (ICPADS), 2016 IEEE 22nd International Conference on. IEEE, 2016.
[36] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In Proceedings of the 20th international conference on World Wide Web. ACM, 2011.
[37] C. Tsourakakis, U. Kang, G. Miller, and C. Faloutsos. DOULION: Counting triangles in massive graphs with a coin. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
[38] L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, 1979.
[39] T. White. Hadoop: The definitive guide. Yahoo Press.
[40] X. Yan, X. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 2005.
[41] Z. Zhao, M. Khan, V. Kumar, and M. Marathe. Subgraph enumeration in large social contact networks using parallel color coding and streaming. In Parallel Processing (ICPP), 2010 39th International Conference on. IEEE, 2010.
[42] Z. Zhao, G. Wang, A. R. Butt, M. Khan, V. Kumar, and M. V. Marathe. SAHAD: Subgraph analysis in massive networks using Hadoop. In Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International. IEEE, 2012.

Zhao Zhao is pursuing his Ph.D. degree in Computer Science at Virginia Tech. He is also a Software Engineer in Verisign Labs, Verisign Inc. His research interests are in Network Science and analytics, especially in the design and analysis of parallel graph algorithms.

Langshi Chen is a Postdoctoral researcher at the School of Informatics and Computing at Indiana University. His research interests include linear solvers for HPC systems, energy efficiency of HPC applications, data intensive machine learning applications on many-core architectures, and so forth.

Mihai Avram is currently a Masters student studying Computer Science at Indiana University. His research interests involve applying various CS sub-domains such as Big Data, High Performance Computing, IoT, Machine Learning, HCI, and Data Mining to solve large scale social problems.

Meng Li is a Computer Science Ph.D. student in the School of Informatics and Computing at Indiana University. His advisor is Prof. Judy Qiu. His research interest is distributed systems and parallel computing.

Guanying Wang earned his PhD in Computer Science from Virginia Tech. He is now a software engineer at Google.

Ali Butt received his Ph.D. degree in Electrical & Computer Engineering from Purdue University in 2006.
He is a recipient of an NSF CAREER Award, IBM Faculty Awards, a VT College of Engineering (COE) Dean's award for Outstanding New Assistant Professor, and NetApp Faculty Fellowships. Ali's research interests are in distributed computing systems and I/O systems.

Maleq Khan is an Assistant Professor in the Department of Electrical Engineering and Computer Science at Texas A&M University-Kingsville. He received his Ph.D. in Computer Science from Purdue University. His research interests are in parallel and distributed computing, big data analytics, high performance computing, and data mining.

Madhav Marathe is a professor of Computer Science and the Director of the Network Dynamics and Simulation Science Laboratory, Biocomplexity Institute, Virginia Tech. His research interests include high performance computing, modeling and simulation, theoretical computer science and socio-technical systems. He is a fellow of the IEEE, ACM and AAAS.

Judy Qiu is an associate professor of Intelligent Systems Engineering in the School of Informatics and Computing at Indiana University. Her research interests are parallel and distributed systems, cloud computing, and high-performance computing. Her research has been funded by NSF, NIH, Intel, Microsoft, Google, and Indiana University. Judy Qiu leads the Intel Parallel Computing Center (IPCC) site at IU. She is a recipient of an NSF CAREER Award.

Anil Vullikanti is an Associate Professor in the Department of Computer Science and the Biocomplexity Institute of Virginia Tech. His interests are in the areas of approximation and randomized algorithms, distributed computing, graph dynamical systems and their applications to epidemiology, social networks and wireless networks. He is a recipient of the NSF and DOE CAREER awards.
