SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs


Shijie Zhang, Jiong Yang, Wei Jin
EECS Dept., Case Western Reserve University, {shijie.zhang, jiong.yang,

Acknowledgement: This presentation was made possible, in part, through financial support from the School of Graduate Studies at Case Western Reserve University. This project was partially supported by grants of NSF and NSF060.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were presented at The 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore. Proceedings of the VLDB Endowment, Vol. 3, No. 1. Copyright 2010 VLDB Endowment.

ABSTRACT

With the emergence of new applications, e.g., computational biology, new software engineering techniques, social networks, etc., more data is in the form of graphs. Locating occurrences of a query graph in a large database graph is an important research topic. Due to the existence of noise (e.g., missing edges) in the large database graph, we investigate the problem of approximate subgraph indexing, i.e., finding the occurrences of a query graph in a large database graph with (possible) missing edges. The SAPPER method is proposed to solve this problem. Utilizing the hybrid neighborhood unit structures in the index, SAPPER takes advantage of pre-generated random spanning trees and a carefully designed graph enumeration order. Real and synthetic data sets are employed to demonstrate the efficiency and scalability of our approximate subgraph indexing method.

1. INTRODUCTION

Graph data has appeared in many recent applications, ranging from bioinformatics and software engineering to social networks. Managing, processing, and analyzing these graph data has become an urgent practical problem. Subgraph query is one of the most fundamental procedures in managing graphs. In many applications, e.g., biological networks, graphs are large, with thousands or tens of thousands of vertices and millions of edges. A subgraph query identifies the occurrences of the query subgraph in the database graph. Although subgraph query has been studied previously [24], the basic assumption is that the networks of interest are perfectly clean. To qualify as an occurrence of a query subgraph q, all edges of the query graph have to occur in the database graph G. In other words, the occurrence has to be exact. However, noise commonly exists in many applications, or the approximate matches themselves are more interesting. For example:

1. A challenging problem in computational biology is to annotate, index and search subgraphs in large networks generated with high throughput experiments. Specifically, the problem is to search for well characterized pathways/patterns in a less studied model organism [7]. Subgraph indexing is useful in querying for pathways/patterns from well studied model organisms in other unfamiliar organisms with known protein-protein interaction networks, where vertices and edges represent proteins and interactions, respectively. However, due to possible errors in data collection and different thresholds used in experiments, the data are highly noisy. Missing interactions are common and it is very difficult to clean the data. By discovering and analyzing the approximate matches, biologists would generate solid hypotheses for future studies in understanding and identifying pathways/patterns in not so well studied model organisms.
2. In object-oriented programming, developers and testers handle multiple objects of the same or different classes.
The object dependency graph of a program run, where each vertex is an object and each edge is an interaction between two objects through a method call or field access, helps developers and testers understand the flow of the program and identify bugs. The patterns to be queried that are confirmed by the developers as typical object usages can be used to automatically detect the locations in programs that deviate from them (that is, locations similar to the pattern but not exactly the same) [8]. Hence, by retrieving the approximate occurrences of a typical pattern, developers and testers can quickly locate where the possible bugs are.

In this paper, we investigate the problem of discovering the occurrences of a query graph q in G. The query graph may contain dozens of vertices. Subgraph indexing has been studied before [24, 9, 9]. In previous work, to qualify as an occurrence of q in G, all edges of q have to occur. In contrast, we study the subgraph indexing problem in the context of noise, e.g., missing edges. Therefore, in this paper, an approximate match model is developed. In this model, the edge edit distance (i.e., the number of edge modifications needed to transform one graph to another) is used to qualify an occurrence of q. If the edge distance between the query graph q and a subgraph q' of G is no more than some threshold θ, then q' is considered an approximate occurrence of q. This approximate matching model takes into account missing edges in the database graph G. Note that we do not consider approximate matches with additional edges relative to the query graph because such matches are always contained by the matches of the query graph. We do not consider label mismatches because the number of possible candidate graphs with label mismatches to a given query graph can be huge. For example, let us assume the size of the query graph is n and the number of vertex labels in the database graph is m; then the total number of candidate graphs with only two label mismatches is $n(n-1)(m-1)^2$, even without considering any missing edges.

There is a very straightforward solution for approximate query matching. We can first find all graphs whose edge edit distance to q is no more than θ. Next, for each of these graphs q', the exact occurrences of q' in G can be discovered. In this way, approximate subgraph matching can be reduced to the problem of exact subgraph matching, and existing methods, e.g., GADDI [24], can be applied. However, this approach has two shortcomings. First, exact subgraph matching itself is a very difficult problem since subgraph isomorphism is known to be an NP-hard problem. Second, there could be a potentially large number of graphs whose edge edit distance is no more than θ away from q (denote these graphs as AI(q,θ)). For instance, if q has m edges, then the number of graphs in AI(q,θ) could be $O(m^\theta)$, which could be very large. Thus, it is crucial to devise an efficient way to process the group of queries.

In this paper, we aim to solve the above two problems. To efficiently identify the occurrences of one subgraph, a novel indexing structure, the hybrid neighborhood unit (HNU), is devised. Let $N_i(v, G)$ be the set of vertices u in G such that there exists an i-edge path in G that connects u and v. For each vertex v in the database graph G, the HNU stores the degree of v and the labels of v, v's neighbors ($N_1(v, G)$), and v's neighbors' neighbors ($N_2(v, G)$). In most cases, $N_1(v, G)$ is a relatively small set, but $N_2(v, G)$ could be large. For a graph with average degree d, there could be $d^2$ vertices in $N_2(v, G)$. During query time, when matching one vertex u in q to a vertex v in G, we need to find out whether the labels in $N_2(u, q)$ are a subset of those in $N_2(v, G)$, which could be costly if these sets are large. To efficiently determine the set relationship, the bloom filter [] data structure is used to represent the labels in $N_2(v, G)$. The bloom filter is an L-bit vector which can be used to determine whether one set is a subset of another. It has the following advantages. It is time efficient and space compact.
Moreover, it has no false negatives and only a small rate (1%) of false positives. Therefore, the vertices in the query graph q can be efficiently matched to the vertices in G with high accuracy.

To improve the efficiency of processing a set of subgraph queries (graphs in AI(q,θ)), we make the following observation. Although there could be $m^\theta$ graphs in AI(q,θ), these graphs overlap heavily. Therefore, it is beneficial to query the overlapping parts first since they have the greatest pruning power, i.e., they can be used in many of the graphs in AI(q,θ). As a result, the spanning trees of q are used for the query first because (i) many graphs in AI(q,θ) contain some spanning tree of q and (ii) the time to identify a tree in G is quite small. Based on the matches of the spanning trees, we can map vertices in q to vertices in G. The graph occurrences have a property similar to the Apriori Property [2] because an occurrence of a supergraph has to contain an occurrence of a subgraph. Therefore, finding the matches of graphs in AI(q,θ) is similar to discovering frequent patterns. As a result, a depth-first enumeration order similar to that of FP-tree [0] is constructed for matching graphs in AI(q,θ) so that previously discovered occurrences of a graph q' can be used for the matching of later enumerated supergraphs of q'.

The remainder of this paper is organized as follows. Section 2 is the related work and Section 3 is the preliminaries. We present how to preprocess the database and construct the index in Section 4. Section 5 describes the query processing. The experimental results are presented in Section 6. Last, the final conclusion is drawn in Section 7.

2. RELATED WORK

These days, graph database research has attracted great attention. Work related to subgraph indexing for approximate graph matching includes subgraph isomorphism algorithms, graph indexing and subgraph indexing, approximate subgraph matching, and graph similarity search. The first category of related research lies in subgraph isomorphism algorithms. Ullmann [20] proposed a subgraph matching algorithm based on a state space search method with backtracking.
However, this algorithm is prohibitively expensive for querying against a large graph. Cordella [6] proposed a new subgraph isomorphism algorithm for large graphs. These algorithms do not utilize any index structure built by preprocessing the database graphs.

Many index-based graph matching and searching schemes have been proposed to find where the query graph occurs in graph databases [4, 2, 8, 22, 2, 9], which can be further divided into graph indexing and subgraph indexing. In graph indexing, e.g., gIndex [22], TreePi [2], FG-Index [4], the graph database consists of a set of small graphs. Graph indexing aims to find all database graphs that contain or are contained by a given query graph. On the other hand, in subgraph indexing, e.g., GraphGrep [9], TALE [9], GADDI [24], the goal is to index a very large database graph, so that we can find all or a subset of the matches of a given query graph efficiently in the very large database graph. The proposed method, SAPPER, also deals with subgraph indexing in a very large database graph, and thus falls into this category.

Recently, a number of algorithms have been proposed that support approximate graph matching or similarity search through different means [7,, 2, 9, 2,, 9, 6]. C-Tree [] organizes database graphs in a tree-based structure, where interior nodes are graph closures and leaf nodes are database graphs. The design of its data structure enables it to perform similarity queries efficiently. In TALE [9], important nodes are matched first and then the match is progressively extended. The method is very effective and fast in approximately finding matches in a large graph. In G-Hash [2], wavelet graph matching kernels are applied along with a hashing scheme. In [], a top-k query scheme is proposed to find the most similar k answers. However, most of these algorithms are not designed for finding all approximate matches for the query graph with a given threshold in a very large graph. In [6], the authors aim to find the database graphs that are similar to the query graph. Since the database graphs and the query graph are all small, they transform the approximate graph matching to the SET-COVER problem.
Another category of research related to subgraph matching is graph alignment [4, 7]. Instead of matching subgraphs in a large database graph, these methods aim to align a pair of biological graphs. In the problem studied in this paper, the size of the query graph may be much smaller than that of the database graph. Thus, the graph alignment methods may not be directly applicable.

3. PRELIMINARIES

In this section, we introduce the fundamental definitions used in this paper and give the formal problem statement. We investigate approximate graph matching methods for undirected and unweighted labeled graphs. Without loss of generality, it is easy to extend our methods to directed and weighted labeled graphs.

DEFINITION 1. A labeled graph G is a five element tuple $G = (V, E, \Sigma_V, \Sigma_E, L_G)$ where V is a set of vertices and $E \subseteq V \times V$ is a set of edges. $\Sigma_V$ and $\Sigma_E$ are the sets of vertex and edge labels, respectively. The labeling function $L_G$ defines the mappings $V \to \Sigma_V$ and $E \to \Sigma_E$.

DEFINITION 2. The edge edit distance from graph $g_1$ to $g_2$ is defined as the minimum number of added edges required to transform $g_1$ into $g_2$. We denote the edge edit distance as $D_{edit}(g_1, g_2)$.

Figure 1: An Example of the Edge Edit Distance

For example, in Figure 1, by adding two edges to $g_1$, we can transform $g_1$ into $g_2$. This leads to $D_{edit}(g_1, g_2) = 2$. Edge edit distance is not symmetric, i.e., $\exists g_1, g_2: D_{edit}(g_1, g_2) \neq D_{edit}(g_2, g_1)$. When a graph g cannot be transformed into another graph g' by adding edges, we have $D_{edit}(g, g') = +\infty$.

DEFINITION 3. Given a database graph G, a connected query graph q, and an integer θ as a threshold, a connected subgraph s of G is defined as an approximate match of q in G if and only if $D_{edit}(s, q) \le \theta$; any graph isomorphic to s is defined as approximately isomorphic to q. The set of graphs approximately isomorphic to q is denoted as AI(q,θ).

If the edge edit distance from an approximate match m to q is exactly zero, m is an exact match of q in G. Apparently, the set of approximate matches of any query graph is a superset of the set of exact matches of the same query graph. In this paper, two restrictions on the approximate match are imposed: (i) the approximate match has to be connected and (ii) only edge additions are considered, but not edge deletions. A brief discussion on approximate matches without these two restrictions is presented in the appendix.

Problem Statement: We aim to solve the following two problems. (1) Given a large database graph G, we want to construct an index. (2) Given a query graph q and a threshold integer θ, we want to efficiently find all matches of graphs that are approximately isomorphic to q in G with the help of the indexed information. Our goal is not to find some of the matches to the graphs in AI(q,θ), but to find all matches to the graphs in AI(q,θ). The word approximate refers to the matches of graphs that are approximately isomorphic to the query graph.

Figure 2: The Database Graph, Query Graph and Matches

In Figure 2, given the query graph and threshold θ = , two distinct approximate matches exist in the database graph. The edge edit distance from the left approximate match to the query graph is one, while it is zero for the right match, which is also an exact match.

Before presenting the approximate subgraph indexing method, we introduce the subgraph matching property, which will be used extensively later in this paper.

PROPERTY 1. Given a query graph q and a database graph G, for any exact match g of q in G, let q' be a subgraph of q; then g must contain a match of q' in G.

Figure 2 also illustrates this property: the right exact match in the database graph contains a match of any subgraph q' of the query graph q. This property is similar to the Apriori property in frequent pattern mining [2]. With this property, we can devise an algorithm that searches the matches of a subgraph first. By refining these matches, we can build the matches for larger subgraphs.

The processing of graph queries in our paper can be divided into two major steps. In the first step, we construct the index from the database graph. The hybrid neighborhood unit (HNU) is used to store the useful local information for each vertex. In the second step, approximate matches of the query graph q are identified.

4. HYBRID NEIGHBORHOOD UNIT INDEX

In GraphGrep [9], the effectiveness of paths is first revealed, while in TALE [9], the neighboring unit proves to be a compact and powerful index unit. In GADDI [24], the index based on neighboring distances shows its strength in graph matching in a single large graph. Taking the usefulness of these three models into account, we create a new index unit, called the hybrid neighborhood unit (HNU). For each vertex v in G, let $N_i(v, G)$ be the set of vertices u in G such that there exists a path of i edges between u and v. For example, $N_1(v, G)$ is the set of vertices that are adjacent to v in G. For the database graph G, we construct the HNU for each vertex v in G. The HNU of v includes four parts: the label of v, the degree of v, the labels of vertices in $N_1(v, G)$ and the labels of vertices in $N_2(v, G)$. The first three parts are easy to compute and efficient to store. However, the last part could be too large. For a graph with an average degree of d, the size of $N_2(v, G)$ could be $O(d^2)$.
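
The four parts of the HNU listed above can be illustrated with a minimal Python sketch. The function name `build_hnu`, the dictionary-based graph representation, and the choice to store neighbor labels as plain sets are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import defaultdict

def build_hnu(labels, edges):
    """Return {v: (label, degree, N1 label set, N2 label set)}.

    labels: dict vertex -> label; edges: iterable of undirected (u, w) pairs.
    Hypothetical helper, not the authors' implementation.
    """
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    hnu = {}
    for v in labels:
        n1 = adj[v]
        # N_2(v, G): endpoints of 2-edge paths v - u - w, so w must differ from v
        n2 = {w for u in n1 for w in adj[u] if w != v}
        hnu[v] = (labels[v], len(n1),
                  {labels[u] for u in n1},
                  {labels[w] for w in n2})
    return hnu
```

In this sketch the fourth part is an exact set; the index described in the text would replace it with a compact approximate representation.
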
The bloom filter [] is used to store the labels in $N_2(v, G)$. A bloom filter B is an L-bit vector together with a set of m independent hash functions $\{f_1, f_2, \ldots, f_m\}$. It is used to determine whether an element x is a member of a set X. Each of the m hash functions $f_i$ maps an element into an integer between 1 and L. Initially, all bits in B are set to 0. If $f_i$ maps an element in X into the integer k, then the kth bit in B (B[k]) is set to 1. After mapping every element in X with the m hash functions, some bits in B are 1 while others are 0. To determine whether x is in X, x is mapped to m integers with the m independent hash functions. Assume that $f_i(x) = k_i$. If $x \in X$, then $B[k_i]$ has to be 1 for all $k_i$ ($1 \le i \le m$). If $\exists k_i$ such that $B[k_i] = 0$, then x cannot be a member of X. There is no false negative in the bloom filter. However, there could be false positives, i.e., if all mapped bits of x are 1 in B, then there is still a chance that x is not a member of X. The error rate depends on L, |X| (the number of elements in X), and m. The optimal number of independent hash functions is approximately 0.7 L/|X|. In addition, if the false positive rate is set to 1%, then L/|X| should be 9.6 []. Since X is the set of labels of vertices in $N_2(v, G)$, |X| can be approximated by $d^2$, where d is the average degree of a vertex in G. Without loss of generality, we choose L and m to be $9.6d^2$ and 7, respectively, to ensure a false positive rate of no more than 0.01. If a lower false positive rate is needed, each time we add about 4.8 bits per element to the length of the bloom filter, the false positive rate is reduced by a factor of ten. In the HNU of a vertex v, the labels of $N_2(v, G)$ are collected and an L-bit bloom filter is built during index construction time.

The time complexity to obtain the first three parts of the HNU is O(d) for each vertex, while the bloom filter takes $O(d^2 m + L)$ time to build. Since L is in the order of $d^2$, the time complexity of bloom filter construction can be simplified as $O(md^2)$. Thus, the total index construction time for all vertices in G is $O(md^2 |V_G|)$, where $|V_G|$ is the number of vertices in G.

5. SAPPER QUERY PROCESSING

In this section, we introduce the approximate subgraph matching algorithm, namely SAPPER. During the query of a subgraph q in G, SAPPER consists of four main steps: vertex matching, constructing random spanning trees of q, generating the matching order of graphs in AI(q,θ), and the final graph matching. SAPPER first finds candidate matches of each vertex $v_q \in q$ to vertices in G based on the HNUs. Next, we randomly generate a set of spanning trees of q. The matches of the spanning trees are discovered based on the vertex matches. The spanning tree matches are used for matching the approximate graphs. Since multiple graphs need to be matched, an order for matching these graphs is determined. Finally, matches of all these graphs are discovered.

5.1 Vertex Matching

For each vertex $v_q$ in the query subgraph q, we search for its matches in G based on the HNUs. A vertex $v_G$ in G is a match of $v_q$ if all the following conditions are satisfied:
1) The label of $v_q$ is the same as that of $v_G$.
2) The degree of $v_q$ is less than or equal to that of $v_G$.
3) The labels of vertices in $N_1(v_q, q)$ are a subset of those of $N_1(v_G, G)$.
4) The labels of vertices in $N_2(v_q, q)$ are a subset of those of $N_2(v_G, G)$.
In the last step, the bloom filter B is employed. Each label in $N_2(v_q, q)$ is hashed via the m hash functions, and we check whether the corresponding bits in B of $v_G$ are 1. After this step, each $v_q$ is associated with a set of matched vertices in G, denoted as $M(v_q)$.
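
The bloom filter test and the four matching conditions above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the class `BloomFilter`, the function `is_candidate`, and the use of salted SHA-256 digests to stand in for m independent hash functions are all assumptions. The helper `bloom_false_positive_rate` uses the standard textbook approximation $(1 - e^{-m|X|/L})^m$, which the text does not state explicitly.

```python
import hashlib
import math

class BloomFilter:
    """L-bit vector with m hash functions (hypothetical helper class)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Assumption: salting SHA-256 with i stands in for m independent hashes
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for k in self._positions(item):
            self.bits[k] = True

    def might_contain(self, item):
        # False means "definitely absent"; True may be a false positive
        return all(self.bits[k] for k in self._positions(item))


def bloom_false_positive_rate(bits_per_element, num_hashes):
    """Standard approximation of the false positive rate for L/|X| bits per element."""
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes


def is_candidate(q_hnu, g_hnu):
    """Check conditions 1) to 4) for one query vertex and one database vertex.

    q_hnu = (label, degree, N1 label set, N2 label set)
    g_hnu = (label, degree, N1 label set, BloomFilter over N2 labels)
    """
    q_label, q_deg, q_n1, q_n2 = q_hnu
    g_label, g_deg, g_n1, g_bloom = g_hnu
    return (q_label == g_label
            and q_deg <= g_deg
            and q_n1 <= g_n1
            and all(g_bloom.might_contain(x) for x in q_n2))
```

With 9.6 bits per element and 7 hash functions, `bloom_false_positive_rate(9.6, 7)` evaluates to roughly 0.01, which agrees with the parameter choice given for the index.
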
The total time complexity of this step is $O(d^2 m |V(q)| |V(G)|)$, where d, m, |V(q)|, and |V(G)| are the maximum of the average degrees of G and q, the number of hash functions for the bloom filter, the number of vertices in q, and the number of vertices in G, respectively. There are some false positives in the fourth step due to the bloom filter. The total false positive rate is $1 - (1 - e)^l$, where e and l are the false positive rate of determining whether one element is in the bloom filter and the number of distinct labels in $N_2(v_q, q)$, respectively. This is because if any label out of the l labels is reported as a false positive by the bloom filter of $v_G$, then $v_G$ is a false positive match of $v_q$. If e and l are 0.01 and 10, then the total false positive rate is less than 0.1. Since vertex matching only finds a candidate set of matches for a vertex in q, this false positive rate is well within tolerance.

5.2 Random Spanning Tree Generation and Matching

Although matches for vertices have been discovered, these matches are determined based on local information (within 2-edge distance). It is possible that some of these matches are false positives. Therefore, more information needs to be used to prune the matches. Since our ultimate goal is to find matches for all graphs in AI(q,θ), it is desirable to use the global information shared by a large number of the graphs in AI(q,θ). All graphs in AI(q,θ) are at most θ edge edit distance away from q, and hence they overlap heavily. Therefore, spanning trees of q will be used for the global information because graphs in AI(q,θ) share many spanning trees. In addition, we want each edge in q to have the same probability to be selected into a spanning tree. This ensures that each graph in AI(q,θ) contains a similar number of spanning trees, and thus has a similar amount of pruning power. A random spanning tree T of q has the following property: each edge e in q has the same probability to be selected into T []. For a graph q with vertices V(q) and edges E(q), a random spanning tree T of q is constructed via a random walk.
A random walk on q is a discrete-time Markov chain with the following transition probabilities from a vertex v to another vertex w: $P(v, w) = 1/d_v$ ($d_v$ is the degree of vertex v) if there is an edge from v to w. Otherwise, $P(v, w) = 0$. Initially, a vertex $v_0 \in V(q)$ is randomly chosen as the starting point and the spanning tree T only contains vertex $v_0$. The random walk starts at $v_0$. An edge (v, w) is randomly chosen based on the probability P. If w is not in T, edge (v, w) and w are inserted into T. Otherwise, no edge will be added into T. Next the walk is repeated on w. This process terminates when T includes all vertices of V(q). The formal random tree construction algorithm is described in Algorithm 1 in the Appendix and an example is depicted in Figure 3. In the example, at time step $t_0$, T only includes $v_0$ and no edge. At time step $t_1$, the edge $(v_0, v_1)$ and $v_1$ are added into T, so T contains vertices $v_0$, $v_1$, and one edge. At time step $t_2$, no edge or vertex is added into T since the random walk is back at $v_0$. At $t_3$, the edge $(v_0, v_2)$ and vertex $v_2$ are added into T and a spanning tree is formed.

Figure 3: The Random Spanning Tree Generation

A tree generated by this random walk algorithm is a uniform random spanning tree, i.e., the probability of a spanning tree t of graph q to be generated by Algorithm 1 is $1/TN(q)$, where TN(q) is the number of distinct spanning trees of q. This can be proved by showing that the set of trees constructed by the random walk has a stationary distribution proportional to the degree of the vertex from which it starts. The detailed proof was presented in [].

In this step, we generate |V(q)| + random spanning trees so that (1) each edge has 8% probability to be included in at least one of the spanning trees and (2) the complexity is still not too large. A vertex v in q is randomly chosen as the prime vertex. For each generated spanning tree T, we find its matches in G based on the vertex matches. The matching starts from the prime vertex v in T and tries to match v's neighbors in T.
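
The random walk construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `random_spanning_tree`, the adjacency-list dictionary, and the optional `rng` parameter are hypothetical and do not reproduce Algorithm 1 from the Appendix.

```python
import random

def random_spanning_tree(adj, rng=random):
    """Build a spanning tree of a connected undirected graph by a random walk.

    adj: dict vertex -> list of neighbors (assumed connected and symmetric).
    Returns the list of tree edges in the order they were added.
    """
    current = rng.choice(sorted(adj))      # random starting vertex v_0
    in_tree = {current}
    tree_edges = []
    while len(in_tree) < len(adj):
        nxt = rng.choice(adj[current])     # each neighbor has probability 1/d_v
        if nxt not in in_tree:             # first visit: keep the entering edge
            in_tree.add(nxt)
            tree_edges.append((current, nxt))
        current = nxt                      # the walk continues from w either way
    return tree_edges
```

For a query graph given as `adj`, calling the function repeatedly yields the pool of pre-generated trees; passing a seeded `random.Random` instance makes a run reproducible.
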
For example, let us assume that v's matches in G are $M(v) = \{u_1, u_2\}$ and v is connected to v' in T. Then we check whether v' matches any neighbor of $u_1$ or $u_2$ in G. In other words, we want to see whether any neighbor of $u_1$ or $u_2$ is in M(v'). If v' could only be matched to some neighbor of $u_1$, but not to any neighbor of $u_2$, we know that $u_2$ could not be a match of v for the occurrence of T in G, and hence $u_2$ could be removed from the match set for v of T. The process continues until all matches of T are located. The matching process is performed in a depth-first traversal manner. Since the tree is a very special form of graph, the matching of a tree in G is rather efficient and simple. Due to space limitations, we omit the details of tree matching in this paper. After the matching process for T, the prime vertex v has a set of matched vertices in G for T. $M(v, T_i)$ denotes the set of vertices in G that could be matched to the prime vertex v for the query graph $T_i$. For example, in Figure 5, for the label that v has in the query graph, $M(v, T_1)$ = {, 0} (circled by the solid ellipses), the ids of the vertices mapped to v in the two matches of the spanning tree $T_1$. Since there are |V(q)| + spanning trees, there are |V(q)| + sets of $M(v, T_i)$. These matched sets of v serve as the starting point for the later graph matching.

Given a query graph q and a threshold θ, there are approximately $\binom{|E(q)|}{\theta}$ subgraphs of q with $|E(q)| - \theta$ edges. After generating |V(q)| + random spanning trees, a subgraph of q with $|E(q)| - \theta$ edges has probability P of containing at least one of these random spanning trees, where

$P = 1 - \left(1 - \left(\frac{|E(q)| - \theta}{|E(q)|}\right)^{|V(q)| - 1}\right)^{|V(q)| + }$

For instance, if q consists of 10 vertices and 20 edges and θ is 2, P would be larger than . This means that most of these graphs could utilize the match information of the spanning trees.

5.3 Query Graph Enumeration Order

Since there are many graphs in AI(q,θ), we need to devise an order for enumerating these graphs. This problem is similar to that of frequent pattern mining in the data mining field. There are two main approaches to enumerate patterns in frequent pattern mining: breadth-first enumeration and depth-first enumeration. In the breadth-first enumeration [2], all patterns (graphs) with i items (edges) are first enumerated. Based on the occurrences (matches) of these patterns (graphs), their super-patterns (supergraphs) with one extra item (edge) are enumerated, and so on. In the depth-first pattern (graph) enumeration [0], one pattern (subgraph) is generated first; if it has sufficient occurrences (matches), one item (edge) is added into the pattern (subgraph), the occurrences (matches) of the new pattern are searched, and so on.
It has been shown that the depth-first enumeration has an advantage over the breadth-first search because in depth-first search, (1) pattern generation is simpler and more efficient, (2) the match of a pattern can be directly built on its predecessor, and (3) many patterns are not enumerated. Based on this knowledge, we devise a depth-first enumeration of our graphs in AI(q,θ). We assign a unique id to each edge in q, and a lexicographical order is assumed on these edge ids. Assume that there are z edges in q, whose ids are $e_1 < e_2 < \cdots < e_z$ according to the lexicographical order. (We will discuss how to assign the lexicographical order shortly.) Thus, each graph in AI(q,θ) can be uniquely represented by a sequence of edges (sorted according to the lexicographical order of the edges). The order of two distinct graphs q' and q'' in AI(q,θ) can be determined based on their corresponding edge lists. Let the edge lists of q' and q'' be $e'_1, e'_2, \ldots, e'_i$ and $e''_1, e''_2, \ldots, e''_j$, respectively. If one sequence is a prefix of the other, e.g., q' is a prefix of q'', then we define q' < q''. Otherwise, there exists an integer k ($k \le i$ and $k \le j$) such that $e'_k \neq e''_k$, and the order of q' and q'' is determined as follows. Let k be the smallest integer such that $e'_k \neq e''_k$. Then q' < q'' if and only if $e'_k < e''_k$.

By defining the lexicographical order of graphs, the graphs in AI(q,θ) can be enumerated in a depth-first manner from the lexicographically smallest to the lexicographically largest. First, the edge sequence (graph) with the smallest lexicographical order, $q_1$, is enumerated, which is $e_1, e_2, \ldots, e_l$ ($l = |E(q)| - \theta$). If $q_1$ has at least one match, then the edge with the smallest lexicographical order after $e_l$ is appended to $q_1$ to form a new graph $q_2$, as described in Algorithm 2 in the Appendix. (This procedure is illustrated as next in Figure 4.) This process continues on $q_2$ until no edge can be appended to $q_2$ or there is no match for $q_2$. In such a case, it is not necessary to enumerate any edge sequences containing $q_2$ as a prefix. The enumeration process will resume from the lexicographically smallest graph that does not contain $q_2$ as a prefix and is larger than $q_2$.
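
The depth-first order with prefix pruning can be sketched with a short recursive Python function. Note that this is a swapped-in recursive formulation chosen for brevity, not Algorithms 2 and 3 from the Appendix; the name `enumerate_graphs`, the integer edge ids 1..z, and the `has_match` callback are hypothetical. The callback is assumed to return True for disconnected graphs so that their supergraphs are still reached.

```python
def enumerate_graphs(num_edges, theta, has_match):
    """Visit edge sequences of length >= num_edges - theta in lexicographic order.

    Edges are the integers 1..num_edges, already sorted by the chosen edge order.
    A sequence without a match prunes every sequence that has it as a prefix.
    Returns the list of sequences that were actually enumerated.
    """
    min_len = num_edges - theta
    visited = []

    def dfs(seq):
        if len(seq) >= min_len:
            visited.append(seq)
            if not has_match(seq):
                return                      # skip all extensions of seq
        start = seq[-1] + 1 if seq else 1
        for e in range(start, num_edges + 1):
            # only descend if enough larger edges remain to reach min_len
            if len(seq) + 1 + (num_edges - e) >= min_len:
                dfs(seq + (e,))

    dfs(())
    return visited
```

Returning from a pruned sequence to the next loop iteration of its parent plays the role of the jump to the smallest larger graph that does not have the pruned sequence as a prefix.
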
This jump procedure is described in Algorithm 3 in the Appendix. (It is illustrated as jump in Figure 4.) Let us take a look at an example. Assume that q consists of four edges $e_1 < e_2 < e_3 < e_4$ and θ = 2. The lexicographically smallest graph in AI(q,θ) is $(e_1, e_2)$. If $(e_1, e_2)$ has at least one match, then $e_3$ is appended and $(e_1, e_2, e_3)$ will be enumerated next. In the case that $(e_1, e_2, e_3)$ has no match, any sequence whose prefix is $(e_1, e_2, e_3)$ will not be enumerated, namely the sequence $(e_1, e_2, e_3, e_4)$. Next, the lexicographically smallest graph that does not contain $(e_1, e_2, e_3)$ as a prefix and is larger than $(e_1, e_2, e_3)$ is enumerated, which is $(e_1, e_2, e_4)$. Figure 4 shows the enumeration order of graphs in this example. The graphs are enumerated in a top-down and left-to-right fashion. In this method, each graph in AI(q,θ) will be enumerated or reached at most once. Thus, at most |AI(q,θ)| graphs will be enumerated under this method.

Figure 4: The Enumeration Order

Although any lexicographical order among edges will work, our goal is to prune the graphs in AI(q,θ) as early as possible. As a result, it is beneficial to search the graphs with the smallest number of matches first so that as many graphs in AI(q,θ) as possible are pruned. Therefore, the lexicographical order of edges is set according to the number of matches of each edge: $e_i < e_j$ if edge $e_i$ occurs fewer times in G than $e_j$. If two edges have the same number of occurrences/matches, then an order is assigned arbitrarily.

5.4 Graph Matching

After determining the enumeration order of query graphs, we match these graphs in the enumeration order. When matching a graph q', there are two cases: q' is connected and q' is not connected. In the case that q' is not connected, it is not necessary to find matches of q' since we are only interested in connected query graphs. However, it is possible that some supergraph of q' is connected.
Thus, we pretend there is a match of q' (without searching for the matches of q'), and continue to enumerate the supergraphs of q' by appending an edge to q'. In the second case, where q' is connected, we need to find matches of q' in G. The matching process can again be divided into two cases according to q': (1) we have not yet searched any prefix of q', and (2) we have found match(es) of some prefix of q'.

In the first sub-category, q' is very likely to contain at least one pre-generated spanning tree. Thus, the matching of q' can often start from the spanning trees. In the rare scenario that q' does not contain any randomly generated spanning tree, the matching has to start from the vertex matches without the help of the spanning trees. The vertices are matched in a depth-first order. To match a database graph vertex $v_g$ and a query graph vertex $v_q$, we require that (1) $v_g$ is in $M(v_q)$ and (2) for each edge $(v_q, u_q)$ adjacent to $v_q$ in q', there exists a vertex $u_g$ such that the edge label of $(v_g, u_g)$ is the same as that of $(v_q, u_q)$ and $u_g$ is matched to $u_q$. This process is similar to other existing graph matching algorithms, e.g., GADDI [24], and hence we will not present it here due to space limitations.

When q' contains at least one spanning tree, the following procedure is employed. First, the spanning trees contained by q' are identified via the edges in q' and those contained in the spanning trees. Assume q' contains r spanning trees $T_1, T_2, \ldots, T_r$. Each match of q' has to contain at least one occurrence of each of $T_1, T_2, \ldots, T_r$. Therefore, the matched vertices of the prime vertex v for q' should be in $M(v, T_i)$ for all $1 \le i \le r$. Thus, $M(v, q') = \bigcap_{i=1}^{r} M(v, T_i)$ will serve as the starting point for finding the matches of q' in G. Based on the match set $M(v, q')$, we search for the matches of the neighbors of v and so on. After finding the matches of q', for each match of q' in G, we keep the mapping from the vertices in the match of q' to vertices in q'. Figure 5 shows an example of matching q' based on the matches of the spanning trees. We can see that $M(v, T_1)$ = {, 0} (circled by the solid ellipses) and $M(v, T_2)$ = {8, 0} (circled by the dotted ellipses). The intersection of the two sets is {0}, which is the starting point to match q'.

Figure 5: Matching q' based on the matches of the spanning trees q' contains

In the second sub-category, matches of some prefix of q' have been discovered. Let $q_2$ be the longest prefix of q' such that the matches of $q_2$ have been identified. Also let $e_1, e_2, \ldots, e_i$ be the edges in q' but not in $q_2$. For each match of $q_2$, we check whether $e_1, e_2, \ldots, e_i$ exist in G. If so, this will be a match of q'. Otherwise, this match of $q_2$ could not be extended to a match of q'.
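
Both starting points described above, intersecting the prime-vertex match sets and extending matches of the longest prefix, can be sketched as follows. The helper names `start_candidates` and `extend_matches` are hypothetical. The sketch ignores edge labels and assumes that every endpoint of an extra edge is already mapped by the prefix match, which are simplifications not stated in the paper; Algorithm 4 itself is not reproduced here.

```python
def start_candidates(prime_matches_per_tree):
    """Intersect M(v, T_i) over the spanning trees contained in q'."""
    sets = [set(s) for s in prime_matches_per_tree]
    return set.intersection(*sets) if sets else set()


def extend_matches(prefix_matches, extra_edges, g_edges):
    """Keep the matches of the prefix whose extra query edges also exist in G.

    prefix_matches: list of dicts mapping query vertex -> database vertex
    extra_edges:    query edges in q' but not in the prefix, as (a, b) pairs
    g_edges:        undirected edges of the database graph, as (x, y) pairs
    """
    g = {frozenset(e) for e in g_edges}
    return [m for m in prefix_matches
            if all(frozenset((m[a], m[b])) in g for a, b in extra_edges)]
```

For example, with two prefix matches and one extra query edge, only the match whose mapped endpoints are adjacent in the database graph survives the extension.
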
This process continues until all matches of q_2 are examined. The formal algorithm is described in Algorithm 4. Figure 6 depicts an example of matching q based on its subgraph q_2, which corresponds to the longest prefix of q. When matching q, we then only need to check the matches of q_2.

[Figure 6: Matching q based on its subgraph q_2. Panels: the database graph G and the query graph q, with prime vertices marked.]

Although the SAPPER algorithm employs approximation to accelerate the matching process, it can find all matches to the graphs that are approximately isomorphic to the query graph. Due to space limitations, the proof is given in the Appendix. It is difficult to determine the exact time complexity of the SAPPER method since it depends on how many graphs in AI(q, θ) are enumerated. Since the subgraph isomorphism test is an NP-hard problem, the worst-case time complexity is exponential. We will empirically analyze the time efficiency and scalability of the SAPPER method in the next section.

6. EXPERIMENTAL RESULTS

In this section, we empirically analyze the performance of SAPPER against TALE and GADDI, two of the most recent subgraph matching tools designed for large graphs, and Basic SAPPER (BSAPPER). TALE is efficient in index construction and heuristically finds the approximate matches of the query graph. GADDI enumerates all possible approximate isomorphic graphs (AI(q, θ)) of the query graph and finds all exact matches for each of these graphs. To show the pruning power of the random spanning trees and the lexicographical order, we also include BSAPPER in the comparison. BSAPPER employs the same indexing structure as SAPPER, but it differs from SAPPER in the following two aspects. (i) BSAPPER does not use spanning trees. (ii) BSAPPER uses a breadth-first enumeration order similar to the level-wise search algorithm in [2]. In the first level, all the graphs θ edge edit distance away from the query graph q are enumerated and queried.
Next, it enumerates the graphs θ − 1 edge edit distance away in the second level; a graph is enumerated in the second level only if there exists at least one match for all its subgraphs in the first level. This process continues until either the level containing q is reached or no graph can be enumerated based on the subgraph property. The performance difference between BSAPPER and SAPPER is essentially the effect of the random spanning trees and the lexicographical query graph enumeration order, while the performance difference between BSAPPER and GADDI is the effect of the Bloom filters. All methods are implemented in C++ and run on a Dell PowerEdge 290 with two .0 GHz dual-core CPUs, 6 GB of main memory, and a Linux SMP system.

6.1 Protein Interaction Network

In this set of experiments, the graph is generated from a subset of the protein interaction network for Homo sapiens. Each vertex represents a protein and the label of the vertex is its gene ontology term from [25]. An edge in the graph represents an interaction between the two proteins it connects. There are 640 vertices, 844 edges, and the average degree of a vertex is 6.8. There are a total of 62 distinct labels. SAPPER spends about 2 minutes to construct an index of 60MB, while TALE spends 0 minutes to construct an index of MB, and GADDI spends minutes to construct a 00MB index. As SAPPER processes more information than TALE, it takes more time to construct the index. Since we only need to build an index structure for each database graph once, the query time is much more important than the index building time.

To evaluate these four methods, we use eight known signal transduction pathways from the KEGG database [13] to query the protein interaction network. These known pathways are from species other than Homo sapiens, e.g., flies and yeast. Since some protein interactions only exist in yeast or flies and do not exist in human, there are missing edges in the Homo sapiens protein interaction network. If θ is set to 2, all eight signal transduction pathways should be recovered in our Homo sapiens protein interaction network. Thus, we use these eight pathways as the query graphs and set θ to 2. SAPPER, BSAPPER and GADDI find all eight pathways successfully. Among these three methods, SAPPER is much faster than the remaining two due to its advanced pruning techniques. Since TALE is a heuristic algorithm, it only finds two out of these eight pathways. Although TALE runs very fast, its accuracy (e.g., recall) is not high. The execution time of SAPPER, BSAPPER, GADDI, and TALE is shown in Figure 7. The numbers of vertices in the eight known pathways are 9, 0,, 2, and 4. Thus, we report the average execution time with respect to the number of vertices in each query graph.

[Figure 7: The Performances of the Queries on Protein Interaction Network]

6.2 Synthetic Data Sets

In this portion of the experimental studies, we analyze the performance of SAPPER, BSAPPER and GADDI by independently varying each of six parameters on a set of synthetically generated graphs. We do not include TALE because, although it can efficiently finish the queries, only around 20% of all the approximate matches are discovered by TALE, as shown on the real data set. To systematically analyze the performance of these methods, we vary one parameter at a time. The default values of the parameters are listed in Table 1.

Table 1: Default Parameter Values

Parameter                  Default Value
Number of vertices in G    000
Number of vertices in q    20
Number of labels           20
θ
Average degree of G        8
Average degree of q        4

The index construction comparisons are shown in Figure 8. We first vary the number of vertices in G. GADDI needs more time to construct the index than SAPPER because it needs to calculate the NDS distances for neighboring vertices. Due to the compactness of the Bloom filter, the size of the index of SAPPER is consistently smaller than that of GADDI. When the number of vertices in G is 0,000, SAPPER takes around 8000 seconds to build an 80 MB index. Next, we vary the average vertex degree of G. This affects SAPPER more on the index construction time since the number of 2-hop neighbor vertices grows exponentially with respect to the average degree.

[Figure 8: Comparisons of the Indices. Panels: (a) Index Construction Time, (b) Index Size, (c) Index Construction Time, (d) Index Size.]

Now the average query times of these methods on different parameters are analyzed. The first parameter is the number of vertices in G. |V(G)| is varied from 200 to 0,000. SAPPER and BSAPPER achieve better matching efficiency than GADDI as they can quickly match vertices by the index and optimize the approximate matching process. SAPPER outperforms BSAPPER due to the effectiveness of the random spanning trees and lexicographical order pruning techniques. The results are shown in Figure 9(a).

Next, we vary the number of vertices in the query graph q. We show the result in Figure 9(b). With more vertices in q, more vertices and edges need to be compared in the query process, so the query times of all three methods increase. The increase is more evident once |V(q)| reaches 40, as the methods need to find all approximate matches, especially GADDI, which processes more candidate graphs for a large query graph without pruning techniques.

The third parameter we vary is the number of distinct labels. From Figure 9(c), we can see that more labels in G increase the pruning power of GADDI, but have a mixed effect on SAPPER. This may be due to the fact that SAPPER only indexes a subset of the labels of neighboring vertices. Increasing the number of distinct labels reduces the number of candidate matches between any pair of vertices in G and q, but also decreases the pruning power of SAPPER's index.
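To make the compactness argument concrete, the following is a minimal, generic Bloom filter [3] in Python. It is a sketch under stated assumptions, not SAPPER's index: the class name `BloomFilter`, the default sizes, the SHA-256-based hashing, and the idea of storing neighbor labels as strings are all illustrative choices. The `covers` test shows how a fixed-size bit vector could be used to discard a database vertex whose indexed neighbor labels cannot contain those required by a query vertex.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size bit vector with k hash positions per inserted item."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit vector, stored in a single Python int

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def covers(self, other: "BloomFilter") -> bool:
        """True if every bit set in `other` is also set here.
        Both filters must use the same num_bits and num_hashes."""
        return (other.bits & ~self.bits) == 0
```

If the filter of a database vertex does not cover the filter of a query vertex, the pair can be pruned safely, since a Bloom filter has no false negatives. A positive answer still has to be verified, and the false-positive rate grows as more labels are packed into the same number of bits.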
The approximate threshold parameter θ is varied and the results are shown in Figure 9(d). With the increase of θ, the query time of SAPPER remains lower than that of GADDI and BSAPPER because GADDI needs to generate all possible candidate graphs, whose number increases dramatically with θ. On the other hand, due to the use of the advanced pruning techniques, the query time of SAPPER increases at a slower pace.

The fifth parameter we vary is the average degree of G and the results are shown in Figure 9(e). A high degree in G means more edges have to be examined when matching a pattern, and the query time of these three methods grows at a similar rate.

Last, we vary the average degree of a vertex in q. The results are shown in Figure 9(f). The higher the average degree of q is, the more information q possesses for pruning vertices in G. However, a high vertex degree will also generate more potential candidate query graphs since the number of candidate query graphs is exponential in the average degree of q. When the average degree of q is 2, there are few edges to be examined and all algorithms are efficient. When the average vertex degree of q is larger than 6, the number of edges that need to be compared grows exponentially, which results in GADDI's long response time.

[Figure 9: Query Time on Different Parameters. Panels: (a) |V(G)|, (b) |V(q)|, (c) Number of Labels, (d) Different values of θ, (e) Average Degree of G, (f) Average Degree of q.]

The main difference between TALE and SAPPER is the accuracy. TALE is a heuristic method which does not find all approximate matches of a pattern, while SAPPER is an exact method that finds the complete set of approximate matches. Thus, if the goal is to take a quick look at the approximate matches of a query graph in the database, TALE is an efficient and convenient tool. On the other hand, SAPPER is a better choice if the complete set of approximate matches needs to be retrieved. The main difference between GADDI and SAPPER is the efficiency. Although GADDI can find all approximate matches by enumerating all approximate isomorphic graphs of the query graph, this is a very time-consuming process. The performance of BSAPPER is between GADDI and SAPPER since it utilizes the Bloom filter to match vertices and the subgraph property to prune query graphs, without the help of the random spanning trees and the lexicographical order. Therefore, when the goal is to discover all approximate matches, SAPPER is preferred.

7. CONCLUSION

Due to the existence of noise (e.g., missing edges) in the large database graph, we investigate the problem of approximate subgraph indexing, i.e., finding the occurrences of a query graph in a large database graph with (possible) missing edges. In this paper, we have proposed a subgraph indexing and matching method (SAPPER) to find all approximate matches of a query graph. SAPPER constructs the HNU index to accelerate query processing. During query time, SAPPER improves matching efficiency by using pre-generated random spanning trees and a lexicographical query graph enumeration order. To the best of our knowledge, this is the first attempt to find the complete set of approximate matches in a single large graph. With a large set of real and synthetic data, we demonstrate that the SAPPER approach can outperform the alternative methods in accuracy while achieving good efficiency.

8. REFERENCES

[1] D. J. Aldous, The random walk construction of uniform spanning trees and uniform labelled trees, SIAM J. Discrete Math, 1990.
[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proc. of VLDB, 1994.
[3] B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Communications of the ACM 13(7), 1970.
[4] J. Cheng, Y. Ke, W. Ng and A. Lu, FG-Index: towards verification-free query processing on graph databases, Proc. of SIGMOD.
[5] B. Chazelle, J. Kilian, R. Rubinfeld and A. Tal, The Bloomier filter: an efficient data structure for static support lookup tables, Proc. of the Annual ACM-SIAM Symposium on Discrete Algorithms.
[6] L. Cordella, P. Foggia, C. Sansone and M. Vento, A (sub)graph isomorphism algorithm for matching large graphs, PAMI.
[7] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna and R. Sharan, QNet: a tool for querying protein interaction networks, Proc. of RECOMB.
[8] T. Nguyen, H. Nguyen, N. Pham, J. Al-Kofahi and T. Nguyen, Graph-based mining of multiple object usage patterns, Proc. of the Joint Meeting of ESEC and ACM SIGSOFT.
[9] R. Giugno and D. Shasha, GraphGrep: A fast and universal method for querying graphs, Proc. of ICPR.
[10] J. Han, J. Pei and Y. Yin, Mining frequent patterns without candidate generation, Proc. of SIGMOD.
[11] H. He and A. K. Singh, Closure-Tree: an index structure for graph queries, Proc. of ICDE.
[12] H. Jiang, H. Wang, P. Yu and S. Zhou, GString: A novel approach for efficient search in graph databases, Proc. of ICDE.
[13] M. Kanehisa and S. Goto, KEGG: Kyoto encyclopedia of genes and genomes, Nuc. Ac. Res, 2000, 28:27-30.
[14] M. Koyuturk, A. Grama and W. Szpankowski, Pairwise local alignment of protein interaction networks guided by models of evolution, Proc. of RECOMB, 2005.
[15] F. Mandreoli, R. Martoglia, G. Villani and W. Penzo, Flexible query answering on graph-modeled data, Proc. of EDBT.
[16] M. Mongiovi, R. Natale, R. Giugno, A. Pulvirenti and A. Ferro, A set-cover-based approach for inexact graph matching, Proc. of CSB.
[17] R. Pinter, O. Rokhlenko, E. Yeger-Lotem and M. Ziv-Ukelson, Alignment of metabolic pathways, Bioinformatics, 2005.
[18] H. Shang, Y. Zhang, X. Lin and J. Yu, Taming verification hardness: an efficient algorithm for testing subgraph isomorphism, PVLDB.
[19] Y. Tian and J. Patel, TALE: a tool for approximate large graph matching, Proc. of ICDE.
[20] J. Ullmann, An algorithm for subgraph isomorphism, J. ACM, 1976.
[21] X. Wang, A. Smalter, J. Huan and G. Lushington, G-Hash: towards fast kernel-based similarity search in large graph databases, Proc. of EDBT.
[22] X. Yan, P. Yu and J. Han, Graph indexing, a frequent structure-based approach, Proc. of SIGMOD.
[23] S. Zhang, M. Hu and J. Yang, Treepi: a novel graph indexing method, Proc. of ICDE.
[24] S. Zhang, S. Li and J. Yang, Gaddi: distance index based subgraph matching in biological networks, Proc. of EDBT.
[25] Gene Ontology.

APPENDIX

A. FORMAL ALGORITHM DESCRIPTION

Algorithm 1 Generating Random Spanning Tree
Input: graph q.
Output: random spanning tree t of q.
 1: Construct transition matrix P from q.
 2: Vertex set S ← ∅, edge list E ← ∅.
 3: Randomly select a vertex X_0 of q.
 4: S ← S + X_0.
 5: v ← X_0.
 6: while |S| < |V(q)| do
 7:   randomly select a vertex w by P such that e_vw exists.
 8:   if w ∉ S then
 9:     E ← E + e_vw
10:     S ← S + w
11:   end if
12:   v ← w
13: end while
14: Output the graph composed of edge list E.

Algorithm 2 LEXI_NEXT
Input: sequence s, edge list EL = {e_1, ..., e_l}, threshold θ.
Output: the next sequence of s.
 1: L ← Length(s)
 2: if s(L) < e_l then
 3:   e_x ← s(L)
 4:   return Sequence s(1), ..., s(L) e_{x+1}
 5: end if
 6: return LEXI_JUMP(s, EL, θ)

Algorithm 3 LEXI_JUMP
Input: sequence s, edge list e_1, ..., e_l, threshold θ.
Output: the next sequence of s which is not a super-sequence of s.
 1: if ∃i s.t. s(i) < e_{l−(L−i)} then
 2:   x ← MAX{i : s(i) < e_{l−(L−i)}}
 3:   e_t ← s(x)
 4:   if x ≥ l − θ then
 5:     return Sequence s(1), ..., s(x−1) e_{t+1}
 6:   end if
 7:   return Sequence s(1), ..., s(x−1) e_{t+1} e_{t+2} ... e_{t+l−θ−x}
 8: end if
 9: return end

Algorithm 4 SAPPER
Input: database graph G, query graph q, threshold θ.
Output: approximate matches of q.
 1: Sort q's edges decreasingly by their number of matches in G, l ← |E(q)|
 2: edge list EL ← e_1, ..., e_l (∀i, e_i ∈ q).
 3: s ← e_1, ..., e_{l−θ}
 4: while s ≠ end do
 5:   if the graph corresponding to the longest prefix of s is not matched yet then
 6:     Find and output the exact matches of g(s) with the help of the matches of the spanning trees, if it contains any
 7:   else
 8:     Find and output the exact matches of g(s) according to the matches of the graph corresponding to the longest prefix of s
 9:   end if
10:   if g(s) has no match then
11:     s ← LEXI_JUMP(s, EL, θ)
12:   else
13:     s ← LEXI_NEXT(s, EL, θ)
14:   end if
15: end while

B. PROOF OF CORRECTNESS OF SAPPER

The proof of the correctness of SAPPER is divided into two parts.
First, we prove that given a query graph q, a database graph G, and an approximation threshold θ, for every connected graph s where Dist_e(s, q) ≤ θ and there exists at least one match of s in G, SAPPER will enumerate s during candidate graph enumeration. Second, we prove that if s is enumerated, all of its matches in G will be discovered.

Lemma 1: SAPPER enumerates every candidate graph s of query graph q such that D_edit(s, q) ≤ θ and s has at least one exact match in G.

Proof: The lexicographical order enumerates every graph s such that D_edit(s, q) ≤ θ in depth-first style. When we find that such a graph s does not have any exact match in the database graph, we perform the jump procedure. The graphs we skip are all supergraphs of s, which cannot have any exact match in the database graph, and hence are not candidate graphs. Therefore, we enumerate all candidate graphs s of query graph q such that D_edit(s, q) ≤ θ and s has at least one exact match in G.

Lemma 2: SAPPER finds all exact matches of any candidate graph s.

Proof: For a candidate graph s, if we have not yet searched for any prefix of s and s does not contain any pre-generated random spanning tree, then we perform depth-first matching for s, which will not miss any exact match of s. Otherwise, we start the search from either the matches of the prefix candidate graph of s or the intersection of the matches of the pre-generated random spanning trees contained by s. Either the prefix candidate graph of s or a pre-generated random spanning tree contained by s is a subgraph of s. Since any exact match of s must contain at least one exact match of any subgraph of s by the subgraph property, we will not miss any exact match of s in this scenario either. Therefore, SAPPER can find all exact matches of any candidate graph s.

Theorem 1: SAPPER finds all approximate matches of query graph q.

Proof: By Lemma 1, SAPPER enumerates all candidate graphs of the query graph. By Lemma 2, for any candidate graph s, SAPPER finds all matches of s. By the definition of approximate matches, SAPPER can find all approximate matches of q.

C. EDGE ADDITIONS/DELETIONS AND DISCONNECTED MATCHES

In this paper, we focus on approximate matches with the following two restrictions: (1) the match has to be connected and (2) only edge additions but not edge deletions are considered. The rationale behind these two restrictions is the following. If unconnected matches are considered, there could be too many of these matches. Moreover, these unconnected matches may not be useful in many applications. Thus, in this paper, we focus on finding connected matches. Edge deletions could be as important as edge additions. In most cases, a match with edge deletions is a supergraph of some other approximate matches. For instance, if g_2 can be obtained by deleting some edge from g, then g has to contain g_2. For an approximate match g_2, if the edit distance between g_2 and the query graph q is less than θ, then by adding different edges to g_2, a (potentially) large number of matches will be discovered.
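The random-walk construction of Algorithm 1 in Appendix A (in the spirit of [1]) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name `random_spanning_tree` and the adjacency-list input are hypothetical, the graph is assumed to be connected and undirected, and the walk moves to a uniformly chosen neighbor, which is an assumption standing in for the transition matrix P.

```python
import random
from typing import Dict, List, Optional, Tuple


def random_spanning_tree(
    adj: Dict[int, List[int]],
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Walk randomly over a connected graph and keep the edge by which
    each vertex is entered for the first time."""
    rng = rng or random.Random()
    start = rng.choice(sorted(adj))       # step 3: random start vertex X_0
    visited = {start}                     # vertex set S
    tree_edges: List[Tuple[int, int]] = []  # edge list E
    v = start
    while len(visited) < len(adj):        # step 6: until all vertices are seen
        w = rng.choice(adj[v])            # step 7: one random-walk step
        if w not in visited:              # steps 8-10: first visit of w
            tree_edges.append((v, w))
            visited.add(w)
        v = w                             # step 12: continue from w
    return tree_edges
```

Each vertex other than the start contributes exactly one edge, so the result has |V(q)| − 1 edges and touches every vertex, which makes it a spanning tree. On a disconnected input the loop would never terminate, so connectivity has to be checked by the caller.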

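The depth-first, lexicographical enumeration with jumps used by Algorithms 2 to 4 in Appendix A can be illustrated with a simplified recursive variant. The sketch below is not a transcription of those algorithms: the function name `enumerate_candidates`, the representation of a candidate graph as an increasing tuple of edge indices, and the caller-supplied `has_match` predicate are all assumptions, and the special handling of disconnected candidates is omitted.

```python
from typing import Callable, List, Tuple

EdgeSeq = Tuple[int, ...]  # increasing edge indices into the sorted edge list


def enumerate_candidates(
    num_edges: int,
    theta: int,
    has_match: Callable[[EdgeSeq], bool],
) -> List[EdgeSeq]:
    """Return the edge sequences of length >= num_edges - theta that have a
    match, visiting sequences depth-first in lexicographical order."""
    min_len = max(num_edges - theta, 1)
    found: List[EdgeSeq] = []

    def visit(seq: EdgeSeq) -> None:
        if len(seq) >= min_len:
            if not has_match(seq):
                # Jump: every extension of seq is a supergraph of seq,
                # so none of them can have a match either.
                return
            found.append(seq)
        start = seq[-1] + 1 if seq else 0
        for nxt in range(start, num_edges):
            visit(seq + (nxt,))           # append the next edge in order

    visit(())
    return found
```

With three edges and θ = 1, a predicate that rejects any sequence containing both edge 0 and edge 1 is queried for (0, 1), (0, 2) and (1, 2) only; the full sequence (0, 1, 2) is skipped because its prefix (0, 1) already failed.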

More information

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016

Applied Databases. Sebastian Maneth. Lecture 13 Online Pattern Matching on Strings. University of Edinburgh - February 29th, 2016 Applied Dtses Lecture 13 Online Pttern Mtching on Strings Sestin Mneth University of Edinurgh - Ferury 29th, 2016 2 Outline 1. Nive Method 2. Automton Method 3. Knuth-Morris-Prtt Algorithm 4. Boyer-Moore

More information

The dictionary model allows several consecutive symbols, called phrases

The dictionary model allows several consecutive symbols, called phrases A dptive Huffmn nd rithmetic methods re universl in the sense tht the encoder cn dpt to the sttistics of the source. But, dpttion is computtionlly expensive, prticulrly when k-th order Mrkov pproximtion

More information

MTH 146 Conics Supplement

MTH 146 Conics Supplement 105- Review of Conics MTH 146 Conics Supplement In this section we review conics If ou ne more detils thn re present in the notes, r through section 105 of the ook Definition: A prol is the set of points

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits

Systems I. Logic Design I. Topics Digital logic Logic gates Simple combinational logic circuits Systems I Logic Design I Topics Digitl logic Logic gtes Simple comintionl logic circuits Simple C sttement.. C = + ; Wht pieces of hrdwre do you think you might need? Storge - for vlues,, C Computtion

More information

Dr. D.M. Akbar Hussain

Dr. D.M. Akbar Hussain Dr. D.M. Akr Hussin Lexicl Anlysis. Bsic Ide: Red the source code nd generte tokens, it is similr wht humns will do to red in; just tking on the input nd reking it down in pieces. Ech token is sequence

More information

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley AI Adjcent Fields Philosophy: Logic, methods of resoning Mind s physicl system Foundtions of lerning, lnguge, rtionlity Mthemtics Forml representtion nd proof Algorithms, computtion, (un)decidility, (in)trctility

More information

Topological Queries on Graph-structured XML Data: Models and Implementations

Topological Queries on Graph-structured XML Data: Models and Implementations Topologicl Queries on Grph-structured XML Dt: Models nd Implementtions Hongzhi Wng, Jinzhong Li, nd Jizhou Luo Astrct In mny pplictions, dt is in grph structure, which cn e nturlly represented s grph-structured

More information

Section 10.4 Hyperbolas

Section 10.4 Hyperbolas 66 Section 10.4 Hyperbols Objective : Definition of hyperbol & hyperbols centered t (0, 0). The third type of conic we will study is the hyperbol. It is defined in the sme mnner tht we defined the prbol

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Definition of Regular Expression

Definition of Regular Expression Definition of Regulr Expression After the definition of the string nd lnguges, we re redy to descrie regulr expressions, the nottion we shll use to define the clss of lnguges known s regulr sets. Recll

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

10.5 Graphing Quadratic Functions

10.5 Graphing Quadratic Functions 0.5 Grphing Qudrtic Functions Now tht we cn solve qudrtic equtions, we wnt to lern how to grph the function ssocited with the qudrtic eqution. We cll this the qudrtic function. Grphs of Qudrtic Functions

More information

documents 1. Introduction

documents 1. Introduction www.ijcsi.org 4 Efficient structurl similrity computtion etween XML documents Ali Aïtelhdj Computer Science Deprtment, Fculty of Electricl Engineering nd Computer Science Mouloud Mmmeri University of Tizi-Ouzou

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl component

More information

ISG: Itemset based Subgraph Mining

ISG: Itemset based Subgraph Mining ISG: Itemset bsed Subgrph Mining by Lini Thoms, Stynryn R Vlluri, Kmlkr Krlplem Report No: IIIT/TR/2009/179 Centre for Dt Engineering Interntionl Institute of Informtion Technology Hyderbd - 500 032, INDIA

More information

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES GENEATING OTHOIMAGES FO CLOSE-ANGE OBJECTS BY AUTOMATICALLY DETECTING BEAKLINES Efstrtios Stylinidis 1, Lzros Sechidis 1, Petros Ptis 1, Spiros Sptls 2 Aristotle University of Thessloniki 1 Deprtment of

More information

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv Compression Outline 15-853:Algorithms in the Rel World Dt Compression III Introduction: Lossy vs. Lossless, Benchmrks, Informtion Theory: Entropy, etc. Proility Coding: Huffmn + Arithmetic Coding Applictions

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007 CS 88: Artificil Intelligence Fll 2007 Lecture : A* Serch 9/4/2007 Dn Klein UC Berkeley Mny slides over the course dpted from either Sturt Russell or Andrew Moore Announcements Sections: New section 06:

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table

LR Parsing, Part 2. Constructing Parse Tables. Need to Automatically Construct LR Parse Tables: Action and GOTO Table TDDD55 Compilers nd Interpreters TDDB44 Compiler Construction LR Prsing, Prt 2 Constructing Prse Tles Prse tle construction Grmmr conflict hndling Ctegories of LR Grmmrs nd Prsers Peter Fritzson, Christoph

More information

Mobile IP route optimization method for a carrier-scale IP network

Mobile IP route optimization method for a carrier-scale IP network Moile IP route optimiztion method for crrier-scle IP network Tkeshi Ihr, Hiroyuki Ohnishi, nd Ysushi Tkgi NTT Network Service Systems Lortories 3-9-11 Midori-cho, Musshino-shi, Tokyo 180-8585, Jpn Phone:

More information

OUTPUT DELIVERY SYSTEM

OUTPUT DELIVERY SYSTEM Differences in ODS formtting for HTML with Proc Print nd Proc Report Lur L. M. Thornton, USDA-ARS, Animl Improvement Progrms Lortory, Beltsville, MD ABSTRACT While Proc Print is terrific tool for dt checking

More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

On String Matching in Chunked Texts

On String Matching in Chunked Texts On String Mtching in Chunked Texts Hnnu Peltol nd Jorm Trhio {hpeltol, trhio}@cs.hut.fi Deprtment of Computer Science nd Engineering Helsinki University of Technology P.O. Box 5400, FI-02015 HUT, Finlnd

More information

Inference of node replacement graph grammars

Inference of node replacement graph grammars Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. Intelligent Dt Anlysis (27) 24 IOS Press Inference of node replcement grph grmmrs Jcek P. Kukluk, Lwrence B. Holder nd Dine J. Cook Deprtment of Computer

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

II. THE ALGORITHM. A. Depth Map Processing

II. THE ALGORITHM. A. Depth Map Processing Lerning Plnr Geometric Scene Context Using Stereo Vision Pul G. Bumstrck, Bryn D. Brudevold, nd Pul D. Reynolds {pbumstrck,brynb,pulr2}@stnford.edu CS229 Finl Project Report December 15, 2006 Abstrct A

More information

COMPUTER EDUCATION TECHNIQUES, INC. (MS_W2K3_SERVER ) SA:

COMPUTER EDUCATION TECHNIQUES, INC. (MS_W2K3_SERVER ) SA: In order to lern which questions hve een nswered correctly: 1. Print these pges. 2. Answer the questions. 3. Send this ssessment with the nswers vi:. FAX to (212) 967-3498. Or. Mil the nswers to the following

More information

Reducing a DFA to a Minimal DFA

Reducing a DFA to a Minimal DFA Lexicl Anlysis - Prt 4 Reducing DFA to Miniml DFA Input: DFA IN Assume DFA IN never gets stuck (dd ded stte if necessry) Output: DFA MIN An equivlent DFA with the minimum numer of sttes. Hrry H. Porter,

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information

Efficient Techniques for Tree Similarity Queries 1

Efficient Techniques for Tree Similarity Queries 1 Efficient Techniques for Tree Similrity Queries 1 Nikolus Augsten Dtbse Reserch Group Deprtment of Computer Sciences University of Slzburg, Austri July 6, 2017 Austrin Computer Science Dy 2017 / IMAGINE

More information

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011

CSCI 3130: Formal Languages and Automata Theory Lecture 12 The Chinese University of Hong Kong, Fall 2011 CSCI 3130: Forml Lnguges nd utomt Theory Lecture 12 The Chinese University of Hong Kong, Fll 2011 ndrej Bogdnov In progrmming lnguges, uilding prse trees is significnt tsk ecuse prse trees tell us the

More information

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1.

Fig.1. Let a source of monochromatic light be incident on a slit of finite width a, as shown in Fig. 1. Answer on Question #5692, Physics, Optics Stte slient fetures of single slit Frunhofer diffrction pttern. The slit is verticl nd illuminted by point source. Also, obtin n expression for intensity distribution

More information

Video-rate Image Segmentation by means of Region Splitting and Merging

Video-rate Image Segmentation by means of Region Splitting and Merging Video-rte Imge Segmenttion y mens of Region Splitting nd Merging Knur Anej, Florence Lguzet, Lionel Lcssgne, Alin Merigot Institute for Fundmentl Electronics, University of Pris South Orsy, Frnce knur.nej@gmil.com,

More information

F. R. K. Chung y. University ofpennsylvania. Philadelphia, Pennsylvania R. L. Graham. AT&T Labs - Research. March 2,1997.

F. R. K. Chung y. University ofpennsylvania. Philadelphia, Pennsylvania R. L. Graham. AT&T Labs - Research. March 2,1997. Forced convex n-gons in the plne F. R. K. Chung y University ofpennsylvni Phildelphi, Pennsylvni 19104 R. L. Grhm AT&T Ls - Reserch Murry Hill, New Jersey 07974 Mrch 2,1997 Astrct In seminl pper from 1935,

More information

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation Representtion of Numbers Number Representtion Computer represent ll numbers, other thn integers nd some frctions with imprecision. Numbers re stored in some pproximtion which cn be represented by fixed

More information

Lily Yen and Mogens Hansen

Lily Yen and Mogens Hansen SKOLID / SKOLID No. 8 Lily Yen nd Mogens Hnsen Skolid hs joined Mthemticl Myhem which is eing reformtted s stnd-lone mthemtics journl for high school students. Solutions to prolems tht ppered in the lst

More information

LECT-10, S-1 FP2P08, Javed I.

LECT-10, S-1 FP2P08, Javed I. A Course on Foundtions of Peer-to-Peer Systems & Applictions LECT-10, S-1 CS /799 Foundtion of Peer-to-Peer Applictions & Systems Kent Stte University Dept. of Computer Science www.cs.kent.edu/~jved/clss-p2p08

More information

Premaster Course Algorithms 1 Chapter 6: Shortest Paths. Christian Scheideler SS 2018

Premaster Course Algorithms 1 Chapter 6: Shortest Paths. Christian Scheideler SS 2018 Premster Course Algorithms Chpter 6: Shortest Pths Christin Scheieler SS 8 Bsic Grph Algorithms Overview: Shortest pths in DAGs Dijkstr s lgorithm Bellmn-For lgorithm Johnson s metho SS 8 Chpter 6 Shortest

More information

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants A Heuristic Approch for Discovering Reference Models by Mining Process Model Vrints Chen Li 1, Mnfred Reichert 2, nd Andres Wombcher 3 1 Informtion System Group, University of Twente, The Netherlnds lic@cs.utwente.nl

More information

Qubit allocation for quantum circuit compilers

Qubit allocation for quantum circuit compilers Quit lloction for quntum circuit compilers Nov. 10, 2017 JIQ 2017 Mrcos Yukio Sirichi Sylvin Collnge Vinícius Fernndes dos Sntos Fernndo Mgno Quintão Pereir Compilers for quntum computing The first genertion

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

ZZ - Advanced Math Review 2017

ZZ - Advanced Math Review 2017 ZZ - Advnced Mth Review Mtrix Multipliction Given! nd! find the sum of the elements of the product BA First, rewrite the mtrices in the correct order to multiply The product is BA hs order x since B is

More information

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds

A Sparse Grid Representation for Dynamic Three-Dimensional Worlds A Sprse Grid Representtion for Dynmic Three-Dimensionl Worlds Nthn R. Sturtevnt Deprtment of Computer Science University of Denver Denver, CO, 80208 sturtevnt@cs.du.edu Astrct Grid representtions offer

More information

1.5 Extrema and the Mean Value Theorem

1.5 Extrema and the Mean Value Theorem .5 Extrem nd the Men Vlue Theorem.5. Mximum nd Minimum Vlues Definition.5. (Glol Mximum). Let f : D! R e function with domin D. Then f hs n glol mximum vlue t point c, iff(c) f(x) for ll x D. The vlue

More information

Meaningful Change Detection in Structured Data.

Meaningful Change Detection in Structured Data. Meningful Chnge Detection in Structured Dt Sudrshn S. Chwthe Hector Grci-Molin Computer Science Deprtment, Stnford University, Stnford, Cliforni 94305 fchw,hectorg@cs.stnford.edu Astrct Detecting chnges

More information

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis On the Detection of Step Edges in Algorithms Bsed on Grdient Vector Anlysis A. Lrr6, E. Montseny Computer Engineering Dept. Universitt Rovir i Virgili Crreter de Slou sin 43006 Trrgon, Spin Emil: lrre@etse.urv.es

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

Fall 2018 Midterm 2 November 15, 2018

Fall 2018 Midterm 2 November 15, 2018 Nme: 15-112 Fll 2018 Midterm 2 November 15, 2018 Andrew ID: Recittion Section: ˆ You my not use ny books, notes, extr pper, or electronic devices during this exm. There should be nothing on your desk or

More information

Functor (1A) Young Won Lim 8/2/17

Functor (1A) Young Won Lim 8/2/17 Copyright (c) 2016-2017 Young W. Lim. Permission is grnted to copy, distribute nd/or modify this document under the terms of the GNU Free Documenttion License, Version 1.2 or ny lter version published

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

PLWAP Sequential Mining: Open Source Code

PLWAP Sequential Mining: Open Source Code PL Sequentil Mining: Open Source Code C.I. Ezeife School of Computer Science University of Windsor Windsor, Ontrio N9B 3P4 cezeife@uwindsor.c Yi Lu Deprtment of Computer Science Wyne Stte University Detroit,

More information