Taming Subgraph Isomorphism for RDF Query Processing


Jinha Kim # (jinha.kim@oracle.com), Hyungyu Shin (hgshin@dbl.postech.ac.kr), Sungpack Hong # and Hassan Chafi # ({sungpack.hong, hassan.chafi}@oracle.com), Wook-Shin Han (wshan@postech.ac.kr)
POSTECH, South Korea; # Oracle Labs, USA

ABSTRACT

RDF data are used to model knowledge in various areas such as life sciences, Semantic Web, bioinformatics, and social graphs. The size of real RDF data reaches billions of triples. This calls for a framework for efficiently processing RDF data. The core function of processing RDF data is subgraph pattern matching. There have been two completely different directions for supporting efficient subgraph pattern matching. One direction, pursued for the last decade, is to develop specialized RDF query processing engines that exploit the properties of RDF data. The other direction, pursued for over 30 years, is to develop efficient subgraph isomorphism algorithms for general, labeled graphs. Although both directions have a similar goal (i.e., finding subgraphs in data graphs for a given query graph), they have been researched independently without clear reason. We argue that a subgraph isomorphism algorithm can be easily modified to handle the graph homomorphism, which is the RDF pattern matching semantics, by just removing the injectivity constraint. In this paper, based on the state-of-the-art subgraph isomorphism algorithm, we propose an in-memory solution, TurboHOM++, which is tamed for RDF processing, and we compare it with the representative RDF processing engines on several RDF benchmarks in a server machine where billions of triples can be loaded in memory. In order to speed up TurboHOM++, we also provide a simple yet effective transformation and a series of optimization techniques. Extensive experiments using several RDF benchmarks show that TurboHOM++ consistently and significantly outperforms the representative RDF engines. Specifically, TurboHOM++ outperforms its competitors by up to five orders of magnitude.

1. INTRODUCTION

The Resource Description Framework (RDF) is a standard for representing knowledge on the web.
It is primarily designed for building the Semantic Web and has been widely adopted in the database and data mining communities. RDF models a fact as a triple which consists of a subject (S), a predicate (P), and an object (O). Due to its simple structure, many practitioners materialize their data in an RDF format. For example, RDF datasets are now pervasive in various areas including life sciences, bioinformatics, and social networks. The size of real RDF data reaches billions of triples. Such billion-scale RDF data can be fully loaded in the main memory of today's server machines (the cost of a 1TB machine is less than $40,000).

The SPARQL query language is a standard language for querying RDF data in a declarative fashion. Its core function is subgraph pattern matching, which corresponds to finding all graph homomorphisms in the data graph for a query graph [19].

In recent years, there have been significant efforts to speed up the processing of SPARQL queries by developing novel RDF query processing engines. Many engines [1, 18, 19, 25, 26, 29] model RDF data as tabular structures and process SPARQL queries using specialized join methods. For example, RDF-3X [19] treats RDF data as an edge table, EDGE(S,P,O), and materializes six different orderings for this table, so that it can support many SPARQL queries just by using merge-based joins. Note that this approach is efficient for both disk-based and in-memory environments since merge join exploits only sequential scans.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. Proceedings of the VLDB Endowment, Vol. 8, No. 11. Copyright 2015 VLDB Endowment.
Some engines [2, 30, 35] treat RDF data as graphs (or matrices) and develop specialized graph processing methods for processing SPARQL queries. For example, gStore [35] uses specialized index structures to process SPARQL queries. Note that these index structures are based on gCode [34], which was originally proposed for graph indexing.

Subgraph isomorphism, on the other hand, has been studied since the 1970s. The representative algorithms are VF2 [20], QuickSI [21], GraphQL [11], GADDI [32], SPATH [33], and TurboISO [9]. In order to speed up performance, these algorithms exploit good matching orders and effective pruning rules. A recent study [14] shows that good subgraph isomorphism algorithms significantly outperform graph-indexing-based ones. However, all of these algorithms use only small graphs in their experiments, and thus it remains unclear whether they can show good performance for billion-scale graphs such as RDF data.

Although subgraph isomorphism processing and RDF query processing have similar goals (i.e., finding subgraphs in data graphs for a given query graph), they have inexplicably followed two different directions. A subgraph isomorphism algorithm can be easily modified to handle the graph homomorphism, which is the RDF pattern matching semantics, just by removing the injectivity constraint.

In this paper, based on the state-of-the-art subgraph isomorphism algorithm [9], we propose an in-memory solution, TurboHOM++, which is tamed for RDF processing, and we compare it with the representative RDF processing engines on several RDF benchmarks in a server machine where billions of triples can be loaded in memory. We believe that this approach opens a new direction for RDF processing so that both traditional directions can merge or benefit from each other.

By transforming RDF graphs into labeled graphs, we can apply subgraph homomorphism methods to RDF query processing. Extensive experiments using several benchmarks show that a direct modification of TurboISO outperforms the RDF processing engines for queries which require a small amount of graph exploration. However, for some queries which require a large amount of graph exploration, the direct modification is slower than some of its competitors. This poses an important research question: Is this phenomenon due to inherent limitations of the graph homomorphism (subgraph isomorphism) algorithm? Our profile results show that two major subtasks of TurboISO require performance improvement: 1) exploring candidate subgraphs in ExploreCandidateRegion and 2) enumerating solutions based on candidate regions in SubgraphSearch. TurboHOM++ resolves such performance hurdles by proposing the type-aware transformation and tailored optimization techniques.

First, in order to speed up ExploreCandidateRegion, we propose a novel transformation (Section 4.1), called the type-aware transformation, which is simple yet effective in processing SPARQL queries. In the type-aware transformation, by embedding the types of an entity (i.e., a subject or object) into a vertex label set, we can eliminate the corresponding query vertices/edges from a query graph. With the type-aware transformation, the query graph becomes smaller and its topology becomes simpler than that of the original query, and thus this transformation improves performance by reducing the amount of graph exploration.

In order to optimize performance in depth, in both ExploreCandidateRegion and SubgraphSearch, we propose a series of optimization techniques (Section 4.3), each of which contributes significantly to performance improvement for such slow queries. In addition, we explain how TurboHOM++ is extended to support 1) general SPARQL features such as OPTIONAL and FILTER, and 2) parallel execution in a non-uniform memory access (NUMA) architecture [15, 16].
These general features are necessary to execute comprehensive benchmarks such as the Berlin SPARQL Benchmark (BSBM) [3]. Note also that, when the RDF data size grows large, we have to rely on the NUMA architecture.

Extensive experiments using several representative benchmarks show that TurboHOM++ consistently and significantly outperforms all its competitors for all queries tested. Specifically, our method outperforms the competitors by up to five orders of magnitude with only a single thread. This indicates that a subgraph isomorphism algorithm tamed for RDF processing can serve as an in-memory RDF accelerator on top of a commercial RDF engine for real-time RDF query processing.

Our contributions are as follows.
1) We provide the first direct comparison between RDF engines and the state-of-the-art subgraph isomorphism method tamed for RDF processing, TurboHOM++, through extensive experiments, and we analyze the experimental results in depth.
2) In order to simplify a query graph, we propose a novel transformation method called the type-aware transformation, which contributes to boosting query performance.
3) In order to speed up query performance further, we propose a series of performance optimizations as well as NUMA-aware parallelism for fast RDF query processing.
4) Extensive experiments using several benchmarks show that the optimized subgraph isomorphism method consistently and significantly outperforms representative RDF query processing engines.

The rest of the paper is organized as follows. Section 2 describes the subgraph isomorphism problem, its state-of-the-art algorithm, TurboISO, and their modification for the graph homomorphism. Section 3 presents how a direct modification of TurboISO, TurboHOM, handles SPARQL pattern matching. Section 4 describes how we obtain TurboHOM++ from TurboHOM using the type-aware transformation and optimizations for efficient SPARQL pattern matching. Section 5 reviews the related work. Section 6 presents the experimental results. Finally, Section 7 presents our conclusion.
Due to the space limit, we refer the reader to [13] for how TurboHOM++ handles the OPTIONAL, UNION, and FILTER keywords and how it is parallelized.

2. PRELIMINARY

2.1 Subgraph Isomorphism and RDF Pattern Matching Semantic

Suppose that a labeled graph is defined as $g(V, E, L)$, where $V$ is a set of vertices, $E \subseteq V \times V$ is a set of edges, and $L$ is a labeling function which maps a vertex or an edge to the corresponding label set or label, respectively. Then, the subgraph isomorphism is defined as follows.

Definition 1. [14] Given a query graph $q(V, E, L)$ and a data graph $g(V', E', L')$, a subgraph isomorphism is an injective function $M : V \rightarrow V'$ such that 1) $\forall v \in V$, $L(v) \subseteq L'(M(v))$ and 2) $\forall (u, v) \in E$, $(M(u), M(v)) \in E'$ and $L(u, v) = L'(M(u), M(v))$.

If a query vertex $u$ has a blank label set (or, equivalently, does not specify a vertex label), it can match any data vertex. Here, $L(u) = \emptyset$, and thus the subset condition $L(u) \subseteq L'(M(u))$ is always satisfied. Similarly, if a query edge $(u, v)$ has a blank label, it can match any data edge by generalizing the equality condition $L(u, v) = L'(M(u), M(v))$ to $L(u, v) \subseteq L'(M(u), M(v))$.

The graph homomorphism [6] is easily obtained from the subgraph isomorphism by just removing the injective constraint on $M$ in Definition 1. Even though the RDF pattern matching semantics is based on the graph homomorphism, a mapping from a query edge to an edge label is also required to answer SPARQL queries which have variables on predicates. We call such a graph homomorphism the e(xtended)-graph homomorphism and present a formal definition for it as follows.

Definition 2. Given a query graph $q(V, E, L)$ and a data graph $g(V', E', L')$, an e(xtended)-graph homomorphism is a pair of two mapping functions: a query vertex to data vertex function $M_v : V \rightarrow V'$ such that 1) $\forall v \in V$, $L(v) \subseteq L'(M_v(v))$ and 2) $\forall (u, v) \in E$, $(M_v(u), M_v(v)) \in E'$ and $L(u, v) = L'(M_v(u), M_v(v))$, and a query edge to edge label function $M_e : V \times V \rightarrow L'$ such that $\forall (u, v) \in E$, $M_e(u, v) = L'(M_v(u), M_v(v))$.

The subgraph isomorphism problem (resp. the e-graph homomorphism problem) is to find all distinct subgraph isomorphisms (resp. e-graph homomorphisms) of a query graph in a data graph.
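The difference between Definition 1 and the graph homomorphism can be illustrated with a small backtracking enumerator. This is only a minimal sketch under simplifying assumptions, not TurboISO: the function name `matches`, the dictionary-based graph encoding, and the tiny example graphs are hypothetical, each vertex pair carries at most one edge label, and a blank query edge label is encoded as `None`.

```python
from typing import Dict, FrozenSet, Iterator, Optional, Tuple

Vertex = str
Labels = Dict[Vertex, FrozenSet[str]]              # vertex -> label set
Edges = Dict[Tuple[Vertex, Vertex], Optional[str]]  # (src, dst) -> edge label


def matches(q_labels: Labels, q_edges: Edges,
            g_labels: Labels, g_edges: Edges,
            injective: bool) -> Iterator[Dict[Vertex, Vertex]]:
    """Enumerate mappings M from query vertices to data vertices.

    injective=True  -> subgraph isomorphisms (Definition 1)
    injective=False -> graph homomorphisms (injectivity constraint removed)
    """
    order = list(q_labels)  # naive matching order (hypothetical choice)

    def consistent(m: Dict[Vertex, Vertex], u: Vertex, v: Vertex) -> bool:
        # Condition 1: L(u) must be a subset of L'(M(u)).
        if not q_labels[u] <= g_labels[v]:
            return False
        # Injectivity is the only check that differs between the two semantics.
        if injective and v in m.values():
            return False
        trial = {**m, u: v}
        # Condition 2: every fully mapped query edge needs a matching data edge.
        for (a, b), label in q_edges.items():
            if a in trial and b in trial:
                data_edge = (trial[a], trial[b])
                if data_edge not in g_edges:
                    return False
                if label is not None and g_edges[data_edge] != label:
                    return False
        return True

    def extend(m: Dict[Vertex, Vertex], depth: int):
        if depth == len(order):
            yield dict(m)
            return
        u = order[depth]
        for v in g_labels:
            if consistent(m, u, v):
                m[u] = v
                yield from extend(m, depth + 1)
                del m[u]  # backtrack

    yield from extend({}, 0)


# Hypothetical example: two query vertices labeled A both point to a blank vertex.
g_labels = {"v0": frozenset({"A"}), "v1": frozenset({"B"}), "v2": frozenset({"A", "D"})}
g_edges = {("v0", "v1"): "a", ("v2", "v1"): "a"}
q_labels = {"u0": frozenset({"A"}), "u1": frozenset(), "u2": frozenset({"A"})}
q_edges = {("u0", "u1"): "a", ("u2", "u1"): None}

iso = list(matches(q_labels, q_edges, g_labels, g_edges, injective=True))
hom = list(matches(q_labels, q_edges, g_labels, g_edges, injective=False))
print(len(iso), len(hom))
```

In this toy input the homomorphism semantics admits extra solutions in which `u0` and `u2` share a data vertex, which is exactly what the injectivity constraint forbids.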
Figure 1 shows a query graph $q_1$ and a data graph $g_1$. In $q_1$, _ means a blank vertex label set or a blank edge label. In the subgraph isomorphism, there is only one solution, $M^1 = \{(u_0, v_0), (u_1, v_1), (u_2, v_2), (u_3, v_3), (u_4, v_4)\}$. In the e-graph homomorphism, there are three solutions: $M_v^1 = M^1$, $M_e^1 = \{((u_0, u_1), ), ((u_0, u_4), ), ((u_2, u_1), ), ((u_2, u_3), ), ((u_3, u_4), c)\}$; $M_v^2 = \{(u_0, v_2), (u_1, v_3), (u_2, v_2), (u_3, v_3), (u_4, v_5)\}$, $M_e^2 = M_e^1$; and $M_v^3 = \{(u_0, v_2), (u_1, v_1), (u_2, v_2), (u_3, v_3), (u_4, v_5)\}$, $M_e^3 = M_e^1$.

2.2 TurboISO

In this subsection, we introduce the state-of-the-art subgraph isomorphism solution, TurboISO [9], and its modification for the e-graph homomorphism. Although we only describe the modification of TurboISO for the e-graph homomorphism, such a modification is applicable to other subgraph isomorphism algorithms including

VF2 [20], QuickSI [21], GraphQL [11], GADDI [32], and SPATH [33], since all of the subgraph algorithms mentioned are instances of a generic subgraph isomorphism framework [14].

Figure 1: Example of subgraph isomorphism and e-graph homomorphism. (a) Query graph $q_1$. (b) Data graph $g_1$.

TurboISO presents an effective method for the notorious matching order problem from which all the previous subgraph isomorphism algorithms have suffered [14]. Figure 2 illustrates an example of the matching order problem, where $q_2$ is the query graph and $g_2$ is the data graph (see footnote 1). Note that this example query results in no answers. However, the time to finish this query can differ drastically depending on the chosen matching order, as different orders lead to different numbers of comparisons. For instance, the matching order $\langle u_0, u_2, u_1, u_3 \rangle$ requires far more comparisons than the matching order $\langle u_0, u_3, u_1, u_2 \rangle$.

Figure 2: Example showing the matching order problem. (a) Query graph $q_2$. (b) Data graph $g_2$, in which $v_0$ (label A) has 10 X-labeled, 10000 Y-labeled, and 5 Z-labeled neighbors; the shaded area is the candidate region CR($v_0$).

TurboISO solves the matching order problem with candidate region exploration, a technique that accurately estimates the number of candidate vertices for a given query path [9]. In particular, TurboISO first identifies candidate data subgraphs (i.e., candidate regions) from the starting vertices (e.g., the shaded area in Figure 2), then explores each region by performing a depth-first search, which allows almost exact selectivity for each query path.

Algorithm 1 outlines the overall procedure of TurboISO in detail.
First, if a query graph has only one vertex $u$ and no edge, it is sufficient to retrieve all data vertices which have $u$'s labels ($= V(g)_{L(u)}$) and to report a subgraph isomorphism for each of them (lines 2-4). Otherwise, it selects the starting query vertex from the query graph (line 6). Then, it transforms the query graph into its corresponding query tree (line 7). After getting the query tree, for each data vertex that contains the vertex label of the starting query vertex, the candidate region is obtained by exploring the data graph (line 9). If the candidate region is not empty, its matching order is determined (line 11). The data vertex $v_s$ is mapped to the first query vertex $u_s$ by assigning $M(u_s) = v_s$ and $F(v_s) = true$, where $F : V \rightarrow boolean$ is a function which checks whether a data vertex is mapped or not (line 12). Then, the remaining subgraph matching is conducted (line 13). Lastly, the mapping $(u_s, v_s)$ is restored by removing the mapping for $u_s$ and assigning $F(v_s) = false$ (line 14).

Footnote 1: For simplicity, we omit the edge labels and allow only one vertex label in the data graph.

Algorithm 1 TurboISO(g(V, E, L), q(V', E', L'))
Require: q: query graph, g: data graph
Ensure: all subgraph isomorphisms from q to g.
 1: if V(q) = {u} and E(q) = ∅ then
 2:   for each v ∈ V(g)_{L(u)} do
 3:     report M = {(u, v)}
 4:   end for
 5: else
 6:   u_s ← ChooseStartQueryVertex(q, g)
 7:   q' ← WriteQueryTree(q, u_s)
 8:   for each v_s ∈ {v | v ∈ V, L(u_s) ⊆ L(v)} do
 9:     CR ← ExploreCandidateRegion(u_s, v_s)
10:     if CR is not empty then
11:       order ← DetermineMatchingOrder(q', CR)
12:       UpdateState(M, F, u_s, v_s)
13:       SubgraphSearch(q, q', g, CR, order, 1)
14:       RestoreState(M, F, u_s, v_s)
15:     end if
16:   end for
17: end if

ChooseStartQueryVertex. ChooseStartQueryVertex tries to pick the starting query vertex which has the least number of candidate regions. First, as a rough estimation, the query vertices are ranked by their scores. The score of a query vertex $u$ is $rank(u) = \frac{freq(g, L(u))}{deg(u)}$, where $freq(g, L(u))$ is the number of data vertices that have $u$'s vertex labels.
The score function prefers lower frequencies and higher degrees. After obtaining the top-k least-scored query vertices, the number of candidate regions is more accurately estimated for each of them by using the degree filter and the neighborhood label frequency (NLF) filter. The degree filter qualifies the data vertices which have a degree equal to or higher than that of their corresponding query vertices. The NLF filter qualifies the data vertices which have an equal or larger number of neighbors for all distinct labels of the query vertex's neighbors. In Figure 2, for example, $u_0$ becomes the starting query vertex since it has the least number of candidate regions (= 1).

WriteQueryTree. Next, WriteQueryTree transforms the query graph into the query tree. From the starting query vertex obtained by ChooseStartQueryVertex, a breadth-first tree traversal is conducted. Every non-tree edge $(u, v)$ of the query graph is also recorded in the corresponding query tree. For example, when $u_0$ is the starting query vertex, the non-tree edges of $q_2$'s query tree are $(u_1, u_2)$, $(u_1, u_3)$, and $(u_2, u_3)$.

ExploreCandidateRegion. Using the query tree and the starting query vertex, ExploreCandidateRegion collects the candidate regions. A candidate region is obtained by exploring the data graph from the starting data vertex in a depth-first manner following the topology of the query tree. During the exploration, the injectivity constraint should be enforced. The shaded area of Figure 2 is the candidate region CR($v_0$) based on $q_2$'s query tree. Note that the candidate region expansion is conducted only after the current data vertex satisfies the constraints of the degree filter and the NLF filter.
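The rank score and the two filters used by ChooseStartQueryVertex can be sketched in a few lines. The code below is an illustrative sketch, not the authors' implementation: the helper names (`freq`, `rank`, `passes_degree_filter`, `passes_nlf_filter`) and the undirected adjacency-set encoding are assumptions, every query vertex is assumed to have at least one neighbor, and the data graph is a scaled-down, hypothetical variant of Figure 2 (3 X, 5 Y, and 2 Z neighbors instead of 10, 10000, and 5).

```python
from collections import Counter
from typing import Dict, FrozenSet, Set

Labels = Dict[str, FrozenSet[str]]
Adjacency = Dict[str, Set[str]]


def freq(g_labels: Labels, labels: FrozenSet[str]) -> int:
    """Number of data vertices whose label set contains `labels`."""
    return sum(1 for ls in g_labels.values() if labels <= ls)


def rank(u: str, q_labels: Labels, q_adj: Adjacency, g_labels: Labels) -> float:
    """rank(u) = freq(g, L(u)) / deg(u); lower is better (assumes deg(u) > 0)."""
    return freq(g_labels, q_labels[u]) / len(q_adj[u])


def passes_degree_filter(u: str, v: str, q_adj: Adjacency, g_adj: Adjacency) -> bool:
    """Isomorphism version: the data vertex needs at least deg(u) neighbors."""
    return len(g_adj[v]) >= len(q_adj[u])


def passes_nlf_filter(u: str, v: str, q_labels: Labels, q_adj: Adjacency,
                      g_labels: Labels, g_adj: Adjacency) -> bool:
    """Isomorphism version: for every label among u's neighbors, v needs at
    least as many neighbors carrying that label."""
    need = Counter(l for n in q_adj[u] for l in q_labels[n])
    have = Counter(l for n in g_adj[v] for l in g_labels[n])
    return all(have[label] >= count for label, count in need.items())


# Query q2 of Figure 2: four mutually connected vertices labeled A, X, Y, Z.
q_labels = {"u0": frozenset({"A"}), "u1": frozenset({"X"}),
            "u2": frozenset({"Y"}), "u3": frozenset({"Z"})}
q_adj = {u: {w for w in q_labels if w != u} for u in q_labels}

# Scaled-down data graph: v0 (label A) with 3 X, 5 Y, and 2 Z neighbors.
g_labels = {"v0": frozenset({"A"})}
g_adj = {"v0": set()}
for label, count in (("X", 3), ("Y", 5), ("Z", 2)):
    for i in range(count):
        v = f"{label.lower()}{i}"
        g_labels[v] = frozenset({label})
        g_adj[v] = {"v0"}
        g_adj["v0"].add(v)

start = min(q_labels, key=lambda u: rank(u, q_labels, q_adj, g_labels))
print(start)
```

For the homomorphism variant described later in Section 2.2, the NLF check would only require at least one neighbor per distinct label instead of comparing counts.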

DetermineMatchingOrder. After obtaining the candidate region for a starting data vertex, the matching order is determined for each candidate region. Using the candidate region, DetermineMatchingOrder can accurately estimate the number of candidate vertices for each query path. Then, it orders all query paths in the query tree by the number of candidate vertices. For example, from CR($v_0$), the ordered list of query paths is $[u_0.u_3, u_0.u_1, u_0.u_2]$. Thus, we can easily see that $\langle u_0, u_3, u_1, u_2 \rangle$ is the best matching order based on this ordered list.

SubgraphSearch. Exploiting the data structures obtained from the previous steps, SubgraphSearch (Algorithm 2) enumerates all distinct subgraph isomorphisms. It first determines the current query vertex $u$ from a given matching order $order$ (line 1). Then, it obtains a set of data vertices, $C_R$, from the candidate region $CR$ (line 2). $CR(u, v)$ represents the candidate vertices of query vertex $u$ which are the children of $v$ in $CR$, and $P(q', u)$ is the parent of $u$ in the query tree $q'$. For each candidate data vertex $v$, if $v$ has already been mapped, the current solution is rejected since it violates the injectivity constraint of the subgraph isomorphism (lines 4-6). Next, by calling IsJoinable, if the query vertex $u$ of the current data vertex $v$ has non-tree edges, the existence of the corresponding edges is checked in the data graph (line 7). For example, given CR($v_0$) and the matching order $\langle u_0, u_3, u_1, u_2 \rangle$, when making the embedding for $u_1$, we must check whether there is an edge from $M(u_1)$ to $M(u_3)$. If the IsJoinable test is passed, the mapping information is updated by assigning $M(u) = v$ and $F(v) = true$ (line 8). After updating the mapping, if all query vertices are mapped, a subgraph isomorphism $M$ is reported (lines 9-10). Otherwise, further subgraph search is conducted (line 12). Finally, all changes done by UpdateState are restored (line 14).
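The path-ordering step of DetermineMatchingOrder amounts to sorting query paths by their candidate counts. The following sketch is a simplified illustration under stated assumptions: the function name `determine_matching_order`, the tuple encoding of query paths, and the per-path counts (taken from the 10 X, 10000 Y, and 5 Z neighbors shown in Figure 2) are hypothetical inputs, not TurboISO's internal data structures.

```python
from typing import Dict, List, Tuple

QueryPath = Tuple[str, ...]  # root-to-leaf path in the query tree


def determine_matching_order(start: str,
                             candidate_counts: Dict[QueryPath, int]
                             ) -> Tuple[List[QueryPath], List[str]]:
    """Order query paths by ascending candidate count, then flatten them
    into a vertex matching order that begins at the starting query vertex."""
    ordered_paths = sorted(candidate_counts, key=candidate_counts.get)
    order = [start]
    for path in ordered_paths:
        for u in path:
            if u not in order:
                order.append(u)
    return ordered_paths, order


# Candidate counts per query path in CR(v0), assumed from Figure 2.
counts = {("u0", "u1"): 10, ("u0", "u2"): 10000, ("u0", "u3"): 5}
paths, order = determine_matching_order("u0", counts)
print(paths, order)
```

Visiting the most selective path first keeps the number of partial solutions small before the expensive Y-labeled candidates are enumerated.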
Algorithm 2 SubgraphSearch(q, q', g, CR, order, d_c)
 1: u ← order[d_c]
 2: C_R ← CR(u, M(P(q', u)))
 3: for each v ∈ C_R such that v is not yet matched do
 4:   if F(v) = true then
 5:     continue
 6:   end if
 7:   if IsJoinable(q, g, M, u, v, ...) then
 8:     UpdateState(M, F, u, v)
 9:     if |M| = |V(q)| then
10:       report M
11:     else
12:       SubgraphSearch(q, q', g, CR, order, d_c + 1)
13:     end if
14:     RestoreState(M, F, u, v)
15:   end if
16: end for

Modifying TurboISO for e-graph Homomorphism. We first explain how the generic subgraph isomorphism algorithm [14] can easily handle graph homomorphism. The generic subgraph isomorphism algorithm is implemented as a backtrack algorithm, where we find solutions by incrementally extending partial solutions or abandoning them when it is determined that they cannot be completed. Here, given a query graph $q$ and its matching order $(u_{\sigma(1)}, u_{\sigma(2)}, ..., u_{\sigma(|V(q)|)})$, a solution is modeled as a vector $\vec{v} = (M(u_{\sigma(1)}), M(u_{\sigma(2)}), ..., M(u_{\sigma(|V(q)|)}))$ where each element in $\vec{v}$ is a data vertex for the corresponding query vertex in the matching order. At each step in the backtrack algorithm, if a partial solution is given, we extend it by adding every possible candidate data vertex at the end. Here, any candidate data vertex that does not satisfy the following three conditions must be pruned.
1) $\forall u_i \in V(q)$, $L(u_i) \subseteq L(M(u_i))$
2) $\forall (u_i, u_j) \in E(q)$, $(M(u_i), M(u_j)) \in E(g)$ and $L(u_i, u_j) = L(M(u_i), M(u_j))$
3) $M(u_i) \neq M(u_j)$ if $u_i \neq u_j$

Note that the third condition ensures the injective condition, guaranteeing that no duplicate data vertex exists in each solution vector. Thus, by just disabling the third condition, the generic subgraph isomorphism algorithm finds all possible homomorphisms.

Now, we describe how to disable the third condition in TurboISO, which is an instance of the generic subgraph isomorphism algorithm. TurboISO uses pruning rules by applying filters in ExploreCandidateRegion and SubgraphSearch. First, the degree filter and the NLF filter should be modified since a data vertex can be mapped to multiple query vertices.
The degree filter qualifies data vertices which have at least as many neighbors as the number of distinct labels of their corresponding query vertices' neighbors. The NLF filter qualifies data vertices which have at least one neighbor for each distinct label of their corresponding query vertices' neighbors. Second, lines 4-6 of SubgraphSearch, which ensure the third condition, should be removed in order to disable the injectivity test. As we see here, with minimal modification, TurboISO can easily support graph homomorphism.

In order to make TurboISO handle the e-graph homomorphism, the query edge to edge label mapping, $M_e$, should additionally be maintained in SubgraphSearch. For this, UpdateState assigns $M_e(P(q', u), u) = L(M_v(P(q', u)), M_v(u))$, and RestoreState removes such a mapping. From here on, let us denote TurboISO modified for the e-graph homomorphism as TurboHOM.

3. RDF QUERY PROCESSING BY E-GRAPH HOMOMORPHISM

In this section, we discuss how RDF datasets can be naturally viewed as graphs (Section 3.1), and thus how an RDF dataset can be directly transformed into a corresponding labeled graph (Section 3.2). After such a transformation, the subgraph isomorphism algorithms modified for the e-graph homomorphism, such as TurboHOM, can be applied for processing SPARQL queries.

3.1 RDF as Graph

An RDF dataset is a collection of triples, each of which consists of a subject, a predicate, and an object. By considering triples as directed edges, an RDF dataset naturally becomes a directed graph: the subjects and the objects are vertices while the predicates are edges. Figure 3 is a graph representation of triples that captures type relationships between university organizations. Note that we use rectangles to represent vertices in RDF graphs to distinguish them from the labeled graphs.

Figure 3: RDF graph.
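The view of triples as directed, labeled edges can be made concrete with a short sketch. The function name `triples_to_graph` and the sample triples (loosely modeled on the entities named in Figure 3) are assumptions for illustration; vertex IDs are simply assigned in order of first appearance.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triples_to_graph(triples: List[Triple]
                     ) -> Tuple[Dict[str, int], List[Tuple[int, str, int]]]:
    """Map subjects/objects to vertex IDs and predicates to edge labels.

    Returns the vertex mapping table and a list of directed labeled edges
    (source ID, predicate, target ID).
    """
    vertex_id: Dict[str, int] = {}
    edges: List[Tuple[int, str, int]] = []
    for s, p, o in triples:
        for term in (s, o):
            # Assign the next free ID the first time a term is seen.
            vertex_id.setdefault(term, len(vertex_id))
        edges.append((vertex_id[s], p, vertex_id[o]))
    return vertex_id, edges


# Hypothetical triples in the spirit of Figure 3.
triples = [
    ("student1", "rdf:type", "GraduateStudent"),
    ("GraduateStudent", "rdf:subClassOf", "Student"),
    ("student1", "memberOf", "dept1.univ1"),
]
vertex_id, edges = triples_to_graph(triples)
print(vertex_id)
print(edges)
```

Because every subject and object becomes its own vertex, the topology of the RDF graph is preserved exactly, including the class vertices reached through rdf:type.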

3.2 Direct Transformation

To apply subgraph isomorphism algorithms modified for the e-graph homomorphism (e.g., TurboHOM) to RDF query processing, RDF graphs have to be transformed into labeled graphs first. The most basic way to transform RDF graphs is (1) to map subjects and objects to vertex IDs and (2) to map predicates to edge labels. We call such a transformation the direct transformation because the topology of the RDF graph is kept in the labeled graph after the transformation. The vertex label function $L(v)$ ($v \in V(g)$) is the identity function (i.e., $L(v) = \{v\}$). Figure 4 shows the result of the direct transformation of Figure 3: Figures 4a, 4b, and 4c are the vertex mapping table, the edge label mapping table, and the transformed graph, respectively.

(a) Vertex mapping table.
Subject/Object          Vertex
GraduateStudent         v0
Student                 v1
University              v2
Department              v3
student1                v4
univ1                   v5
dept1.univ1             v6
(telephone literal)     v7
john@dept1.univ1.edu    v8

(b) Edge label mapping table.
Predicate               Edge Label
rdf:type                a
rdf:subClassOf          b
undergradDegreeFrom     c
memberOf                d
subOrganizationOf       e
telephone               f
emailAddress            g

Figure 4: Direct transformation of RDF graph (vertex label function $L(v) = \{v\}$). (a) Vertex mapping table. (b) Edge label mapping table. (c) Graph.

A query graph is obtained from a SPARQL query. A query vertex may hold the vertex label which corresponds to the subject or object specified in the SPARQL query. If the query vertex corresponds to a variable, the vertex label is left blank. For example, the SPARQL query of Figure 5a is transformed into the query graph of Figure 5b. Here the query vertex $u_0$, which corresponds to Student, holds the vertex label $\{v_1\}$; in contrast, the query vertex $u_3$, which corresponds to the variable ?X, has blank (_) as the vertex label. Similarly, a query edge may hold the edge label which corresponds to the predicate. For example, the edge label of $(u_3, u_4)$ is c, as the edge corresponds to the undergradDegreeFrom predicate.

SELECT ?X, ?Y, ?Z
WHERE {
  ?X rdf:type Student .
  ?Y rdf:type University .
  ?Z rdf:type Department .
  ?X undergradDegreeFrom ?Y .
  ?X memberOf ?Z .
  ?Z subOrganizationOf ?Y .
}

Figure 5: Direct transformation of SPARQL query. (a) SPARQL query. (b) Query graph.

Note that, when a variable is declared on a predicate in a SPARQL query, the query edge has a blank edge label. An e-graph homomorphism algorithm can answer such SPARQL queries since an e-graph homomorphism has an edge label mapping from query edges to their corresponding edge labels.

Consequently, the direct transformation makes it possible to apply conventional e-graph homomorphism algorithms for processing SPARQL queries. In order to evaluate the performance of such an approach, we applied TurboHOM to LUBM8000, a billion-triple RDF dataset of the Lehigh University Benchmark (LUBM) [8], after applying the direct transformation. We compared the performance of TurboHOM against two existing RDF engines: RDF-3X [19] and System-X (see footnote 2). Figure 6 depicts the measured execution time of these three systems in log scale. (See Section 6.1 for the details of the experiment setup.)

Figure 6: Comparison between the original TurboHOM with the direct transformation graph and other RDF engines.

Although there is no clear winner among them, the figure reveals that TurboHOM performs as well as the existing RDF engines. For short-running queries (i.e., Q1, Q3-Q5, Q7, Q8, Q10-Q13), TurboHOM shows faster elapsed times. As those queries specify a data vertex ID, TurboHOM only needs a small amount of graph exploration from one candidate region with an optimal matching order, while RDF-3X and System-X require expensive join operations. For long-running queries (i.e., Q2, Q6, Q9, and Q14), TurboHOM is slower than some of its competitors.

The performance of TurboHOM largely relies on 1) graph exploration by ExploreCandidateRegion and 2) subgraph enumeration by SubgraphSearch. Moreover, when a query graph has non-tree edges, IsJoinable constitutes a large portion of SubgraphSearch.
The profiling results of the long-running queries confirmed that 1) ExploreCandidateRegion and SubgraphSearch are the dominating factors and 2) for queries which have non-tree edges (Q2 and Q9), IsJoinable is the dominating factor of SubgraphSearch. Specifically, TurboHOM spent most of its time on ExploreCandidateRegion (46% for Q2, 70% for Q6, 72% for Q9, and 69% for Q14) and SubgraphSearch (54% for Q2, 30% for Q6, 28% for Q9, and 31% for Q14). Moreover, for queries which have non-tree edges, most of the SubgraphSearch time was spent on IsJoinable (81.4% for Q2 and 77.6% for Q9). In order to speed up ExploreCandidateRegion, we propose a novel transformation (Section 4.1). Tailored optimization techniques are proposed for improving the performance of both functions (Section 4.3).

4. TURBOHOM++

In this section, we propose an improved e-graph homomorphism algorithm, TurboHOM++. We first introduce the type-aware transformation, which can result in faster pattern matching than the direct transformation (Section 4.1). TurboHOM++ processes the labeled graph transformed by the type-aware transformation (Section 4.2). Furthermore, for efficient RDF query processing, four optimizations are applied to TurboHOM++ (Section 4.3).

Footnote 2: We anonymize the product name to avoid any conflict of interest.

4.1 Type-aware Transformation

To enable the type-aware transformation, we devise the two-attribute vertex model, which makes use of the type information specified by the rdf:type predicate. Specifically, this model assumes that each vertex is associated with a set of labels (the label attribute) in addition to its ID (the ID attribute). The label attribute is obtained by following the rdf:type predicate: if a subject has one or more rdf:type predicates, its types can be obtained by following rdf:type (as well as rdf:subClassOf predicates, transitively). For example, student1 in Figure 3 has the label attribute {GraduateStudent, Student}.

The above two-attribute vertex model naturally leads to our new RDF graph transformation, the type-aware transformation. Here, subjects and objects are transformed to two-attribute vertices by utilizing rdf:type predicates as described above. Then, the ID attribute corresponds to the vertex ID, and the label attribute corresponds to the vertex label. Figure 7 shows an example of the mapping tables and the data graph, which is the result of the type-aware transformation applied to Figure 3. Now, we formally define the type-aware transformation as follows.

Definition 3. The type-aware transformation $(F_V, F_{ID}, F_E, F_{VL}, F_{EL})$ converts a set of triples $T(S, P, O)$ to a type-aware transformed graph $G(V, E, ID, L)$. Let us divide $T$ into three disjoint subsets whose union is $T$: $T'(S', P', O')$, $T_t(S_t, P_t, O_t) = \{(s, \text{rdf:type}, o) \in T\}$, and $T_{sc}(S_{sc}, P_{sc}, O_{sc}) = \{(s, \text{rdf:subClassOf}, o) \in T\}$.
1. A vertex mapping $F_V : S' \cup O' \cup S_t \rightarrow V$, which is bijective, maps a subject in $S' \cup S_t$ or an object in $O'$ to a vertex.
2. A vertex ID mapping $F_{ID} : S' \cup O' \cup S_t \rightarrow \mathbb{N} \cup \{\text{blank}\}$, which is bijective, maps a subject in $S' \cup S_t$ or an object in $O'$ to a vertex ID or blank. Here, $F_{ID}(x) = \text{blank}$ if $x$ is a variable.
3. An edge mapping $F_E : T' \rightarrow E$, which is bijective, maps a triple of $T'$ into an edge, $F_E(s, p, o) = (F_V(s), F_V(o))$.
4. A vertex label mapping $F_{VL} : O_t \cup O_{sc} \rightarrow VL \cup \{\text{blank}\}$, which is bijective, maps an object of $O_t \cup O_{sc}$ into a vertex label. Here, $F_{VL}(x) = \text{blank}$ if $x$ is a variable.
5. An edge label mapping $F_{EL} : P' \rightarrow EL \cup \{\text{blank}\}$, which is bijective, maps a predicate of $P'$ into an edge label. Here, $F_{EL}(x) = \text{blank}$ if $x$ is a variable.
6. A vertex ID mapping function $ID : V \rightarrow \mathbb{N}$ maps a vertex to a vertex ID where $ID(v) = F_{ID} \circ F_V^{-1}(v)$.
7. A labeling function $L$ 1) maps a vertex to a set of vertex labels such that $\forall v \in V$, $L(v) = \{F_{VL}(o) \mid \text{there is a path from } F_V^{-1}(v) \text{ to } o \text{ using triples in } T_t \cup T_{sc}\}$ and 2) maps an edge $e$ to an edge label such that $\forall e \in E$, $L(e) = F_{EL}(Pred(F_E^{-1}(e)))$ where $Pred(s, p, o) = p$.

After finding a type-aware transformation $(F'_V, F'_{ID}, F'_E, F'_{VL}, F'_{EL})$ for a data graph $g(V', E', L', ID')$, we can also convert a SPARQL query into a type-aware transformed query graph $q(V, E, L, ID)$ by using another type-aware transformation $(F_V, F_{ID}, F_E, F_{VL}, F_{EL})$ such that $F_{ID} = F'_{ID}$, $F_{VL} = F'_{VL}$, and $F_{EL} = F'_{EL}$. For example, Figure 8 is the query graph type-aware transformed from the SPARQL query in Figure 5a. Note that a query vertex may have multiple vertex labels, like a data vertex.

Now, we explain how the generic e-graph homomorphism algorithm works for type-aware transformed query/data graphs. When appending a candidate data vertex to the current partial solution, we additionally check the following condition for the ID attribute of the two-attribute vertex model:

$\forall u \in \{u \mid ID(u) \neq \text{blank for } u \in V\}$, $ID(u) = ID'(M_v(u))$.

(a) Vertex ID mapping table.
Subject/Object          Vertex ID
student1                0
univ1                   1
dept1.univ1             2
(telephone literal)     3
john@dept1.univ1.edu    4

(b) Vertex label mapping table.
Type                    Vertex Label
GraduateStudent         A
Student                 B
University              C
Department              D

(c) Edge label mapping table.
Predicate               Edge Label
undergradDegreeFrom     a
memberOf                b
subOrganizationOf       c
telephone               d
emailAddress            e

Figure 7: Type-aware transformation of an RDF graph. (a) Vertex ID mapping table. (b) Vertex label mapping table. (c) Edge label mapping table. (d) Data graph, with two-attribute vertices 0,{A,B}; 1,{C}; 2,{D}; 3,{}; and 4,{}.

Figure 8: Type-aware transformation of the SPARQL query of Figure 5a. The query graph has three vertices: $u_0$ with _,{B}; $u_1$ with _,{C}; and $u_2$ with _,{D}.

The virtue of the type-aware transformation is that it can improve the efficiency of RDF query processing.
Since the type-aware transformation eliminates certain vertices and edges by embedding type information into the vertex label, the resulting data/query graphs have a smaller size and simpler topology than those transformed by the direct transformation. As an example, let us consider the SPARQL query in Figure 5a. After direct transformation, it becomes the query graph in Figure 5b, which has a relatively complex topology consisting of six vertices and six edges. On the other hand, the type-aware transformation produces the query graph in Figure 8, which has a simple triangle topology. This reduced number of vertices and edges has a positive effect on efficiency because it results in less graph exploration.

In general, the effect of the type-aware transformation can be described in terms of the number of data vertices in all candidate regions. Consider a SPARQL query which consists of a set of triples $T$, its direct transformed query graph $q(V, E, L)$, and its type-aware transformed query graph $q'(V', E', ID', L')$. Let $O_{type} = \{o \mid (s, \text{rdf:type}, o) \in T \text{ or } (s, \text{rdf:subClassOf}, o) \in T\}$. In the direct transformation, each $o \in O_{type}$ is transformed to a query vertex. Let $V_{type}$ be the set of direct transformed query vertices from $O_{type}$. However, in the type-aware transformation, $o \in O_{type}$ is not transformed to a query vertex, so that $V' = V - V_{type}$. Therefore, the type-aware transformation leads to less graph exploration in ExploreCandidateRegion and SubgraphSearch. Formally, using the type-aware transformation, the number of data vertices in all candidate regions is reduced by

$$\sum_{u \in V_{type}} \sum_{v_s} |CR_{v_s}(u)|$$

where $v_s$ represents the starting data vertex for each candidate region, and $CR_{v_s}(u)$ represents the set of data vertices in candidate region $CR(v_s)$ that correspond to $u$.
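The type-aware transformation of Definition 3 can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `type_aware_transform`, the dictionary-based representation, and the sample triples are assumptions made for illustration, and vertex labels are kept as plain type names rather than being mapped through $F'_{VL}$.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
# Spelled as in the text above; the standard vocabulary term is rdfs:subClassOf.
SUBCLASS_OF = "rdf:subClassOf"


def type_aware_transform(triples):
    """Hypothetical helper: turn (s, p, o) triples into a two-attribute graph.

    Returns (vertex_ids, labels, edges), where labels maps a vertex ID to the
    set of types reachable through rdf:type and rdf:subClassOf triples.
    """
    type_of = defaultdict(set)     # subject -> directly asserted types (T_t)
    superclass = defaultdict(set)  # class -> direct superclasses (T_sc)
    plain = []                     # remaining triples, which become edges (T')
    for s, p, o in triples:
        if p == RDF_TYPE:
            type_of[s].add(o)
        elif p == SUBCLASS_OF:
            superclass[s].add(o)
        else:
            plain.append((s, p, o))

    # Vertices: subjects and objects of T', plus subjects of rdf:type triples.
    vertex_ids = {}
    for s, _, o in plain:
        for term in (s, o):
            vertex_ids.setdefault(term, len(vertex_ids))
    for s in type_of:
        vertex_ids.setdefault(s, len(vertex_ids))

    def closure(types):
        """All types reachable from `types` by following superclass links."""
        seen, stack = set(), list(types)
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(superclass.get(t, ()))
        return seen

    labels = {vid: closure(type_of.get(term, ())) for term, vid in vertex_ids.items()}
    edges = [(vertex_ids[s], p, vertex_ids[o]) for s, p, o in plain]
    return vertex_ids, labels, edges


# Sample triples loosely modeled on the running example (assumed, not from the paper).
triples = [
    ("student1", RDF_TYPE, "GraduateStudent"),
    ("GraduateStudent", SUBCLASS_OF, "Student"),
    ("univ1", RDF_TYPE, "University"),
    ("dept1.univ1", RDF_TYPE, "Department"),
    ("student1", "undergradDegreeFrom", "univ1"),
    ("student1", "memberOf", "dept1.univ1"),
    ("dept1.univ1", "subOrganizationOf", "univ1"),
]
vertex_ids, labels, edges = type_aware_transform(triples)
```

In this sketch the type and subclass triples never become vertices or edges, so the seven input triples yield three vertices and three edges, which mirrors the size reduction discussed above.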

4.2 Implementation

TurboHOM++ maintains two in-memory data structures: the inverse vertex label list and the adjacency list. Figure 9a shows the inverse vertex label list of Figure 7d. The "end offsets" array records the exclusive end offset of the vertex IDs for each vertex label. Figure 9b shows the adjacency list of Figure 7d for the outgoing edges. The adjacency list stores the adjacent vertices for each data vertex in the same way as the inverse vertex label list. One difference is that the adjacency list has an additional array ("end offsets") to group the adjacent vertices of a data vertex for each neighbor type. Here, the neighbor type refers to the pair of the edge label and the vertex label. For example, v0 in Figure 7d has four different neighbor types: (a, C), (b, D), (d, _), and (e, _). Those four neighbor types are stored in "end offsets", and each entry points to the exclusive end offset of the adjacent vertex IDs. TurboHOM++ maintains another adjacency list for the incoming edges.

We assume that graphs in our system are periodically updated from an underlying RDF source. For efficient graph update, a transactional graph store is definitely required. We leave this exploration to future work since it is beyond the scope of the paper.

Note also that TurboHOM++ can handle SPARQL queries under the simple entailment regime correctly. In order to deal with the simple entailment regime in the type-aware transformed graph, TurboHOM++ distinguishes $L_{simple}(v) = \{F'_{VL}(o) \mid \text{there is an edge from } {F'_V}^{-1}(v) \text{ to } o \text{ using triples in } T_t\}$ from $L(v)$. TurboHOM++ can process a SPARQL query under the simple entailment regime using $L_{simple}(v)$ instead of $L(v)$.
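The inverse vertex label list described at the start of this subsection can be sketched as two flat arrays, one of vertex IDs grouped by label and one of exclusive end offsets. The following Python sketch is only an illustration under assumed names (`build_inverse_label_list`, `vertices_with_label`); the actual memory layout of TurboHOM++ is not given in this section beyond the description above.

```python
def build_inverse_label_list(labels, label_order):
    """Hypothetical builder for the inverse vertex label list.

    labels: {vertex_id: set_of_labels}; label_order: labels in storage order.
    Returns (end_offsets, vertex_ids) where end_offsets[i] is the exclusive
    end position in vertex_ids of the group for label_order[i].
    """
    end_offsets, vertex_ids = [], []
    for label in label_order:
        vertex_ids.extend(sorted(v for v, ls in labels.items() if label in ls))
        end_offsets.append(len(vertex_ids))
    return end_offsets, vertex_ids


def vertices_with_label(end_offsets, vertex_ids, label_index):
    """Slice out the vertex IDs of one label; the start is the previous end."""
    start = end_offsets[label_index - 1] if label_index > 0 else 0
    return vertex_ids[start:end_offsets[label_index]]


# Vertex labels of the data graph in Figure 7d.
figure_7d_labels = {0: {"A", "B"}, 1: {"C"}, 2: {"D"}, 3: set(), 4: set()}
end_offsets, grouped_ids = build_inverse_label_list(figure_7d_labels, ["A", "B", "C", "D"])
students = vertices_with_label(end_offsets, grouped_ids, 1)  # label B
```

Storing only end offsets lets the group for a label be located with two array reads, and the same idea applies to the adjacency list when the groups are keyed by neighbor type instead of vertex label.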
Figure 9: In-memory data structures for the type-aware transformed data graph of Figure 7d: (a) inverse vertex label list, (b) adjacency list (adj(v): adjacent vertices of v; adj(v, (el, vl)): adjacent vertices of v which have vertex label vl and are connected with edge label el).

As the overall behavior of TurboHOM++ is similar to TurboHOM, here we describe how TurboHOM++ uses the data structures in ChooseStartQueryVertex (line 6 of Algorithm 1), ExploreCandidateRegion (line 9 of Algorithm 1), and IsJoinable (line 7 of Algorithm 2).

ChooseStartQueryVertex. When computing rank(u) for a query vertex $u$, the inverse vertex label list is used to get $freq(g, L(u)) = |\bigcap_{l \in L(u)} V(g)_l|$, where $V(g)_l$ is the set of vertices having vertex label $l$. When $|L(u)| = 1$, getting the start and end offsets of the specific vertex label is enough. When $|L(u)| > 1$, for each $l \in L(u)$, all data vertices having $l$, $V(g)_l$, are retrieved from the inverse vertex label list, and $freq(g, L(u))$ is obtained by intersecting all $V(g)_l$. Additionally, when a data vertex ID $v$ is specified in $u$, $freq(g, L(u)) = 1$ if $v \in V(g)_l$ for each $l \in L(u)$. Otherwise, $freq(g, L(u)) = 0$. One last case is when a SPARQL query has a query vertex which has no label or ID at all. In order to handle such queries, we maintain an index called the predicate index, where a key is a predicate, and a value is a pair of a list of subject IDs and a list of object IDs. This index is used to compute $freq(g, L(u))$.

ExploreCandidateRegion. After a query tree is generated, candidate regions are collected by exploring the data graph in an inductive way.
In the base case, all data vertices that correspond to the start query vertex are gathered in the same way as computing $freq(g, L(u))$. In the inductive case, once the starting data vertices are identified, the candidate region exploration continues by exploiting the adjacency information stored in the adjacency list. If one vertex label and one edge label are specified in the query graph, we can get the adjacent data vertices directly from the adjacency list. If multiple vertex labels and one edge label are specified, we collect the adjacent data vertices for each vertex label using the adjacency list, and intersect them. In the case where the vertex label or edge label is blank, TurboHOM++ finds the correct adjacent data vertices by 1) collecting all adjacent vertices which match the available information (either vertex label or edge label) and 2) unioning them. Additionally, if the current query vertex has the data vertex ID attribute, we check whether the specified data vertex is included in the data vertices collected from the adjacency list.

IsJoinable. The IsJoinable test is equivalent to the inductive case of ExploreCandidateRegion when a data vertex ID (a previously matched data vertex) is specified.

4.3 Optimization

In this subsection, we introduce optimizations that we apply to improve the efficiency of TurboHOM++. Even though these optimizations do not change TurboHOM++ severely, they can improve the query processing efficiency quite significantly.

Use intersection on IsJoinable test (+INT). We optimize the IsJoinable test in SubgraphSearch. SubgraphSearch performs the IsJoinable test through multiple membership operations. However, the optimization allows a bulk of IsJoinable tests with one k-way intersection operation, where k is the number of edges between the current query vertex ($u$ in line 1 of Algorithm 2) and the previously matched query vertices connected by non-tree edges.

SubgraphSearch checks the existence of the edges between the current candidate data vertex and the already bound data vertices by calling IsJoinable (line 7 of Algorithm 2) when the corresponding query graph has non-tree edges.
Let us consider the query graph (Figure 8), the query tree (Figure 10) and the data graph (Figure 11). Suppose that, for a given matching order $u_1 \to u_2 \to u_0$, the vertex $v_1$ is bound to $u_1$, and the vertex $v_2$ is bound to $u_2$. Then, the next step is to bind a data vertex to $u_0$. Because there is a non-tree edge between $u_0$ and $u_2$, to bind a data vertex $v_i$ ($i = 0, 3, 4, \ldots, 1001$) to $u_0$, we need to check whether there exists an edge $v_i \to v_2$.

Figure 10: A query tree of the query graph of Figure 8.

Figure 11: An example data graph for illustrating +INT. Vertices: v1 (1, {C}), v2 (2, {D}), and 1000 vertices v0, v3, ..., v1001 with labels {A, B}.

The original method checks for the existence of the edge between the current data vertex and the already matched data vertices by repeatedly calling IsJoinable. Let us consider the above example. For each $v_i$ ($i = 0, 3, 4, \ldots, 1001$), IsJoinable tests whether the edge $v_i \to v_2$ exists. If $v_2$ is a member of $v_i$'s outgoing adjacency list, the test succeeds, and the graph matching continues.

Instead, our modified IsJoinable tests all the edge occurrences between the current candidate vertices ($C_R$ in line 3 of Algorithm 2) and the adjacency lists of the already matched data vertices by one k-way intersection operation. Let us consider the above example again. The modified IsJoinable finds the edges between $v_2$ and the candidate data vertices $v_0, v_3, \ldots, v_{1001}$ at once. For this, it is enough to perform one intersection operation between $v_2$'s incoming adjacent vertices and the candidate data vertices. Since the modified IsJoinable takes $C_R$ as a parameter, lines 3 and 7 of Algorithm 2 are merged into one statement. Note that this optimization can improve the performance significantly. In the above example, since only $v_0$ and $v_{1001}$ pass the test, we can avoid calling the original IsJoinable 998 times.

Formally speaking, let us denote 1) the candidate data vertex set for the current query vertex $u$ as $C_R$, 2) the previously matched query vertex set, which is connected to the current query vertex by non-tree query edges, as $\{u_i\}_{i=1}^{k}$, and 3) the adjacent vertex set of $v_i$ ($= M_v(u_i)$), where $u_i$ is connected to $u$ with the vertex label $vl_i$ and the edge label $el_i$, as $adj(v_i, vl_i, el_i)$. Suppose that $C_R$ and $adj(v_i, vl_i, el_i)$ are stored in ordered arrays. Then, the complexity of the original IsJoinable test is

$$C_{original} = O\Big(|C_R| \cdot \sum_{i=1}^{k} \log |adj(v_i, vl_i, el_i)|\Big),$$

since IsJoinable is called for each $v \in C_R$, and $O(\log |adj(v_i, vl_i, el_i)|)$ time is required to conduct a binary search over the elements of $adj(v_i, vl_i, el_i)$.
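The difference between the per-candidate test and the bulk test of +INT can be made concrete with a short Python sketch. It assumes sorted, duplicate-free lists of vertex IDs; the function names are hypothetical, and successive pairwise merges stand in here for the k-way intersection mentioned in the text.

```python
from bisect import bisect_left


def is_joinable(candidate, adjacency):
    """Original style: one binary search in a sorted adjacency list."""
    i = bisect_left(adjacency, candidate)
    return i < len(adjacency) and adjacency[i] == candidate


def filter_one_by_one(candidates, adjacency_lists):
    """Call is_joinable for every candidate and every adjacency list."""
    return [v for v in candidates
            if all(is_joinable(v, adj) for adj in adjacency_lists)]


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def filter_bulk(candidates, adjacency_lists):
    """+INT style: intersect the candidate list with each adjacency list."""
    result = candidates
    for adj in adjacency_lists:
        result = intersect_sorted(result, adj)
    return result


# Example of Figure 11: candidates for u0 are v0, v3, ..., v1001, and the
# incoming adjacent vertices of v2 are assumed to be v0 and v1001 only.
candidates = [0] + list(range(3, 1002))
incoming_of_v2 = [0, 1001]
one_by_one = filter_one_by_one(candidates, [incoming_of_v2])
bulk = filter_bulk(candidates, [incoming_of_v2])
```

Both functions return the same surviving candidates; the bulk version does so with a single pass over the two sorted lists instead of one membership test per candidate.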
In contrast, the complexity of the modified IsJoinable test is

$$\min\Big(O\Big(|C_R| + \sum_{i=1}^{k} |adj(v_i, vl_i, el_i)|\Big),\ C_{original}\Big),$$

since the modified IsJoinable can choose the k-way intersection strategy between scanning (k + 1) sorted lists and performing binary searches.

Disable NLF Filter (-NLF). The second optimization is to disable the NLF filter in ExploreCandidateRegion. The NLF filter may be effective when the neighbor types are very irregular. However, in practice, most RDF datasets are structured [7, 17]. For example, in our sample RDF dataset (Figure 3), in most cases, a vertex corresponding to a graduate student has telephone, emailAddress, memberOf, and undergraduateDegreeFrom predicates. Accordingly, the NLF filter is not helpful for such structured RDF datasets.

Disable Degree Filter (-DEG). The third optimization is to disable the degree filter in ExploreCandidateRegion. Similar to the NLF filter, the degree filter is effective when the degree is very irregular, while RDF datasets typically are not.

Reuse Matching Order (+REUSE). The last optimization is to reuse the matching order of the first candidate region for all the other candidate regions. That is, DetermineMatchingOrder (line 6 of Algorithm 1) is called only once throughout the TurboISO execution, and the same matching order is used throughout the query processing. TurboHOM++ uses a different matching order for each candidate region, because each candidate region could have a very different number of candidate vertices for a given query path in the e-graph homomorphism problems. However, typical RDF datasets are regular at the schema level, i.e., well structured in practice, and generating the matching order for each candidate region is ineffective, especially when the size of each candidate region is small. We also performed experiments with more heterogeneous datasets, including Yet Another Great Ontology (YAGO) [23] and Billion Triples Challenge 2012 (BTC2012) [10].
This optimization technique still shows good matching performance, as we will see in our extensive experiments in Section 6, since these heterogeneous datasets do not show extreme irregularity at the schema level.

5. RELATED WORK

With the increasing popularity of RDF, the demand for SPARQL support in relational databases is also growing. To meet such demand, most open-source and commercial relational databases support the RDF store and RDF query processing. RDF datasets are stored in relational tables with a set of indexes. After that, SPARQL queries are processed by translating them into the equivalent join queries or by using special APIs.

To support RDF query processing, many specialized stores for RDF data were proposed [2, 4, 18, 19, 26, 29]. Similar to an RDBMS, RDF-3X [18, 19] treats RDF triples as a big three-attribute table, but boosts RDF query processing by building exhaustive indexes and maintaining statistics. RDF-3X processes many SPARQL queries by using merge-based joins, which are efficient for disk-based and in-memory environments. Different from RDF-3X, Jena [26] exploits multiple-property tables, while BitMat [2] exploits a 3-dimensional bit cube, so that it can also support 2D matrices of SO, PO, and PS. H-RDF-3X [12] is a distributed RDF processing engine where RDF-3X is installed in each cluster node.

Several graph stores support RDF data in their native graph storages [30, 35]. gStore [35] performs graph pattern matching using the filter-and-refinement strategy. It first finds promising subgraphs using the VS*-tree index. After that, the exact subgraphs are enumerated in the refinement step. Trinity.RDF [30] is a subsystem of a distributed graph processing engine, Trinity [22]. The RDF triples are stored in Trinity's key-value store. When processing RDF queries, Trinity.RDF implements special query processing methods for RDF data.

In 1976, Ullmann [24] published his seminal paper on the subgraph isomorphism solution based on backtracking.
After his work, many subgraph isomorphism methods were proposed to improve the efficiency by devising their own matching order selection algorithms and filtering constraints [9, 11, 20, 21, 32, 33]. Among those improved methods, TurboISO [9] solves the notorious matching order problem by generating the matching order for each candidate region and by grouping the query vertices which have the same neighbor information. The method shows the most efficient performance among all representative methods.

Along with the backtracking-based methods, the index-based subgraph isomorphism methods were also proposed [5, 27, 28, 31, 34]. All of those methods first prune out unpromising data graphs using low-cost filters based on the graph indexes. After filtering,

any subgraph isomorphism methods can be applied to those unfiltered data graphs. This technique is only useful when there are many small data graphs. Thus, these index-based subgraph isomorphism methods do not enhance RDF graph processing, since there is only one big graph in an RDF database.

6. EXPERIMENTS

We perform extensive experiments on large-scale real and synthetic datasets in order to show the superiority of a tamed subgraph isomorphism algorithm for RDF query processing. In the experiment, we use TurboHOM++. We assume that TurboHOM uses the direct transformation, while TurboHOM++ uses the type-aware transformation along with all optimizations. The specific goals of the experiments are: 1) we show the superior performance of TurboHOM++ over the state-of-the-art RDF engines (Section 6.2), 2) we analyze the effect of the type-aware transformation and the series of optimizations (Section 6.3), and 3) we show the linear speed-up of the parallel TurboHOM++ with an increasing number of threads (due to the space limit, please refer to [13] for the detailed result).

6.1 Experiment Setup

Competitors. We choose three representative RDF engines as competitors of TurboHOM++: RDF-3X, TripleBit, and System-X. Note that these three systems are publicly available. RDF-3X [19] is a well-known RDF store, showing good performance for various types of SPARQL queries. TripleBit [29] is a very recent RDF engine efficiently handling large-scale RDF data. System-X is a popular RDF engine exploiting bitmap indexing. We exclude BitMat [2] from the performance evaluation since it is clearly inferior to TripleBit [29]. gStore is excluded since it is not publicly available.

Datasets. We use four RDF datasets in the experiment: LUBM [8], YAGO [23], BTC2012 [10], and BSBM [3]. LUBM is a de facto standard RDF benchmark which provides a synthetic data generator. Using the generator, we create three datasets, LUBM80, LUBM800, and LUBM8000, where the number represents the scaling factor. YAGO is a real dataset which consists of facts from Wikipedia and WordNet. BTC2012 is a real dataset crawled from multiple RDF web resources.
Lastly, BSBM is an RDF benchmark which provides a synthetic data generator and benchmark queries. BSBM uses more general SPARQL query features such as FILTER, OPTIONAL, and UNION. Due to the space limit, please refer to [13] for the experimental results for YAGO, since the performance trends of YAGO are similar to those for BTC2012.

In order to support the original benchmark queries in LUBM, we load the original triples as well as the inferred triples into the databases. In order to obtain the inferred triples, we use a state-of-the-art RDF inference engine. For example, LUBM8000 contains [...] original triples and [...] inferred triples. Note that this is the standard way to perform the LUBM benchmark. However, regarding BTC2012, we use the original triples only for database loading. This is because the BTC2012 dataset contains many triples that violate the RDF standard, and thus the RDF inference engine refuses to load and execute inference for the BTC2012 dataset. BSBM contains [...] original triples and [...] inferred triples. Table 1 shows the number of vertices and edges of the graphs transformed by the direct transformation and the type-aware transformation. The reduced number of edges in the type-aware transformed graph directly affects the amount of graph exploration in e-graph homomorphism matching.

Table 1: Graph size statistics (direct: direct transformation, type-aware: type-aware transformation). Columns: |V| direct, |E| direct, |V| type-aware, |E| type-aware; rows: LUBM80, LUBM800, LUBM8000, BTC2012, BSBM.

Queries. Regarding LUBM, we use the 14 original benchmark queries provided on the website. Previous work such as [29] and [30] modified some of the original queries because executing those original queries without the inferred triples returns an empty result set. Regarding BTC2012, we use the same query sets proposed in [29], because BTC2012 does not have official benchmark queries. Regarding BSBM, we use the 12 queries in the explore use case (footnote 4), which contain the OPTIONAL, FILTER, and UNION keywords and thus test the capability of more general SPARQL query support.
In order to measure the pure subgraph matching performance, (1) we omit modifiers which reorganize the subgraph pattern matching results (e.g., DISTINCT and ORDER BY) in all queries, and (2) we measure the elapsed time excluding the dictionary look-up time.

Running Environment. We conduct the experiments on a server running Linux with four Intel Xeon E[...] CPUs and 1.5 TB of RAM. The server has the NUMA [15, 16] architecture with 4 sockets, in which each socket has its own CPU and local memory. We measure the elapsed times with a warm cache. To do that, we set up the competitors' running environments as follows. For RDF-3X and TripleBit, as done in [30], we put the database files in the tmpfs in-memory filesystem, which is a kind of RAM disk. For System-X, we set the memory buffer size to 400 GB, which is sufficient for loading the entire database in memory. We execute every query five times, exclude the best and worst times, and compute the average of the remaining three.

6.2 Comparison between TurboHOM++ and RDF engines

We report the elapsed times of the benchmark queries using a single thread. Since the server has a NUMA architecture, memory allocation is always done within one CPU's local memory.

LUBM. Table 2 shows the number of solutions for all benchmark queries in all LUBM datasets. Table 3 shows the experimental results for LUBM80, LUBM800, and LUBM8000. Note that TripleBit was not able to return correct answers for two queries over LUBM80/LUBM800 and for ten queries over LUBM8000. In Table 3, we use X or the superscript * over the elapsed times when TripleBit returns incorrect numbers of solutions.

In order to analyze the results in depth, we classify the LUBM queries into two types. The first type of queries has a constant number of solutions regardless of the dataset size. Q1, Q3-Q5, Q7, Q8, and Q10-Q12 belong to this type. These queries are called constant solution queries. The other queries (Q2, Q6, Q9, Q13, and Q14) have increasing numbers of solutions proportional to the dataset size. These queries are called increasing solution queries.
Footnote 4: [...]de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html

Regarding the constant solution queries, only TurboHOM++ achieves the ideal performance in LUBM, which means constant performance regardless of the dataset size. This phenomenon is analyzed as follows. Each constant solution query contains a query vertex whose ID attribute is set to an entity in the RDF graph. Thus, TurboHOM++ chooses that query vertex as the starting query vertex and generates a candidate region. Furthermore, in the LUBM

datasets, although we increase the scaling factor in order to increase the database size, the size of the candidate region explored by every constant solution query remains almost the same.

Table 2: Number of solutions in LUBM queries. Columns: Q1 to Q14; rows: LUBM80, LUBM800, LUBM8000.

Table 3: Elapsed time in LUBM [unit: ms] (X: wrong number of solutions (# of solutions difference > 3), *: wrong number of solutions (# of solutions difference ≤ 3)). Panels: (a) LUBM80, (b) LUBM800, (c) LUBM8000. Columns: Q1 to Q14; rows: TurboHOM++, RDF-3X, TripleBit, System-X.

In contrast, the elapsed times of RDF-3X increase as the dataset size increases. This is because the data size to scan for merge join increases as the dataset size increases. Thus, the performance gap between TurboHOM++ and RDF-3X increases as the dataset size increases. In LUBM80, TurboHOM++ is [...] (Q11) to [...] (Q7) times faster than RDF-3X. In LUBM800, TurboHOM++ outperforms RDF-3X by [...] (Q10) to [...] (Q7) times. In LUBM8000, TurboHOM++ outperforms RDF-3X by 43.10 (Q1) to [...] (Q7) times. TripleBit shows a similar trend to RDF-3X. Accordingly, TurboHOM++ is 4.40 (Q11 in LUBM80) to [...] (Q5 in LUBM8000) times faster than TripleBit. System-X shows constant elapsed times for these queries, although it is consistently slower than TurboHOM++ by up to [...] times.

For the increasing solution queries (Q2, Q6, Q9, Q13, and Q14), TurboHOM++ also shows the best performance in all LUBM datasets. Overall, the elapsed times of TurboHOM++ are proportional to the number of solutions for these queries. Specifically, after the type-aware transformation, Q13 has one query vertex whose ID attribute is set to an entity in the data graph. Thus, the number of candidate regions is one, which is similar to the constant solution queries. However, as the dataset size increases, the candidate region size also increases.
The other queries (Q2, Q6, Q9, Q14) do not have any query vertex whose ID attribute is set to an entity in the data graph. As the dataset size increases, the number of candidate regions for these queries increases, while each candidate region size does not change. All systems show increasing elapsed times as the dataset size increases. RDF-3X shows 7.60 (Q9 in LUBM80) to [...] (Q13 in LUBM8000) times longer elapsed times than TurboHOM++. TripleBit shows [...] (Q2 in LUBM80) to [...] (Q13 in LUBM800) times longer elapsed times than TurboHOM++ when considering the queries which have the right number of solutions. System-X shows 7.72 (Q14 in LUBM8000) to [...] (Q9 in LUBM8000) times longer elapsed times than TurboHOM++. For the constant solution queries, System-X seems to be the best competitor of TurboHOM++. However, regarding the most time-consuming queries (Q2, Q9), System-X shows poor performance.

BTC2012. Table 4 shows the exact number of solutions and the elapsed times in BTC2012. Even though BTC2012 contains over 1 billion triples, all the engines process all BTC2012 queries quite efficiently. This is because the shapes of the query graphs are simple (tree-shaped). Furthermore, like LUBM, Q2, Q4, and Q5 in the BTC2012 query set contain one query vertex whose ID attribute is set to an entity in the RDF graph. Still, TurboHOM++ outperforms RDF-3X, TripleBit, and System-X by up to [...], 28.57, and [...] times, respectively.

Table 4: Number of solutions and elapsed time [unit: ms] in BTC2012. Columns: Q1 to Q8; rows: # of sol., TurboHOM++, RDF-3X, TripleBit, System-X.

BSBM. Table 5 shows the exact number of solutions and the elapsed times in BSBM. The open-source RDF engines, RDF-3X and TripleBit, are excluded as they do not support OPTIONAL and FILTER. Like BTC2012, even though BSBM contains about 1 billion triples, TurboHOM++ processes most BSBM queries in less than 5 ms, except Q5 and Q6. That is because they have a small number of solutions and contain one query vertex whose ID attribute is set to an entity in the RDF graph. For those ten queries,

TurboHOM++ outperforms System-X by [...] times. Q5 and Q6 take longer than the other queries because they use expensive filters such as join conditions (Q5) and a regular expression (Q6), and filter out a large number of solutions after basic graph pattern matching is finished. Before evaluating FILTER, Q5 (Q6) has [...] ([...]) solutions from the query graph pattern, and only 6803 (43508) of them qualify as final solutions.

Table 5: Number of solutions and elapsed time [unit: ms] in BSBM. Columns: Q1 to Q12; rows: # of sol., TurboHOM++, System-X.

6.3 Effect of Improvement Techniques

We measure the effect of the improvement techniques, including the type-aware transformation (Section 4.1) and the four optimizations (Section 4.3). For this purpose, we use the largest LUBM dataset, LUBM8000. We first show the effect of the type-aware transformation because it is beneficial to all LUBM queries. We next show the effect of the four optimizations (Section 6.3.2).

6.3.1 Effect of Type-aware Transformation

Table 6 shows the elapsed times for the LUBM queries in LUBM8000 using the direct transformation (TurboHOM) and the type-aware transformation (TurboHOM++ without optimizations). Compared with the direct transformation, the type-aware transformation improves the query performance by 1.01 (Q1) to 27.22 (Q6) times. The obvious reason for the performance improvement is the smaller query sizes after the type-aware transformation. The reduced query graph size leads to smaller candidate regions and shorter elapsed times.

First of all, Q6 and Q14 benefit the most from the type-aware transformation. After the type-aware transformation, these queries become point-shaped. That is, the solutions of these two queries are directly obtained by iterating over the data vertices which have the vertex label of the query vertex, which corresponds to lines 2-4 in Algorithm 1. Q13 also benefits much from the type-aware transformation, since the type-aware transformation chooses a better starting query vertex than the direct transformation, which chooses a query vertex having type information.
Q1, Q3, Q4, Q5, Q7, Q8, Q10, Q11, and Q12 do not benefit from the type-aware transformation because they already have a small number of candidate vertices under the direct transformation.

Q2 benefits less than the other long-running queries from the type-aware transformation. The following is the profiling result of Q2 with the direct/type-aware transformation. Q2 with the direct transformation takes [...] milliseconds in ExploreCandidateRegion and [...] milliseconds in SubgraphSearch. Note that, with the direct transformation, the starting vertex is arbitrarily chosen from $u_0$, $u_1$, $u_2$ in Figure 5, since they all have the same vertex label frequency ($freq(g, L(u_i)) = 1$, $i = 0, 1, 2$) and the same degree of 1. In our implementation, the first query vertex $u_0$ is chosen, and thus the label of the non-tree edge is subOrganizationOf. However, with the type-aware transformation, the starting vertex is $u_1$ in Figure 8, and the label of the non-tree edge is memberOf. Although the number of candidate regions with $u_1$ is the minimum among $u_0$, $u_1$, and $u_2$, the cost of the IsJoinable calls for memberOf increases 1.30 times. Thus, Q2 with the type-aware transformation takes [...] milliseconds in ExploreCandidateRegion and [...] milliseconds in SubgraphSearch. We achieve only a 1.16 times performance improvement. However, the cost of the IsJoinable calls is significantly reduced by using +INT. Thus, after applying the type-aware transformation and the tailored optimizations, the final elapsed time for Q2 becomes [...] ms, i.e., a [...] times performance improvement compared with the direct transformation only.

6.3.2 Effect of Four Optimizations

In this experiment, we measure the effect of the four optimizations of TurboHOM++. We use Q2 and Q9 in LUBM8000 since these two queries are the most time-consuming and exploit all optimizations. All the other queries are omitted since their elapsed times are too short to recognize the effect of the optimizations. Note that the elapsed times of Q1, Q3-Q5, Q7, Q8, and Q10-Q13 are too short (< 2 ms), and Q6 and Q14 do not benefit from these optimizations since they are point-shaped.
Figure 12 shows the reduced times of Q2 and Q9 in LUBM8000 after applying these optimizations separately. The optimization techniques on the X-axis are ordered by the reduced time in a decreasing manner: +INT, -NLF, -DEG, and +REUSE. Interestingly, even though Q2 and Q9 have the same shape (i.e., triangular), the most effective optimizations were different. +INT was the most effective in Q2. -NLF was the most effective in Q9 since the size of each candidate region was very small. -DEG was more effective in Q9 than in Q2 since Q9 has more data vertices to which the degree filter is applied. +REUSE was effective in Q9, which has a large number of candidate regions, while Q2 did not benefit from +REUSE.

Figure 12: Reduced elapsed time of each optimization (elapsed time with no optimization: [...] ms (Q2) and [...] ms (Q9)).

7. CONCLUSION

The core function of processing RDF data is subgraph pattern matching. There have been two completely different directions for supporting efficient subgraph pattern matching. One direction is to develop specialized RDF query processing engines exploiting the properties of RDF data, while the other direction is to develop efficient subgraph isomorphism algorithms for general, labeled graphs. In this paper, we posed an important research question, "Can subgraph isomorphism be tamed for efficient RDF processing?" In order to address this question, we provided the first direct and comprehensive comparison of the state-of-the-art subgraph isomorphism method with representative RDF processing engines.

We first showed that a subgraph isomorphism algorithm requires minimal modification to handle graph homomorphism with edge label mapping, which is the RDF graph pattern matching semantics. We then provided a novel transformation method, called

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information
