Towards Unifying Advances in Twig Join Algorithms

Proc. 21st Australasian Database Conference (ADC 2010), Brisbane, Australia

Nils Grimsmo, Truls A. Bjørklund
Department of Computer and Information Science
Norwegian University of Science and Technology

Abstract

Twig joins are key building blocks in current XML indexing systems, and numerous algorithms and useful data structures have been introduced. We give a structured, qualitative analysis of recent advances, which leads to the identification of a number of opportunities for further improvements. Cases where combining competing or orthogonal techniques would be advantageous are highlighted, such as algorithms avoiding redundant computations and schemes for cheaper intermediate result management. We propose some direct improvements over existing solutions, such as reduced memory usage and stronger filters for bottom-up algorithms. In addition we identify cases where previous work has been overlooked or not used to its full potential, such as for virtual streams, or where the benefits of previous techniques have been underestimated, such as for skipping joins. Using the identified opportunities as a guide for future work, we are hopefully one step closer to unification of many advances in twig join algorithms.

Keywords: Twig join, indexing, semi-structured data.

1 Introduction

Twig matching is the most heavily used building block for systems offering search in XML with languages like XPath and XQuery [13]. XML has become the de facto standard for storage of semi-structured data, and the standard for data exchange between disjoint information systems. XPath is a declarative language, and XQuery is an iterative language which uses XPath as a building block. XPath queries can be evaluated in polynomial time [12].

Most academic work related to indexing and querying XML focuses on the twig matching problem, which is equivalent to a subset of XPath: Given a labeled data tree and a labeled query tree, find all matchings of the query nodes to the data nodes, where the data nodes satisfy the ancestor-descendant (a-d) and parent-child (p-c) relationships specified by the query tree edges.
The example in Figure 1 shows the relation between twig matching and XML search. The tree in part (a) is an abstraction of the XML document in (c). Real XML separates element (tag), attribute and text nodes, but in the abstract model there is only one type of nodes. The XPath and XQuery examples in (d) both specify the same structure as the abstract twig query in (b), where double edges symbolize a-d relationships.

First author supported by the Research Council of Norway under the grant NFR. Copyright 2010, Australian Computer Society, Inc. This paper appeared at the Twenty-First Australasian Database Conference (ADC2010), Brisbane, Australia, January. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 104, Heng Tao Shen and Athman Bouguettaya, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is included.

This work focuses on twig matching in indexed data trees. In a typical setting, all data nodes with the same label are stored together, using some encoding which specifies tree positions. To evaluate a query, one stream of data nodes with matching label is read for each query node, and the streams are joined to form twig matches.

This paper gives a structured analysis of recent advances in twig join algorithms, which leads to the identification of a number of opportunities for further improvements. Some direct improvements are identified, such as reduced memory usage in bottom-up algorithms and stronger top-down filters. We highlight cases where new combinations of competing and orthogonal techniques would have clear advantages, but also cases where important previous work has, in our view, been compared with unfairly. We note some open challenges, such as updatability in strong structural summaries, and more efficient detection of cases where simpler and faster algorithms can be used (Section 3.7).

The analysis explores techniques for avoiding redundant computations (Section 3.1), schemes for intermediate result management (Section 3.2), top-down filters for bottom-up algorithms (Section 3.3), skipping joins (Section 3.4), refined access methods (Section 3.5) and virtual streams (Section 3.6).
[Figure 1: XML and twig matching relation. (a) Abstract data tree. (b) Twig query. (c) XML data. (d) XPath (above) and XQuery (below).]

2 Background: Concepts and Techniques

This section goes through some fundamental concepts and techniques which are useful for the understanding of later algorithms. First we formally define the problem.

Definition 1 (Twig matching). Given a rooted unordered labeled query tree Q and a rooted ordered labeled data tree D, find all complete matchings of the nodes in Q, such that the matched nodes in D follow the structural requirements given by the a-d and p-c edges in Q.
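Definition 1 can be made concrete with a brute-force matcher. The following is only a minimal sketch for illustration, not one of the join algorithms surveyed here: the class names `Node` and `QNode`, the axis strings "ad" and "pc", and the small example tree with its labels are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """Data tree node (hypothetical representation)."""
    name: str
    label: str
    children: list = field(default_factory=list)


@dataclass
class QNode:
    """Query tree node; edges holds (axis, QNode) pairs, axis is "ad" or "pc"."""
    label: str
    edges: list = field(default_factory=list)


def descendants(node):
    """Yield all proper descendants of node in pre-order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def match_at(q, d):
    """All matchings of the query subtree rooted at q with q mapped to d.

    Each matching is a tuple of data nodes in query pre-order.
    """
    if q.label != d.label:
        return []
    per_child = []
    for axis, qc in q.edges:
        candidates = d.children if axis == "pc" else list(descendants(d))
        found = [m for c in candidates for m in match_at(qc, c)]
        if not found:
            return []          # one unsatisfied edge rules out d
        per_child.append(found)
    # Combine every choice of sub-matching for every child edge.
    return [(d,) + tuple(n for part in combo for n in part)
            for combo in product(*per_child)]


def twig_match(query_root, data_root):
    """Enumerate all complete matchings of the query in the data tree."""
    nodes = [data_root, *descendants(data_root)]
    return [m for d in nodes for m in match_at(query_root, d)]


# Hypothetical data: a1 has children c1 (with child b1) and a2 (with child b2).
data = Node("a1", "a", [
    Node("c1", "c", [Node("b1", "b")]),
    Node("a2", "a", [Node("b2", "b")]),
])
# Hypothetical query: an "a" node with a "b" descendant and a "c" child.
query = QNode("a", [("ad", QNode("b")), ("pc", QNode("c"))])
result = [tuple(n.name for n in m) for m in twig_match(query, data)]
```

The sketch revisits subtrees many times and materializes all combinations, which is exactly the kind of cost that the indexed, stream-based algorithms discussed below are designed to avoid.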

CRPIT Volume Database Technologies 2010

[Figure 2: The early history of twig joins. Continued in Figure 8. Entries shown: leveraging RDBMS; specialized loop joins, a-d optimal (MPMGJN [30], Tree-Merge-Desc/-Anc [1]); stack joins, a-d & p-c optimal (Stack-Tree-Desc/-Anc [1]); path m-way loop (PathMPMJ [5]); twig m-way loop (not explored); path m-way stack, a-d & p-c optimal (PathStack [5]); twig m-way stack, a-d optimal (TwigStack [5]); skipping (Anc_Des_B+ [8], XR-Stack [15]).]

Note that there is a slight difference between the semantics of twig matching and XPath. A twig query returns all legal combinations of node matches, while in XPath there is a single query return node. The generality of returning all legal combinations of matches in twig matching may have been the academic focus point because it is useful for the flexibility in XQuery. XPath can also specify more than a-d and p-c relationships, but a majority of XPath queries in practice use only the a-d and p-c axes [13].

Many early approaches to search in semi-structured data used combinations of indexing and tree navigation, but the main focus the last decade has been on indexing with inverted lists and structural joins of streams of query node matches [1]. This paper only considers twig join algorithms.

Indexing and node encoding are critical for the efficiency of twig joins. Usually data nodes are indexed (partitioned) on node labels, using for example inverted lists. Two aspects of how data is stored inside partitions are important: how the position of a node is encoded, and how the nodes in the partition are ordered. For most algorithms nodes are stored in depth-first traversal pre-order, such that ancestors are seen before descendants. The positional information which follows nodes must allow a-d and p-c relationships to be decided. The most common is the regional begin,end,level (BEL) encoding [30], which is used in the data extraction example in Figure 3. It reflects the positions of opening and closing tags in XML (see Figure 1). The begin and end numbers are not the same as pre- and post-order traversal numbers, but give the same sorting orders.

[Figure 3: Tree indexing and querying example. (a) Data. (b) Extracted. (c) Query. (d) Matches.]

In the following, let T_q denote the stream of matches for query node q, and C_q denote the current data node in this stream. For simplicity, polynomial factors in the size of the query are ignored in asymptotic notation.

The early history of twig joins is shown in Figure 2. An early approach for schema agnostic XML indexing was to store nodes with BEL encoding in an RDBMS, and specify query node relations as a number of inequalities. But these theta-joins are expensive. Specialized loop structural joins which leveraged the knowledge that the encoded data is a tree were introduced [30, 1]. These have O(I + O) cost for evaluating an a-d relationship, where I and O are the sizes of the input and output streams, but quadratic worst case for p-c relationships. Stack joins were introduced to get optimal evaluation for all binary structural joins. A problem with combining the evaluation of a number of binary relationships to answer a query is that the intermediate results may be of size exponential in the query, even if the output is small. This led to the introduction of multi-way join algorithms.

Stacks are key data structures in most modern twig join algorithms. Their use here is motivated by their use in depth-first tree traversals. To join streams of ancestors and descendants, a stack of currently nested ancestor nodes is maintained. Nodes are popped off the ancestor stack when a non-contained (disjoint) node is seen in either stream. In a path or twig multi-way algorithm, there must be one stack S_qi for each internal query node q_i. The matches for different query nodes must be processed in total pre-order, to ensure that ancestor nodes are added before descendants need them.

In each step in the PathStack algorithm [5], the current data node is used to clean all stacks by popping non-containing nodes, before it is pushed on a stack. Figure 4 shows the stacks for a query when evaluated on the data in Figure 4, right after the node 1 has been pushed. When the current query node is a leaf, all related matches are output.
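The BEL containment tests and the stack-based binary join described above can be sketched as follows. This is a minimal illustration in the spirit of the stack joins cited above, not a transcription of any published algorithm; the names `Bel`, `is_ancestor`, `is_parent` and `stack_join` are assumptions, and the two example streams reuse region numbers of the kind shown in Figure 3 with hypothetical label assignments.

```python
from collections import namedtuple

# Regional encoding: begin and end positions plus depth level.
Bel = namedtuple("Bel", "begin end level")


def is_ancestor(a, d):
    """a-d test: the region of a strictly contains the region of d."""
    return a.begin < d.begin and d.end < a.end


def is_parent(a, d):
    """p-c test: containment plus a level difference of exactly one."""
    return is_ancestor(a, d) and a.level + 1 == d.level


def stack_join(anc_stream, desc_stream, parent_child=False):
    """Join two streams sorted on begin, returning (ancestor, descendant) pairs.

    The stack always holds a chain of nested ancestor candidates.
    """
    out, stack, i = [], [], 0
    for d in desc_stream:
        # Push every ancestor candidate that starts before d.
        while i < len(anc_stream) and anc_stream[i].begin < d.begin:
            a = anc_stream[i]
            while stack and stack[-1].end < a.begin:
                stack.pop()            # disjoint with the new candidate
            stack.append(a)
            i += 1
        # Pop candidates that end before d starts.
        while stack and stack[-1].end < d.begin:
            stack.pop()
        # Everything left on the stack contains d.
        for a in stack:
            if not parent_child or a.level + 1 == d.level:
                out.append((a, d))
    return out


# Hypothetical streams for an ancestor label and a descendant label.
ancestors = [Bel(1, 8, 1), Bel(3, 6, 2)]
descendants = [Bel(2, 2, 2), Bel(4, 4, 3), Bel(7, 7, 2)]
ad_pairs = stack_join(ancestors, descendants)
pc_pairs = stack_join(ancestors, descendants, parent_child=True)
```

Because the stack only ever holds nested nodes, its size is bounded by the depth of the data tree. Note that in this simple form the parent-child variant still scans the whole stack for each descendant.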
To enable linear time enumeration of the matches encoded in the stacks, each data node pushed onto a stack has a pointer to the closest containing data node in the parent stack, which would be the top of the parent stack as the data node was pushed. Nodes above on the parent stack cannot be ancestors, as the data nodes are read in pre-order. In the example, two of the stacked nodes (one of them with index 2) are not usable together. Because a stack only contains nested nodes, the space needed is O(d), where d is the maximal depth of the data tree.

[Figure 4: Data structures for PathStack. (a) Data tree. (b) Query with stacks after pushing node 1.]

A technique for getting path matches sorted on higher query node matches first is critical for the efficiency of TwigStack and other twig multi-way algorithms. Delaying out-of-order output is achieved by maintaining so-called self- and inherit-lists for each stacked node [1]. The lists for the data and query in Figure 4 are shown in Figure 5. As a node is popped off a stack, the contents of its lists are appended to the inherit-lists of the node below on the same stack, if there is one. This is to maintain correct output order. But if the popped node can use some ancestor node in the parent stack, which the node below in its own stack cannot, the contents of the lists must be appended to the self-lists there. This is decided from the inter-stack pointers. In the example, popping node 3 results in adding a path match to the self-list of node 2. PathStack has O(I + O) complexity both with and without delaying output, where I is now the total input size.

[Figure 5: Stack nodes with final self- and inherit lists for the data and query in Figure 4. Darker nodes popped first.]

TwigStack [5] was the first holistic twig join algorithm. Using PathStack on each root-to-leaf path in a twig query and merging the matches may lead to many useless intermediate results, because path matches need not be part of complete matches. TwigStack improved on this, and achieved O(I + O) complexity for queries with a-d edges only. It is a two-phase algorithm, where the first phase outputs matches for each root-to-leaf path, and the second phase merge joins the path matches. The first phase does two things which are critical for the performance of the algorithm: it only outputs path matches which possibly are part of some complete query match, and it outputs paths sorted on higher query nodes first, using the technique from [1]. This allows a linear merge in phase two.

TwigStack does additional checking before pushing nodes on a stack compared to PathStack. The data node at the head of the stream for query node q is not pushed on a stack before it has a so-called solution extension, which means that the heads of the streams of all child query nodes are contained by C_q, and that the child nodes recursively satisfy this property. Also, a node is not pushed on a stack unless there is a usable ancestor data node on the stack for the parent query node.

Pseudo-code for TwigStack is shown in Algorithm 1 (adapted from [5]). It is included here to ease the depiction of the improvements discussed in the following sections. Each query node q has an associated stream T_q with current element C_q, and a stack S_q. The algorithm revolves around a recursive function getnext(q), which returns a (locally) uppermost query node in the subtree of q which has a solution extension.
If the parent of the returned q has a usable ancestor data node on its stack, this means C_q is part of a full solution extension identified earlier, and C_q is pushed on S_q. A path match is found when a leaf node is pushed on a stack, but output is delayed to make sure paths are ordered on the query nodes top down (called blocking in [5]). Note that actually pushing a leaf node on a stack is unnecessary, as it will be popped right off.

The getnext() traversal is bottom-up, and is short-cut if some node does not have a solution extension (see line 20). Leaves trivially have solution extensions. The traversal has the side effect of advancing the treated query node at least until it contains all its children (line 23). If it does not contain all children at this point, the child currently with the first pre-order data node (lowest begin value) is returned to be forwarded in line 12.

Algorithm 1 TwigStack
 1: function TwigStack(Q)
 2:   while not atend(Q)
 3:     q := getnext(Q.root)
 4:     if not isroot(q)
 5:       cleanstack(S_parent(q), C_q)
 6:     if isroot(q) or not empty(S_parent(q))
 7:       cleanstack(S_q, C_q)
 8:       push(S_q, C_q, top(S_parent(q)))
 9:       if isleaf(q)
10:         outputpathsdelayed(C_q)
11:         pop(S_q)
12:     advance(T_q)
13:   mergepathsolutions()
14: function getnext(q)
15:   if isleaf(q)
16:     return q
17:   for q_i in children(q)
18:     q_j := getnext(q_i)
19:     if q_j ≠ q_i
20:       return q_j
21:   q_min := argmin over q_i in children(q) of C_qi.begin
22:   q_max := argmax over q_i in children(q) of C_qi.begin
23:   while C_q.end < C_qmax.begin
24:     advance(C_q)
25:   if C_q.begin < C_qmin.begin
26:     return q
27:   else
28:     return q_min

[Figure 6: TwigStack state when evaluating the query in Figure 3, after processing node 5,5. (a) Streams. (b) Stacks. (c) Path matches.]

Figure 6 shows the state of the algorithm when evaluating the query in Figure 3, right after node 5,5 has been processed. After the first call to getnext(), when all the streams were at their start position, the query root itself was returned as it had a solution extension, and its head 1,8 was pushed on its stack.
For the second call to getnext(), this was not the case, and a child query node was returned, with head 2,2. Since 2,2 had the usable ancestor 1,8 on the parent stack, 1,8 must have had a solution extension in which the subtree rooted at 2,2 was usable. So 2,2 was pushed on its own stack, and since it was a leaf, the path matching (1,8 2,2) was output. After all paths have been found they are merge joined.

TwigStack's suboptimality for mixed a-d and p-c queries comes from having to output path matches without knowing whether the data nodes used can satisfy all their p-c relationships. The algorithm cannot always decide this from the nodes on the stacks and the heads of the streams. For the example in Figure 7, it cannot be decided whether the first n path matches are part of a full match before the node with index n+1 is seen.

[Figure 7: Bad case for TwigStack. (a) Query. (b) Data.]

For a-d only queries, queries where a p-c edge never follows an a-d edge [25], or on data with a non-recursive schema [10], twig joins can be solved with linear cost in the size of the input and the output, using O(d) memory. Sadly, recursive schemas, where nodes with a given label may nest nodes with the same label, are common in XML in practice [9], and so are mixed queries of the type mentioned above. No algorithm can solve the general problem given tag streaming, linear index size, and a query evaluation memory requirement of O(d) [25].

One alternative is storing multiple sort orders on disk, instead of only tree pre-order. This would require Ω(m min m,d D) disk space in the worst case, where m is the number of structurally recursive labels and D is the size of the document [10]. Another alternative is to do multiple scans of the streams, but this would require Ω(d t) passes in the worst case, where t is a linear metric on the complexity of the query [10]. So, the only viable alternatives left seem to be relaxing the O(d) space requirement, or using something different than tag partitioning. The following section investigates this, but also many practical speedups to TwigStack.

[Figure 8: Advances and opportunities in twig joins. The diagram separates changes to algorithms from changes to data and data access. Entries shown:
  Nested loop twig m-way: fast on simple queries?
  2-phase holistic top-down, a-d optimal: TwigStack [5]
  Refined access methods (tag, tag+level, path): iTwigJoin [7]; (twig): TwigVersion [28]
  Virtual streams: Virtual Cursors [29], TJFast [22], (TwigOptimal [11])
  Skipping joins: TwigStackXB [5]
  Avoiding redundant computation: TJEssential [20], TwigStack+ [31]
  1-phase top-down, a-d & p-c optimal: HolisticTwigStack [17]
  Bottom-up + filtering: Twig2Stack+PStack [6], Twig2Stack+TStack [3], TwigMix [21]
  Skipping in leaf streams: Virtual Cursors [29]
  Efficient ancestor skipping: TSGeneric+ [16]
  Holistic skipping: TwigOptimal [11], TJEssential* [20], TwigStack+B [31]
  1-phase bottom-up, a-d & p-c optimal: Twig2Stack [6]
  Simplified intermediate result management, bottom-up: TwigList [24]
  Simplified intermediate result management, top-down: TwigFast [21]
  Open questions: Single phase essential? Holistic skipping in matched leaf streams. Holistic ancestor skipping. Unification? Optimal data access? Combination possible? Optimal algorithm?]

3 Advances

A multitude of different improvements have been presented after the introduction of TwigStack.
Figure 8 gives an overview of these, with a separation between improved join algorithms and changes to how data is indexed and accessed. The rest of this paper is devoted to a structured review of these advances. Our goal is to identify further improvements, and to shed light on whether it is likely that combining these advances is possible and beneficial.

3.1 Avoiding Redundant Computation

TwigStack may perform many redundant checks in the calls to getnext(). Each time a node is returned, the full subtree below has been inspected. The TJEssential [20] algorithm improved three specific deficiencies, exemplified in Figure 9.

The first deficiency is from self-nested matching nodes. For a query node whose data nodes 1 to p in the example are nested in each other, it is unnecessary to recursively check the full subtrees below its child query nodes (d among them) in each round while pushing the nodes onto its stack. The usefulness of nodes 2,...,p can be seen from the fact that node 1 had a new solution extension, and that nodes 2,...,p contain the heads of the streams of the query node's children (d_1 among them).

The second observation is on the order in which child nodes are inspected. If another child is inspected before d in line 18 of Algorithm 1, getnext() on the parent will call getnext() on that child before getnext(d) shortcuts the search. There will be m - 1 such redundant calls while forwarding the leaf node e.

The third observation is that many useless calls could be made after a stream has reached its end. Once the last data node for a query node in Figure 9 has been read, no later node in the tree order for the affected query node below it would ever be pushed onto a stack, and that stream could be forwarded to its end. Also, if the stack of the finished query node was empty, any descendant of it in the query could have its stream forwarded to the end, as the remaining nodes could not be part of a solution.

TJEssential is a total rewrite of TwigStack, and is more complex than the original algorithm. TwigStack+ [31] is a less involved modification, which only changes the getnext() procedure, such that it does not return before a solution is found. TwigStack+ does not catch any of the three cases above, but reduces computation for scattered node matches in practice.

Opportunity 1 (Removing redundant computation in top-down one-phase joins).
The improvement of TwigStack+ can trivially be ported to recent algorithms such as HolisticTwigStack and TwigFast, which improve other aspects of TwigStack (see Section 3.2). A challenge is to do the same for all the three improvements of TJEssential. Also, case three above could be extended to more efficient aligning for multi-document XML collections.

[Figure 9: Giving redundant checks in TwigStack. (a) Data. (b) Query.]
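The second half of the third observation in this section (an exhausted stream with an empty stack makes the whole query subtree below it useless) can be sketched as a small helper. This is an illustration only and is not taken from TJEssential; the function name `streams_to_finish`, the dictionary-based query representation and the query labels in the usage lines are assumptions.

```python
def streams_to_finish(children, ended, stack_empty):
    """Return the query nodes whose streams can be forwarded to their end.

    children    -- dict mapping a query node to its list of child query nodes
    ended       -- set of query nodes whose streams are exhausted
    stack_empty -- set of query nodes whose stacks are currently empty
    """
    finished = set()

    def visit(q, dead):
        if dead:
            finished.add(q)
        # Everything below q is dead once q can never again supply an
        # ancestor: its stream is exhausted and nothing is left on its stack.
        dead = dead or (q in ended and q in stack_empty)
        for child in children.get(q, []):
            visit(child, dead)

    all_children = {c for cs in children.values() for c in cs}
    for root in set(children) - all_children:
        visit(root, False)
    return finished


# Hypothetical query: a has children b and d, and d has child e.
query = {"a": ["b", "d"], "d": ["e"]}
below_d = streams_to_finish(query, ended={"d"}, stack_empty={"d"})
below_a = streams_to_finish(query, ended={"a"}, stack_empty={"a"})
still_live = streams_to_finish(query, ended={"a"}, stack_empty=set())
```

When the stack of the finished query node is not empty, as in the last call, nothing can be discarded by this rule alone, because nodes still on the stack may be ancestors of later descendants.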

3.2 Top-down vs. Bottom-up

There are two main lines of algorithmic improvements over TwigStack which give optimal evaluation of mixed a-d and p-c queries by relaxing the O(d) memory requirement: bottom-up algorithms which read nodes in post-order, and later algorithms which go back to top-down and pre-order. Differences between these are illustrated in Figure 10.

Twig2Stack [6] generates a single combined stream with post-order sorting for all query node matches with the help of a single stack. With post-order processing it can be decided if an entire subtree has a match at the time the top node is seen. Figure 10c shows the hierarchies of stacks built while processing a query. For each query node, a list of trees of stacks is maintained. A data node strictly nests all nodes below in the stack, and all nodes in child stacks in the tree. The lists of trees are stored sorted in post-order, and are linked together by a common root if an ancestor node is processed. From the post-order, the nodes to be linked will always be found at the end of the list, and the new root will always be put at the end. The order naturally maintains itself, and good locality is achieved. Instead of each node on a stack having a pointer to an ancestor node on the parent stack as in TwigStack, each stacked data node has, for each related child query node, a list of pointers to top stack nodes matching the query axis relationship. Nodes are only added if a-d and p-c relationships can be satisfied, and p-c pointers are only added when levels are correct, as can be seen in the example.

TwigList [24] is a simplification of Twig2Stack using simple lists and intervals given by pointers, which improves performance in practice. For each query node, there is a post-order list of the data nodes used so far. Each node in a list has, for each child query node, a single recorded interval of contained nodes, as shown in Figure 10d. Interval start and end positions are recorded as nodes are pushed and popped on and off the global stack. All descendant data nodes are processed in between. Compared with the list of pointers in Twig2Stack, enumeration of matches is not as efficient for p-c edges, but sibling pointers can remedy this.

[Figure 10: (a) Data. (b) Query. (c) Hierarchies of stacks for Twig2Stack. (d) Intervals for TwigList. Curved arrows are sibling pointers. (e) Lists of stacks for HolisticTwigStack right before node 5 is processed. Previously popped nodes shown in gray. (f) Intervals for TwigFast after node 5 has been processed. Curved arrows are ancestor pointers.]

HolisticTwigStack [17] is a modification of TwigStack which uses pre-order processing, but maintains complex stack structures like Twig2Stack. The argument against Twig2Stack was high memory usage, caused by the fact that all query leaf matches are kept in memory until the tree is completely processed, as they could be part of a match. HolisticTwigStack differentiates between the top-most branching node and its ancestors, for which a regular stack is used, and lower query nodes, which have multiple linked lists of stacks, as shown in Figure 10e. Each query node match has one pointer to the first descendant in pre-order for each child query node. For lower query nodes, new data nodes are pushed onto the current stack if contained, otherwise a new stack is created and appended to the list. As a match for an upper query node is popped, the node below on the stack must inherit the pointers. Node 1 would inherit the pointers from both 2 and 4 in the example in Figure 10e, and the related lists of child matches would be linked.

TwigFast [21] is a simplification of HolisticTwigStack similar to TwigList. There is one list containing matches for each query node, naturally sorted in pre-order, and data nodes in the lists have pointers giving the interval of contained matches for child query nodes, as shown in Figure 10f. Each data node put into the list has a pointer to its closest ancestor in the same list, and there is a tail pointer, which gives the last position where a node can be the ancestor of following nodes in the streams. These pointers are used for the construction of the intervals.

Different advantages of top-down and bottom-up algorithms can be seen in Figure 10.
A top-down algorithm can avoid storing some nodes (one of them with index 2 in the figure), while a bottom-up algorithm is unable to decide that these nodes cannot be part of a solution. On the other hand, a bottom-up algorithm can decide that a node with index 2 is not usable, because it cannot satisfy a p-c relationship in the query. Both approaches can decide that a node with index 3 is not useful because it lacks a required descendant.

The worst case space complexity of twig pattern matching is an open problem, and the known bounds are Ω(max(d, u)) and O(I), where u is the number of nodes which are part of a solution [25]. However, practical space savings are possible.

Opportunity 2 (Top-down memory usage). TwigStack treats queries as a-d only in the stack construction part of phase one. A node returned from getnext() is pushed on a stack if it has a usable ancestor on the parent stack, even if the query specifies a p-c relationship. For example, node 3 does not have to be pushed on a stack in Figure 10e, because it does not have a usable parent. Strictly checking p-c relationships before adding intermediate results would reduce memory usage in practice. This optimization was identified for TJFast [22] (see Section 3.6), but the later HolisticTwigStack and TwigFast do not take advantage of this opportunity.

Opportunity 3 (Bottom-up memory usage). Assume a query node q with a p-c relationship to the parent query node. If a candidate match for q is pushed onto a stack in Twig2Stack, and the data node below on the stack does not have an incoming pointer, this means the node below will never get a matching parent, and can be popped off the stack. For example, the node 6 could be dropped in Figure 10. Also, when stack trees for q are merged, some ancestor data node is seen. Then all the stack trees which do not get or have an incoming pointer can be dropped, as all later candidates for the parent query node will come after that ancestor in the post-order. In the example, the stack trees containing the single nodes 2 and 3 could be dropped when 1 is seen.

Note that improvements of this kind are hard to transfer directly to TwigList unless the lists are implemented as linked lists. But this is by far inferior to using arrays and array doubling on modern hardware, as done in TwigList [23]. Another solution is to keep one list for each level for query nodes which have a p-c relationship to the parent query node. Siblings would then be stored contiguously, and interval pointers would implicitly be to a list on a given level. When the first ancestor of a segment of nodes in need of a parent is seen, the useless nodes can be overwritten. This modification would also make sibling pointers unnecessary and improve efficiency of result enumeration. Figure 12 shows the proposed approach for the data and query in Figure 10, where gray list items can be overwritten.

[Figure 11: Filtering approaches for bottom-up processing. Filtered nodes shown in gray. (a) Query. (b) No filter. (c) PathStack pop filter. (d) Solution extension filter. (e) TwigStack pop filter. (f) Opportunity: PathStack useful. (g) Opportunity: TwigStack useful.]

[Figure 12: Proposal for multi-level lists for TwigList.]

3.3 Filtering

Low memory top-down approaches have been used as filters to bottom-up algorithms to reduce space usage by avoiding useless nodes. Note that this does not result in a perfect solution. Assume that node 1 in Figure 10 had a different label. An O(d) space top-down pre-order approach could not decide that a given node in the example was not part of a match, and a bottom-up algorithm would have to keep it in memory until the entire tree was read. Figure 11b-e shows the effects of different previously proposed filters.

PathStack Pop Filter.
In the original Twig2Stack paper [6], PathStack was proposed as a pre-filter to allow early result enumeration. PathStack is run as usual, but without its result enumeration. As disjoint nodes are popped off their stacks, they are passed to Twig2Stack. When the bottom node is popped from the stack of the root query node, all results can be output, and the hierarchical stacks destroyed. A side effect of this procedure is that only nodes that are part of some prefix path match are used (these are not necessarily part of a full root-to-leaf path match). In Figure 11c, a node with index 1 is avoided. Note that one data node may result in the popping of multiple nodes on multiple stacks, and that Twig2Stack must receive descendants before ancestors.

Solution Extension Filter. TwigMix [21] is an algorithm which combines the simplified data structures in TwigList with the getnext() function from TwigStack as a filter. This combination gives efficient evaluation for queries involving p-c edges, and reduced memory usage in practice. An advantage of this approach over Twig2Stack+PathStack is that there is no overhead of maintaining an extra set of stacks, and that internal nodes are filtered holistically. The downside is that nodes are added without even having a possible parent or ancestor. Figure 11d shows that node 2 is filtered, because it never has a solution extension (it misses a node below), while two other nodes (one of them with index 1) are not filtered.

TwigStack Pop Filter. TwigStack can also be used as a filter for Twig2Stack [3]. A node is never added to the hierarchical stacks if it is not popped from a top-down stack in TwigStack. As a node is never pushed on a stack if it does not have a usable ancestor, which again has a solution extension, this gives additional filtering, at the cost of maintaining the top-down stacks. Figure 11e shows the improvements both over PathStack and solution extension as filters. An issue is that Twig2Stack expects the stream of nodes to be in post-order, and that TwigStack may pop nodes off stacks out of this order. When a node is returned from getnext(), only the related stack and the parent stack are inspected. Also, TwigStack does not keep leaf matches on a stack, but nested leaf matches may arrive later.
In [3] this is solved by keeping an extra queue of data nodes into which popped nodes are placed if the algorithm decides later popped nodes may precede them. A different solution could be to allow nested nodes on query leaf stacks, and to inspect all stacks when popping disjoint nodes to ensure post-order, as with the PathStack filter. Also, Twig2Stack actually does not need to see nodes in strict post-order, but only to see descendants before ancestors. Hence, not all stacks in the query would have to be inspected, only ancestor and descendant stacks of the current node.

Opportunity 4 (Stronger filters). There are further possibilities for filters with O(d) space usage. Instead of using all nodes popped off stacks in PathStack, one could use the nodes which would be used in a full path match. As leaf nodes are pushed on a stack, a simplified enumeration algorithm could be run, tagging nodes which take part in solutions. As can be seen in Figure 11f, this is an improvement over the previous PathStack filter, but only partially over the solution extension filter, which to a greater extent filters matches for higher query nodes. Leaves trivially have solution extensions. The PathStack useful filter works well on lower query nodes. Note that as the bottom-up algorithms to a greater extent handle upper nodes themselves, a filter is of most use if it removes lower query node candidates effectively. An even stronger filter would be to only use nodes which would have been output as parts of path matches in TwigStack, as shown in Figure 11g. None of [6, 21, 3] compare with using any other type of filter. A thorough comparison should cover the practical space reductions filters give, their absolute costs, and how their use affects the total computational cost.

Opportunity 5 (Unification or assimilation). When comparing absolute performance gains presented in the respective papers, TwigFast is the winner on performance for pure tag streaming. As this is a very important result, it should be verified independently. Before TwigFast is picked as the method of choice, at least the following should be answered: (i) Can the improvements discussed in Section 3.1 be applied? (ii) Is it superior to improved top-down and bottom-up combinations? (iii) Does the picture change when random access gets more expensive compared to computation? [21] does not comment on the spatial locality of memory access patterns in the intermediate result data structures in TwigFast, while they are very good for TwigList [24].

3.4 Skipping Joins

Skipping is a useful technique when the streams to be joined have very different sizes. Skipping is used to jump forward in a stream to avoid reading and processing parts of the streams which cannot contain useful nodes. Figure 13 shows cases where different skipping techniques and data structures can be used.

[Figure 13: Benefits of skipping techniques. (a) Query. (b) Descendants easily skipped with B-tree. (c) Skip past discarded ancestor. (d) XR-tree needed to skip ancestors. (e) Holistic skipping preferred. (f) Holistic skipping with XR-tree needed.]

Simple B-tree skipping can be used to skip in descendant streams, and to some extent in ancestor streams. It is trivial to skip in the descendant stream to find the first possible contained node, which is the first node with a larger begin value. In Figure 13b, a stream T is forwarded from its node 1 to its node q to find the first possible ancestor. But skipping to find the next ancestor of a node using the same approach is not effective, as any node with a lower begin value may be a match. A trick for ancestor skipping was introduced in [8].
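Before turning to ancestor skipping, the simple descendant case above can be sketched with a binary search over begin values. This is a minimal sketch under stated assumptions: a sorted in-memory list stands in for a B-tree over the stream, and the function name `skip_descendants` and the sample begin values are hypothetical.

```python
from bisect import bisect_right


def skip_descendants(begins, pos, anc_begin):
    """Index of the first entry at or after pos whose begin exceeds anc_begin.

    begins    -- begin values of a descendant stream, sorted ascending
    pos       -- current position in the stream
    anc_begin -- begin value of the current ancestor candidate
    """
    # Entries with a begin value at or below anc_begin cannot be contained
    # by the ancestor candidate, so they are skipped without being read.
    return bisect_right(begins, anc_begin, lo=pos)


# Hypothetical descendant stream and an ancestor candidate starting at 12.
stream_begins = [2, 5, 9, 14, 20, 31]
new_pos = skip_descendants(stream_begins, 0, 12)
```

The search costs a logarithmic number of probes instead of a scan, which is the benefit a B-tree gives on disk. The same lookup does not help for ancestors, since an ancestor of a node may have any smaller begin value.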
If node i is popped off its stack due to disjointness with the current data node of some query node, the corresponding stream is forwarded to the first node not contained by the popped node, that is, the first node j such that i.end < j.begin. An example of this can be seen in Figure 13c. If node 2 pops off the stack, the stream can be forwarded beyond its contained nodes to node q, because no descendant of the popped node could be useful.

XR-trees enable ancestor skipping in the general case [15]. Figure 13d shows an example where the above trick cannot be used. The XR-tree is a B-tree variant which can retrieve all R ancestors or descendants of a node from N candidates in O(log N + R) time. Typically one tree is built for each tag. To find all ancestors of a node d_k, find the node i with the nearest preceding begin value, and then all ancestors of i in the XR-tree for the tag in question. Conceptually the XR-tree contains, for each node, the list of ancestors, which gives quadratic space usage when implemented naively. Linear space usage is achieved by not storing information redundantly internally in XR-tree nodes, and by storing common information in internal XR-tree nodes.

TSGeneric+ (also called XRTwig) [16] extends the use of the XR-tree to TwigStack, and makes two major modifications to the algorithm. The first is to skip forward to containment of the first child in the getnext() procedure (see line 23 in Algorithm 1). The second change is more involved. Before calling getnext() on all children in line 18, a broken edge in the query subtree is repeatedly picked, and the two related nodes are zig-zag forwarded until they match. This is only done if the query node does not have data nodes on the stack. Choosing which edge to fix is either top-down, bottom-up or by statistics.

Holistic skipping was introduced in the TwigOptimal [11] algorithm, which uses B-trees. Figure 13e shows a case where the approach from TSGeneric+ would be very expensive, reading two long runs of nodes (up to index p) to fix a single query edge. TwigOptimal processes the query bottom-up then top-down.
In the bottom-up phase, nodes are forwarded to contain their descendants, and in the top-down phase, nodes are forwarded until they are contained by their parent. To avoid as many data structure reads as possible, nodes are forwarded to virtual positions, which have only begin values. When a full traversal did not forward any node, the node with the minimal current begin value is forwarded to a real data node.

The name of the TwigOptimal algorithm may be slightly misleading, as the optimality is given skip structures on begin values only. Only TSGeneric+ using simple B-trees is compared with. The effects of the two contributions, holistic skipping and the virtual positions, are not separately tested. TwigOptimal would be efficient neither on the example in Figure 13c nor on the one in Figure 13d. The approach is best when there are more matches for lower query nodes. A common exception from this is queries with leaf value predicates in XML. [11] mentions skipping to the closest ancestor and then backtracking to the first ancestor as a possible partial speed-up.

Opportunity 6 (Holistic effective ancestor skipping). Figure 13f shows a case where both TSGeneric+ with XR-trees and TwigOptimal would fail to be efficient. The former would zig-zag join 2 - p and - p, and the latter would be unable to forward T to q without checking at least all of 2 - p for ancestry of q. Combining holistic skipping and data structures for efficient ancestor skipping is required in a robust solution.

Opportunity 7 (Simpler and faster skipping data structures). The XR-tree is a dynamic data structure which supports insertions and deletions [15]. In regular keyword search engines, simpler data structures are usually preferred to the heavier B-trees when the data is static or semi-static. Similar simpler data structures should also be created for efficient ancestor skipping. If their use is still expensive, techniques similar to the trick used to skip past discarded ascendants should be applied when possible.

3.5 Refined Access Methods

There are alternatives to indexing and accessing data by node labels, such as using label and level, or the root-to-node path strings of labels (called tag+level and prefix path streaming [7]).
With refined partitioning some method must be used to identify the useful partitions for each query node. For prefix path streaming this would be the partitions with data paths matching the root-to-node downward paths in the query.

Structural summaries are directory structures used to classify nodes based on their surrounding structure. They were first used in combination with tree traversals, but have later been integrated with pure partitioning schemes [19]. The most common is the path summary, which is a minimal tree containing all unique root-to-node label paths seen in the data. The data nodes associated with a summary node are called the extent of the node. Figure 14b shows the path summary for the data in Figure 14a, and the extents are shown in Figure 14d.

Figure 14: Structural partitioning example. (b) is path summary for (a), with extents shown in (d). (c) is F&B summary for (a), with extents shown in (e).

Many alternative summary structures have been devised for general graphs. A structure which is also directly useful for trees is the stronger F&B-index [18], where two nodes are in the same equivalence class if they have the same prefix path, and have children of the same equivalence classes. In the example, (c) is the F&B summary of the tree in (a). For graphs, the F&B index can be found in O(m log n), where m and n are the number of edges and nodes in the graph. It is not known whether the F&B index can be found more efficiently for trees.

Opportunity 8 (Updates in stronger summaries). Simple path summaries are usually small, and are easily updateable. When traversing a data tree for indexing, the path summary is used as a deterministic automaton, where new nodes are added on the fly when needed. Data nodes can be put in the correct extents immediately. If a data tree is updated, only the data nodes whose extent changes are affected. An interesting question is the updatability of stronger structural summaries.
In the worst case for the F&B index, the structure of an entire containing subtree below the global root could change if a data node is added or removed, by causing or removing equivalence with another subtree. What are the implications of strategies lessening the restrictions on the F&B index? Would this give critical fragmentation effects in practice? And are updates cheaper in coarser variants, such as the F+B-index [18]?

Opportunity 9 (Hashing F&B summaries). In some search scenarios, there are many small documents, which are parts of a virtual tree. Document updates can be implemented as document deletes and inserts. With simple path summaries, documents can be added with cost linear in the document size, by traversing the summary deterministically. However, more refined summaries are not deterministic. Are stronger summaries like the F&B index suitable in this model? A challenge is that matching a new document in the F&B index has cost linear in the size of the summary in the worst case, not the document. Assume now that 2, 10 and 15 in Figure 14a are document roots. The structure of each document is classified by a node on depth two in the F&B summary in Figure 14c. If a new document is added below , it will either have the structure defined by [2] or [7], or a new subtree will be added below [1]. One possibility is to index the F&B summary by hashing each level 2 subtree, as these represent full document structures. When a new document is indexed, a summary of the document structure can be built and hashed, to identify a match in the global F&B index.

TwigVersion [28] is a twig matching approach which introduces a novel two-layer indexing scheme, with an F&B summary of the data, and a path summary of the F&B summary. This reduces the expense of matching in the F&B index. But as they only compare to twig join algorithms which do not use structural summaries, and also introduce many other ideas, it is hard to assess the usefulness of the two-layer approach itself. They compare their two layer approach with a pure F&B index, but do not state how they search in it.
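
The hashing idea raised in Opportunity 9 can be illustrated with a small sketch. It is only one possible reading of that idea, not a method taken from any of the cited papers. The names `Node`, `structure_key` and `DocumentClassIndex` are hypothetical, and the sketch assumes that all documents hang below the same parent, so that document roots with equal labels share their prefix path. Children are collapsed to the set of their structure keys, which mirrors the F&B condition that equivalent nodes have children of the same equivalence classes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical element node: a label plus a list of child nodes."""
    label: str
    children: list = field(default_factory=list)


def structure_key(node: Node) -> str:
    """Return a digest describing the structural class of a subtree.

    Children contribute the *set* of their keys, so repeated siblings with
    the same structure count once and sibling order does not matter.
    """
    child_keys = sorted({structure_key(child) for child in node.children})
    payload = node.label + "(" + ",".join(child_keys) + ")"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DocumentClassIndex:
    """Maps structure keys of document roots to summary class ids (assumed layout)."""

    def __init__(self):
        self._classes = {}

    def classify(self, doc_root: Node) -> tuple:
        """Return (class_id, is_new) for the structure of one document."""
        key = structure_key(doc_root)
        if key in self._classes:
            return self._classes[key], False
        class_id = len(self._classes) + 1
        self._classes[key] = class_id
        return class_id, True
```

With such an index, classifying a new document touches only the nodes of that document, independent of the size of the global summary. Whether this carries over to a full F&B index with updates inside documents is exactly the open question posed in the opportunity.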
A common way to use path summaries is to match each individual root-to-leaf path, and prune away matches which cannot be part of a full match [7, 2]. Another solution, which is more robust for large path summaries, is to label partition the summary and run a full twig join algorithm on it. In [3] a novel combination of Twig²Stack and TwigStack is used for matching in large path summaries (see Section 3.3).

Opportunity 10 (Exploring summary structures and how to search them). Many twig join algorithms have leveraged the benefits of path summaries. Stronger summaries like the F&B index are not as commonly used, maybe because of worst case size and implementational complexity. Using different algorithms to search various types and combinations of summaries has not been thoroughly explored. An evaluation should address the total benefits of different single- and multi-level strategies, but also detail the local cost of specific matching methods in specific summary types of different sizes.

Multi-stream access may be required for a single query node when partitioning on path, as there may be many path matches. One solution is to merge the streams. Another is to have a tag partitioned store, and filter the data nodes on matching path ID [29]. A speedup to this approach is to chain together nodes with the same path [19]. This is also useful when indexing text nodes on value and integrating structure information. S³ [14] is a twig matching system which takes all possible combinations of individual streams matching query nodes based on prefix path, and solves each combination separately, merging the results of each evaluation. This approach does not give polynomial worst-case bounds.

Blocking is the reason for the sub-optimality of TwigStack. When partial matches must be output to evaluate the data and query in Figure 15a, it is because and 1 block each other's access to 2 and respectively. Using more fine grained partitioning and streaming solves some blocking issues, because there are multiple heads of streams. A partitioning which refines another always inherits the reduced blocking [7]. In tag+level streaming there is no blocking when only p-c edges are used below the query root [7].
But in mixed queries, such as in Figure 15b, blocking can still occur. There the stream for tag d level 3 is [d_1, d_2] and the stream for level 4 is [ 1, 2 ]. There are only two matches for the query, and data nodes 1 and d_1 block each other. Prefix path streaming results in no blocking when there is a single branching node in the query. It solves the case in Figure 15b optimally, but not the one in 15c. There the stream for the path is [ 1, 2 ], and the stream for the path e is [e_1, e_2]. Even though 2 is also usable in the match with root 1, it cannot be known whether or not 1 is usable, because e_1 blocks for e_2.

Figure 15: Cases of data and query blocking with (a) tag streaming, (b) tag+level streaming, (c) prefix path streaming. Adapted from [7].

Figure 16: Virtual stream example. (a) Data. (b) Path summary. (c) Extents of summary nodes. (d) Query. (e) Leaf streams.

iTwigJoin [7] uses a specialized approach for accessing multiple useful streams, which supports any partitioning scheme. In its variant of getnext(q), it considers for each matching stream, the streams which are usable together with this stream, for each child of q. This reduces the amount of blocking when more fine grained partitioning is used. The space usage for iTwigJoin is O(d), and the running time is O(t(I + O)) when no blocking occurs, where t is the number of useful streams.

Opportunity 11 (Access methods for multiple matching partitions). Strategies for accessing multiple useful streams include merging, filtering and chaining of input, informed merging access like in iTwigJoin, and merging the output of multiple joins. Many authors do not argue for the rationale of their choice of how to access multiple useful partitions for a node. A new access paradigm is presented in [7], but only tag streaming is compared with. The benefit of the method for accessing multiple matching streams is not separated from the benefit of reduced total input size.
The technique reduces the number of intermediate paths output by phase one in TwigStack, and would undoubtedly reduce the amount of memory needed for intermediate results in time optimal top-down algorithms like HolisticTwigStack and TwigFast, but it is not certain if it is win-win both on memory and speed in practice.

3.6 Virtual Streams

Another approach that can lead to reading less input is using virtual streams for internal nodes, by inferring the existence of nodes from their descendants. This requires a position encoding which allows ancestor position reconstruction [27], such as Dewey [26]. Ancestor label paths must also be inferable, and a path summary is an excellent tool for this. Consider the example in Figure 16, where streams of nodes matching leaf label paths are shown. For the node with path (4)..., it can be inferred that one candidate for the query root is 1.2 with path (2)..

Virtual Cursors is an implementation of virtual streams using Deweys and path summaries [29]. Generating the next match for an internal query node is done by going through the prefixes of a leaf node's Dewey and path, and using those where the ending tag is correct. The search stops when the new Dewey is lexicographically greater than the previous, meaning later in the pre-order. [29] does not give details on how ancestor candidates are generated, but this can be done in time linear in the depth of the leaf match used. Forwarding the entire query is done by repeatedly picking a leaf with a broken path, forwarding it to containment by the maximal ancestor, and then forwarding all ancestors virtually to contain the leaf. In the system described by [29], tag streaming with path ID filtering was used, and B-trees were used for skipping during leaf forwarding.

Other virtual stream approaches have later been introduced. TJFast [22] is an independently developed algorithm which does not use a structural summary, but stores with each data node the root-to-node label path and the Dewey encoding together in a compressed format. Label paths are matched for each node processed.
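
The prefix-based inference of internal nodes described above for Virtual Cursors can be sketched as follows. Since [29] does not detail how ancestor candidates are generated, this is an assumed reading and not the published procedure. The function names, the representation of a Dewey as a tuple of integers, and the representation of a label path as a tuple of labels of the same length are all hypothetical choices made for the illustration.

```python
def ancestor_candidates(leaf_dewey, leaf_path, tag):
    """Yield (dewey, path) for each proper ancestor of the leaf labeled `tag`.

    leaf_dewey: tuple of ints, e.g. (1, 2, 1, 3) for the node 1.2.1.3.
    leaf_path:  tuple of labels from the root down to the leaf.
    Candidates are produced from the root downwards.
    """
    if len(leaf_dewey) != len(leaf_path):
        raise ValueError("Dewey and label path must have the same depth")
    for depth in range(1, len(leaf_dewey)):
        # The ancestor at this depth ends with the label at position depth - 1.
        if leaf_path[depth - 1] == tag:
            yield leaf_dewey[:depth], leaf_path[:depth]


def next_virtual_match(previous, leaf_dewey, leaf_path, tag):
    """Return the first candidate that is later in pre-order than `previous`.

    Tuple comparison is lexicographic and a prefix sorts before its
    extensions, which agrees with pre-order on Dewey positions.
    Returns None if the leaf implies no further candidate.
    """
    for dewey, path in ancestor_candidates(leaf_dewey, leaf_path, tag):
        if previous is None or dewey > previous:
            return dewey, path
    return None
```

For a leaf 1.2.1.3 with an assumed label path a, a, b, c and a query node labeled a, the sketch produces the virtual positions 1 and then 1.2. The number of candidates is linear in the leaf depth, although the tuple slices used here copy data and are not tuned to that bound. The sketch checks only the ending tag of each prefix, as in the description of Virtual Cursors above.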
An improvement over Virtual Cursors as described, is that path matching is also done when generating internal nodes, giving fewer useless candidates. Also, non-branching internal nodes can be ignored during query evaluation, because they are implicit from the path matchings of below and above nodes. TJFast does not produce streams for internal nodes, but maintains sets of currently possible candidates.

TwigVersion [28] and S³ [14] (see Section 3.5) are non-holistic approaches which combine structural summaries and inference of internal node matches. TwigVersion computes sets of matches bottom-up. Each child query node generates a set of candidates for its parent query node based on its own matches, and these sets are then intersected. S³ uses the potentially exponential number of ways a query matches the summary, and evaluates each such match, merging the results. For one summary matching, it looks at the query leaf nodes pairwise, and merge joins sets based on lowest common ancestor query nodes. This could give large useless intermediate results.

The holistic skipping algorithm TwigOptimal [11] does partially implement virtual streams through its virtual positions (see Section 3.4).

Opportunity 12 (Improved virtual streams). To reduce the number of matches and make it possible to ignore non-branching internal nodes, only structurally matching internal nodes should be generated. A structural summary can be used to avoid repeated matching of paths. However, how to store path matching information is not obvious. Given a matching path for a leaf query node, there may be an exponential number of combinations for matching the above query nodes. Should the matches be calculated on the fly as in TJFast, kept in independent sets for each node above a leaf match, or encoded in stacks? Or is it enough to store candidates for the lowest branching node above each leaf match, if the query nodes on a path are processed bottom-up? It is common to store Deweys in a succinct format to reduce space usage, but in addition, some scheme should also be devised to reduce the redundancy of using related Deweys in ascendant and descendant nodes. It is also preferable if node encodings do not have to be fully de-compressed to be compared during query evaluation, but that the compressed storage format allows for cheap direct comparisons.

Opportunity 13 (Holistic skipping among leaf streams). In some sense, virtual streams are skipping by not generating unrelated matches for internal query nodes. The Virtual Cursors algorithm does perform skipping which is path-holistic, in the way broken root-to-leaf paths are fixed. The order in which leaves are picked is not specified [29], but query node pre-order could have been used. If the lexicographically largest broken leaf was picked, the skipping would become truly holistic. The work in [29], in combination with some intermediate result handling method from Section 3.2, may be a suitable starting point in the hunt for the ultimate twig matching approach, but the work is not much compared with, or even referenced.

3.7 Query Difficulty Classes

As mentioned in Section 2, the difficulty of twig joins comes from a mixture of a-d edges followed by p-c edges in queries, in combination with structurally recursive labels in the data. [25] shows that when an a-d edge is never followed by a p-c edge downwards in a query, it can be evaluated in linear time and O(d) space. When there in addition is a single return node (as in XPath), it can also be evaluated in O(1) space. If after combining all the advances listed in this paper, faster evaluation methods still exist for some classes of queries, practical implementations should take advantage of this.

Opportunity 14 (Identifying and using difficulty classes). Can the correctness of using a simpler matching algorithm be decided not only from the query, but also from the query and the data? Structural summaries give possibilities for this. What happens if there are only single path candidates for some query nodes?
What happens when the tree level for matches of a query node is fixed? What happens if data node matches with given paths for some query nodes fix path matches for other query nodes? In [2] additional information is collected in a path summary, noting whether a node always has a given child, and whether there is at most one child. This information is used there to simplify query evaluation when there are non-return nodes in the query, such as in XPath. Could such statistics also allow detection of more cases where query evaluation can be simplified for general twig matching?

4 Conclusion

We have given a structured analysis of recent advances, and identified a number of opportunities for further research, focusing both on join algorithms and index organization strategies. Hopefully this has given an overview which has led us one step further towards unification of the numerous advances in this field. One conclusion is that given its sheer volume, it seems nearly impossible to consider all related work when presenting new twig join techniques. The field would benefit greatly from an open source repository of algorithms and data structures.

References

[1] S. Al-Khalifa, H. Jagadish, N. Koudas, J. Patel, and D. Srivastava. Structural joins: A primitive for efficient XML query pattern matching. In Proc. ICDE.
[2] A. Arion, A. Bonifati, I. Manolescu, and A. Pugliese. Path summaries and path partitioning in modern XML databases. In Proc. WWW.
[3] R. Bača, M. Krátký, and V. Snášel. On the efficient search of an XML twig query in large DataGuide trees. In Proc. IDEAS.
[4] R. Bača and M. Krátký. On the efficiency of a prefix path holistic algorithm. In Proc. XSym.
[5] N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: Optimal XML pattern matching. In Proc. SIGMOD.
[6] S. Chen, H.-G. Li, J. Tatemura, W.-P. Hsiung, D. Agrawal, and K. S. Candan. Twig²Stack: bottom-up processing of generalized-tree-pattern queries over XML documents. In Proc. VLDB.
[7] T. Chen, J. Lu, and T. W. Ling. On boosting holism in XML twig pattern matching using structural indexing techniques. In Proc. SIGMOD.
[8] S. Chien, Z. Vagena, D. Zhang, V. Tsotras, and C. Zaniolo. Efficient structural joins on indexed XML documents. In Proc. VLDB.
[9] B. Choi. What are real DTDs like. Technical Report MS-CIS-02-05, University of Pennsylvania.
[10] B. Choi, M. Mahoui, and D. Wood. On the optimality of holistic algorithms for twig queries. In Proc. DEXA.
[11] M. Fontoura, V. Josifovski, E. Shekita, and B. Yang. Optimizing cursor movement in holistic twig joins. In Proc. CIKM.
[12] G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. In Proc. VLDB.
[13] G. Gou and R. Chirkova. Efficiently querying large XML data repositories: A survey. Knowl. and Data Eng.
[14] S. K. Izadi, T. Härder, and M. S. Haghjoo. S³: Evaluation of tree-pattern XML queries supported by structural summaries. Data Knowl. Eng.
[15] H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-tree: Indexing XML data for efficient structural joins. In Proc. ICDE.
[16] H. Jiang, W. Wang, H. Lu, and J. Yu. Holistic twig joins on indexed XML documents. In Proc. VLDB.
[17] Z. Jiang, C. Luo, W.-C. Hou, and Q. Z. D. Che. Efficient processing of XML twig pattern: A novel one-phase holistic solution. In Proc. DEXA.
[18] R. Kaushik, P. Bohannon, J. F. Naughton, and H. F. Korth. Covering indexes for branching path queries. In Proc. SIGMOD.
[19] R. Kaushik, R. Krishnamurthy, J. F. Naughton, and R. Ramakrishnan. On the integration of structure indexes and inverted lists. In Proc. SIGMOD.
[20] G. Li, J. Feng, Y. Zhang, and L. Zhou. Efficient holistic twig joins in leaf-to-root combining with root-to-leaf way. In Proc. Advances in Databases: Concepts, Systems and Applications.
[21] J. Li and J. Wang. Fast matching of twig patterns. In Proc. DEXA.
[22] J. Lu, T. Ling, C. Chan, and T. Chen. From region encoding to extended Dewey: On efficient processing of XML twig pattern matching. In Proc. VLDB.
[23] L. Qin. Personal correspondence.
[24] L. Qin, J. X. Yu, and B. Ding. TwigList: Make twig pattern matching fast. In Proc. DASFAA.
[25] M. Shalem and Z. Bar-Yossef. The space complexity of processing XML twig queries over indexed documents. In Proc. ICDE.
[26] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and querying ordered XML using a relational database system. In Proc. SIGMOD.
[27] F. Weigel. Structural summaries as a core technology for efficient XML retrieval. PhD thesis, Ludwig-Maximilians-Universität München.
[28] X. Wu and G. Liu. XML twig pattern matching using version tree. Data & Knowl. Eng.
[29] B. Yang, M. Fontoura, E. Shekita, S. Rajagopalan, and K. Beyer. Virtual cursors for XML joins. In Proc. CIKM.
[30] C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman. On supporting containment queries in relational database management systems. SIGMOD Rec.
[31] J. Zhou, M. Xie, and X. Meng. TwigStack+: Holistic twig join pruning using extended solution extension. Wuhan University Journal of Natural Sciences.

CS 241 Week 4 Tutorial Solutions

CS 241 Week 4 Tutorial Solutions CS 4 Week 4 Tutoril Solutions Writing n Assemler, Prt & Regulr Lnguges Prt Winter 8 Assemling instrutions utomtilly. slt $d, $s, $t. Solution: $d, $s, nd $t ll fit in -it signed integers sine they re 5-it

More information

Duality in linear interval equations

Duality in linear interval equations Aville online t http://ijim.sriu..ir Int. J. Industril Mthemtis Vol. 1, No. 1 (2009) 41-45 Dulity in liner intervl equtions M. Movhedin, S. Slhshour, S. Hji Ghsemi, S. Khezerloo, M. Khezerloo, S. M. Khorsny

More information

Outline. Motivation Background ARCH. Experiment Additional usages for Input-Depth. Regular Expression Matching DPI over Compressed HTTP

Outline. Motivation Background ARCH. Experiment Additional usages for Input-Depth. Regular Expression Matching DPI over Compressed HTTP ARCH This work ws supported y: The Europen Reserh Counil, The Isreli Centers of Reserh Exellene, The Neptune Consortium, nd Ntionl Siene Foundtion wrd CNS-119748 Outline Motivtion Bkground Regulr Expression

More information

Midterm Exam CSC October 2001

Midterm Exam CSC October 2001 Midterm Exm CSC 173 23 Otoer 2001 Diretions This exm hs 8 questions, severl of whih hve suprts. Eh question indites its point vlue. The totl is 100 points. Questions 5() nd 6() re optionl; they re not

More information

Pattern Matching. Pattern Matching. Pattern Matching. Review of Regular Expressions

Pattern Matching. Pattern Matching. Pattern Matching. Review of Regular Expressions Pttern Mthing Pttern Mthing Some of these leture slides hve een dpted from: lgorithms in C, Roert Sedgewik. Gol. Generlize string serhing to inompletely speified ptterns. pplitions. Test if string or its

More information

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms

Paradigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms Prdigm. Dt Struture Known exmples: link tble, hep, Our leture: suffix tree Will involve mortize method tht will be stressed shortly in this ourse Suffix trees Wht is suffix tree? Simple pplitions History

More information

Introduction to Algebra

Introduction to Algebra INTRODUCTORY ALGEBRA Mini-Leture 1.1 Introdution to Alger Evlute lgeri expressions y sustitution. Trnslte phrses to lgeri expressions. 1. Evlute the expressions when =, =, nd = 6. ) d) 5 10. Trnslte eh

More information

Parallelization Optimization of System-Level Specification

Parallelization Optimization of System-Level Specification Prlleliztion Optimiztion of System-Level Speifition Luki i niel. Gjski enter for Emedded omputer Systems University of liforni Irvine, 92697, US {li, gjski} @es.ui.edu strt This pper introdues the prlleliztion

More information

Internet Routing. IP Packet Format. IP Fragmentation & Reassembly. Principles of Internet Routing. Computer Networks 9/29/2014.

Internet Routing. IP Packet Format. IP Fragmentation & Reassembly. Principles of Internet Routing. Computer Networks 9/29/2014. omputer Networks 9/29/2014 IP Pket Formt Internet Routing Ki Shen IP protool version numer heder length (words) for qulity of servie mx numer remining hops (deremented t eh router) upper lyer protool to

More information

Type Checking. Roadmap (Where are we?) Last lecture Context-sensitive analysis. This lecture Type checking. Symbol tables

Type Checking. Roadmap (Where are we?) Last lecture Context-sensitive analysis. This lecture Type checking. Symbol tables Type Cheking Rodmp (Where re we?) Lst leture Contet-sensitie nlysis Motition Attriute grmmrs Ad ho Synt-direted trnsltion This leture Type heking Type systems Using synt direted trnsltion Symol tles Leil

More information

COMP108 Algorithmic Foundations

COMP108 Algorithmic Foundations Grph Theory Prudene Wong http://www.s.liv..uk/~pwong/tehing/omp108/201617 How to Mesure 4L? 3L 5L 3L ontiner & 5L ontiner (without mrk) infinite supply of wter You n pour wter from one ontiner to nother

More information

COMP 423 lecture 11 Jan. 28, 2008

COMP 423 lecture 11 Jan. 28, 2008 COMP 423 lecture 11 Jn. 28, 2008 Up to now, we hve looked t how some symols in n lphet occur more frequently thn others nd how we cn sve its y using code such tht the codewords for more frequently occuring

More information

Minimal Memory Abstractions

Minimal Memory Abstractions Miniml Memory Astrtions (As implemented for BioWre Corp ) Nthn Sturtevnt University of Alert GAMES Group Ferury, 7 Tlk Overview Prt I: Building Astrtions Minimizing memory requirements Performnes mesures

More information

Distributed Systems Principles and Paradigms. Chapter 11: Distributed File Systems

Distributed Systems Principles and Paradigms. Chapter 11: Distributed File Systems Distriuted Systems Priniples nd Prdigms Mrten vn Steen VU Amsterdm, Dept. Computer Siene steen@s.vu.nl Chpter 11: Distriuted File Systems Version: Deemer 10, 2012 2 / 14 Distriuted File Systems Distriuted

More information

CMPUT101 Introduction to Computing - Summer 2002

CMPUT101 Introduction to Computing - Summer 2002 CMPUT Introdution to Computing - Summer 22 %XLOGLQJ&RPSXWHU&LUFXLWV Chpter 4.4 3XUSRVH We hve looked t so fr how to uild logi gtes from trnsistors. Next we will look t how to uild iruits from logi gtes,

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Priniples nd Prdigms Christoph Dorn Distriuted Systems Group, Vienn University of Tehnology.dorn@infosys.tuwien..t http://www.infosys.tuwien..t/stff/dorn Slides dpted from Mrten vn Steen,

More information

In the last lecture, we discussed how valid tokens may be specified by regular expressions.

In the last lecture, we discussed how valid tokens may be specified by regular expressions. LECTURE 5 Scnning SYNTAX ANALYSIS We know from our previous lectures tht the process of verifying the syntx of the progrm is performed in two stges: Scnning: Identifying nd verifying tokens in progrm.

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

Lesson 4.4. Euler Circuits and Paths. Explore This

Lesson 4.4. Euler Circuits and Paths. Explore This Lesson 4.4 Euler Ciruits nd Pths Now tht you re fmilir with some of the onepts of grphs nd the wy grphs onvey onnetions nd reltionships, it s time to egin exploring how they n e used to model mny different

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Lecture 13: Graphs I: Breadth First Search

Lecture 13: Graphs I: Breadth First Search Leture 13 Grphs I: BFS 6.006 Fll 2011 Leture 13: Grphs I: Bredth First Serh Leture Overview Applitions of Grph Serh Grph Representtions Bredth-First Serh Rell: Grph G = (V, E) V = set of verties (ritrry

More information

Lecture 8: Graph-theoretic problems (again)

Lecture 8: Graph-theoretic problems (again) COMP36111: Advned Algorithms I Leture 8: Grph-theoreti prolems (gin) In Prtt-Hrtmnn Room KB2.38: emil: iprtt@s.mn..uk 2017 18 Reding for this leture: Sipser: Chpter 7. A grph is pir G = (V, E), where V

More information

2 Computing all Intersections of a Set of Segments Line Segment Intersection

2 Computing all Intersections of a Set of Segments Line Segment Intersection 15-451/651: Design & Anlysis of Algorithms Novemer 14, 2016 Lecture #21 Sweep-Line nd Segment Intersection lst chnged: Novemer 8, 2017 1 Preliminries The sweep-line prdigm is very powerful lgorithmic design

More information

CS 551 Computer Graphics. Hidden Surface Elimination. Z-Buffering. Basic idea: Hidden Surface Removal

CS 551 Computer Graphics. Hidden Surface Elimination. Z-Buffering. Basic idea: Hidden Surface Removal CS 55 Computer Grphis Hidden Surfe Removl Hidden Surfe Elimintion Ojet preision lgorithms: determine whih ojets re in front of others Uses the Pinter s lgorithm drw visile surfes from k (frthest) to front

More information

Error Numbers of the Standard Function Block

Error Numbers of the Standard Function Block A.2.2 Numers of the Stndrd Funtion Blok evlution The result of the logi opertion RLO is set if n error ours while the stndrd funtion lok is eing proessed. This llows you to rnh to your own error evlution

More information

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved. Chpter 9 Greey Tehnique Copyright 2007 Person Aison-Wesley. All rights reserve. Greey Tehnique Construts solution to n optimiztion prolem piee y piee through sequene of hoies tht re: fesile lolly optiml

More information

[SYLWAN., 158(6)]. ISI

[SYLWAN., 158(6)]. ISI The proposl of Improved Inext Isomorphi Grph Algorithm to Detet Design Ptterns Afnn Slem B-Brhem, M. Rizwn Jmeel Qureshi Fulty of Computing nd Informtion Tehnology, King Adulziz University, Jeddh, SAUDI

More information

Problem Final Exam Set 2 Solutions

Problem Final Exam Set 2 Solutions CSE 5 5 Algoritms nd nd Progrms Prolem Finl Exm Set Solutions Jontn Turner Exm - //05 0/8/0. (5 points) Suppose you re implementing grp lgoritm tt uses ep s one of its primry dt strutures. Te lgoritm does

More information
