Meaningful Change Detection in Structured Data.

Sudarshan S. Chawathe and Hector Garcia-Molina
Computer Science Department, Stanford University, Stanford, California

This work was supported by the Air Force Wright Laboratory Aeronautical Systems Center under DARPA Contract F, by the Department of the Air Force Rome Laboratories under DARPA Contract F C-09, and by equipment grants from IBM Corporation, Digital Equipment Corporation, and Sun Microsystems.

Abstract

Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, and update operations, but also on operations that move an entire sub-tree of nodes and that copy an entire sub-tree. These operations allow us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.

1 Introduction

Detection of changes between data structures is an important function in many applications. For example, on the World-Wide Web an analyst may be interested in knowing how a competitor's site has changed since the last time it was visited. This may be achieved by saving a snapshot of the previous HTML pages at the site (something that most browsers do for efficiency anyway). In a CAD design environment, an engineer may wish to understand the differences between two related but concurrently developed chip designs. In a distributed file system, an administrator may need to detect differences between two mirror file systems that became partitioned and independently modified. In a warehousing environment, the changes at a site need to be identified so that a materialized view can be incrementally maintained.

In this paper we present an efficient algorithm, MH-DIFF, for meaningful change detection between two hierarchically structured data snapshots, or trees. The key word here is meaningful (the "M" in the name). That is, our goal is to portray the changes between two trees in a succinct and descriptive way. As is commonly done, we portray the changes as an edit script that gives the sequence of operations needed to transform one tree into another. However, in this paper we use a richer set of operations than has ever been used before, and this leads, we believe, to much higher quality edit scripts. In particular, we use move and copy operations, in addition to the more traditional insert, delete, and update operations. Thus, if a substructure (e.g., a section of text, a shift register) is moved to another location, our algorithm will report it as a single operation. If the substructure is copied (e.g., a second shift register is added which is identical to one already in the circuit), then our algorithm will identify it as such. Traditional change detection algorithms would report such changes as sequences of inserts and deletes (or simply inserts in the case of a copy), which do not convey the true meaning of the change.

Note that detecting moves and copies becomes more important if the moved or copied subtree is large. For instance, if we are comparing file systems, and a large directory with thousands of files is mounted elsewhere, we clearly do not wish to report the change as thousands of file deletes followed by thousands of file creations. Also note that to detect moves and copies, it is essential that our algorithm understand the structure as well as the content of the data. Thus, our algorithm cannot treat the data as "flat" information, e.g., as files with records or relations with tuples.
This means that techniques developed for flat change detection [Mye86, LGM96] are not applicable here. Algorithm MH-DIFF has two additional important features:

It does not rely on the existence of node (atomic object) identifiers that can match nodes in one tree to nodes in the other. In many applications such identifiers do not exist. For instance, sentences and paragraphs in text documents do not come with unique identifiers attached. Even when the nodes are stored in a database system (e.g., circuit components), we may be comparing copies with the same content but different identifiers. Thus, for full generality, MH-DIFF does not assume unique identifiers that span the two trees, and instead compares the contents of nodes to determine if they are related. (If the trees have such identifiers, MH-DIFF could easily take advantage of them, but we do not discuss that here.)

Algorithm MH-DIFF is based on a fairly flexible cost model. Each operation in the repertoire is given a user-defined fixed cost, except for the update operation, whose cost is determined by a user-provided function that compares the values of two nodes. This gives end users great latitude in saying what types of edit scripts are preferable for an application.

There is a good reason why difference algorithms with the features we have described here have not been developed earlier, even though they are clearly desirable. The reason is the inherent complexity of the problem; one can show that the problem is NP-hard. Algorithm MH-DIFF provides a heuristic solution, which is based on transforming the problem to the "edge cover domain." That is, instead of working with edit scripts, the algorithm works with edge covers that represent how one set of nodes matches another set. In this transformation, the costs of the edit operations are translated into costs on the edges of the cover.

In an earlier paper [CRGMW96] we studied a much simpler version of the change detection problem. In that work we did not consider copy operations, we assumed that the number of duplicates of a node was very limited, we assumed ordered trees, and we assumed that nodes had "tags" that reflect the structural constraints on the input trees. (For example, nodes were tagged as, say, "paragraphs" or "sections," making it easier to match nodes.) All these restrictions made it much simpler to find a minimum-cost edit script, and indeed we developed an efficient algorithm that found a minimum-cost script. Here, on the other hand, we drop these restrictions, and introduce copy operations.
This leads to an algorithm that is very different from the one in [CRGMW96], and that yields a heuristic solution in worst-case $O(n^3)$ time, where $n$ is the number of nodes, but most often in roughly $O(n^2)$ time. In Section 7 we compare MH-DIFF in more detail to our earlier work, as well as to other work on change detection.

2 Model and Problem Definition

We use rooted, labeled trees as our model for structured data. These are trees in which each node n has a label $l(n)$ that is chosen from an arbitrary domain $L$. The problem of snapshot change detection in structured data is thus the problem of finding a way to edit the tree representation of one snapshot to that of the other. We denote a tree T by its nodes N, the parent function p, and the labeling function l, and write $T = (N, p, l)$. The children of a node $n \in N$ are denoted by $C(n)$. We begin by defining the tree edit operations that we consider. Since there are many ways to transform one tree to another using these edit operations, we define a cost model for these edit operations, and then define the problem of finding a minimum-cost edit script that transforms one tree to another. (This problem is NP-hard, by reduction from the "exact cover by three-sets" problem.)

2.1 Edit Operations and Edit Scripts

In the following, we will assume that an edit operation e is applied to $T_1 = (N_1, p_1, l_1)$, and produces the tree $T_2 = (N_2, p_2, l_2)$. We write this as $T_1 \xrightarrow{e} T_2$. We consider the following six edit operations:

Insertion: Intuitively, an insertion operation creates a new tree node with a given label, and places it at a given position in the tree. The position of the new node n in the tree is specified by giving its parent node p and a subset C of the children of p. The result of this operation is that n is a child of p, and the nodes C, that were originally children of p, are now children of the newly inserted node n. Formally, an insertion operation is denoted by ins(n, v, p, C), where n is the (unique) identifier of the new node, v is the label of the new node, $p \in N_1$ is the node that is to be the parent of n, and $C \subseteq C(p)$ is the set of nodes that are to be the children of n.
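The tree model $T = (N, p, l)$ and the insertion operation can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the class name `Tree`, the dictionary representation (a parent map in which the root maps to `None`, plus a label map whose keys form N), and the choice to return a new tree instead of mutating the old one are assumptions made here for clarity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

Node = int  # node identifiers (an assumption; any hashable type would do)


@dataclass
class Tree:
    """A rooted, labeled tree T = (N, p, l)."""
    parent: Dict[Node, Optional[Node]] = field(default_factory=dict)  # p; the root maps to None
    label: Dict[Node, str] = field(default_factory=dict)              # l; its keys are the nodes N

    def children(self, n: Node) -> Set[Node]:
        """C(n): the nodes whose parent is n."""
        return {c for c, p in self.parent.items() if p == n}

    def ins(self, n: Node, v: str, p: Node, C: Set[Node]) -> "Tree":
        """ins(n, v, p, C): return T2, in which the new node n (label v) is a child
        of p and the nodes in C, formerly children of p, are children of n."""
        if n in self.label:
            raise ValueError(f"node {n} already exists")
        if p not in self.label:
            raise ValueError(f"parent {p} is not in the tree")
        if not set(C) <= self.children(p):
            raise ValueError("C must be a subset of C(p)")
        t2 = Tree(dict(self.parent), dict(self.label))  # copy, so T1 is left unchanged
        t2.label[n] = v          # l2(n) = v; all other labels are unchanged
        t2.parent[n] = p         # p2(n) = p
        for c in C:
            t2.parent[c] = n     # p2(c) = n for every c in C; other nodes keep p1(c)
        return t2
```

For instance, on a hypothetical three-node tree, `Tree({1: None, 2: 1, 9: 1}, {1: "a", 2: "b", 9: "d"}).ins(11, "g", 1, {9})` makes node 11 a child of node 1 and node 9 a child of node 11, which has the same shape as the operation ins(11, g, 1, {9}) used in Example 2.1.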
When applied to $T_1 = (N_1, p_1, l_1)$, we get a tree $T_2 = (N_2, p_2, l_2)$, where $N_2 = N_1 \cup \{n\}$, $p_2(n) = p$, $p_2(c) = n \;\forall c \in C$, $p_2(c) = p_1(c) \;\forall c \in N_1 - C$, $l_2(n) = v$, and $l_2(m) = l_1(m) \;\forall m \in N_1$.

Due to space constraints, we describe the remaining edit operations only informally below; the formal definitions are in [CGM97].

Deletion: This operation is the inverse of the insertion operation. Intuitively, del(n) causes n to disappear from the tree; the children of n become children of the (old) parent of n. The root of the tree cannot be deleted.

Update: The operation upd(n, v) changes the label of the node n to v.

Move: A move operation mov(n, p) moves the subtree rooted at n to another position in the tree. The new position is specified by giving the new parent of the node, p. The root cannot be moved.

Copy: A copy operation cpy(n, p) copies the subtree rooted at n to another position. The new position is specified by giving the node p that is to be the parent of the new copy. The root cannot be copied.

Glue: This operation is the inverse of a copy operation. Given two nodes $n_1$ and $n_2$ such that the subtrees rooted at $n_1$ and $n_2$ are isomorphic, glu($n_1$, $n_2$) causes the subtree rooted at $n_1$ to disappear. (It is conceptually "united" with the subtree rooted at $n_2$.) The root cannot be glued.

Although the glu operation may seem unusual, note that it is a natural choice for an edit operation given the existence of the cpy operation. As we will see in Example 2.1, inverting an edit script containing cpy operations results in an edit script with glu operations. This symmetry in the structure of edit operations is useful in the design of our algorithms.

In addition to the above tree edit operations, one may wish to consider operations such as a subtree delete operation that deletes all nodes in a given subtree. Similarly, one could define a subtree merge operation that merges two or more subtrees. We do not consider such more complex edit operations in this paper, but note that some of these operations (e.g., subtree deletes) may be detected by postprocessing the output of our algorithm.

We define an edit script to be a sequence of zero or more edit operations that can be applied in the order in which they occur in the sequence. That is, given a tree $T_0$, a sequence of edit operations $E = e_1, e_2, \ldots, e_k$ is an edit script if there exist trees $T_i$, $1 \le i \le k$, such that $T_{i-1} \xrightarrow{e_i} T_i$ for $1 \le i \le k$. We say that the edit script E transforms $T_0$ to $T_k$, and write $T_0 \xrightarrow{E} T_k$.

[Figure 1: Edit operations on labeled trees]

Example 2.1 Consider the tree $T_1$ depicted in Figure 1. We represent the identifier of each node by the number inside the circle representing the node. The label of each node is depicted to the right of the node. Thus, the root of the tree $T_1$ has identifier 1 and label a. Figure 1 shows how $T_1$ is transformed by applying the edit script $E_1$ = (ins(11, g, 1, {9}), mov(2, 6), cpy(7, 1)) to $T_1$. Similarly, if we start with the tree $T_2$ in the figure, the edit script $E_2$ = (glu(12, 7), mov(2, 1), del(11)) transforms it back to $T_1$. We write $T_1 \xrightarrow{E_1} T_2$ and $T_2 \xrightarrow{E_2} T_1$.

2.2 Cost Model

Given a pair of trees, there are, in general, several edit scripts that transform one tree to the other. For example, there is the trivial edit script that deletes all the nodes of one tree and then inserts all the nodes of the second tree. There are many other edit scripts that, informally, do more work than seems necessary. Formally, we would like to find an edit script that is "minimal" in the sense that it does no more work than what is absolutely required. To this end, we define a cost model for edit operations and edit scripts.

There are two major criteria for choosing a cost model. Firstly, the cost model should accurately capture the domain characteristics of the data being considered.
For example, if we are comparing the schematics for two printed-circuit boards, we may prefer an edit script that has as few inserts as possible, and instead describes changes with moves and copies of the old components. However, if we are comparing text documents, we may prefer to see a paragraph as a new insertion, rather than a description of how it was assembled from bits and pieces of sentences from the old document. Secondly, the cost model should be simple to specify, and should require little effort from the user. For example, a cost model that requires the user to specify dozens of parameters is not desirable by this criterion, even though it may accurately model the domain.

Another issue is the trade-off between the generality of the cost model and the difficulty of computing a minimum-cost edit script. For example, a very general cost model would have a user-specified function to determine the cost of each edit operation, based on the type of the edit operation, as well as the particular nodes on which it operates. However, such a model is not amenable to the design of efficient algorithms for computing the minimum-cost edit script, since it does not permit us to reason about the relative costs of the possible edit operations.

With the above criteria in mind, we propose a simple cost model in which the costs of insertion, deletion, move, copy, and glue operations are given by constants $c_i$, $c_d$, $c_m$, $c_c$, and $c_g$, respectively. Furthermore, given the symmetry between ins and del, and between cpy and glu, it is reasonable to use $c_i = c_d$ and $c_c = c_g$. Since, intuitively, a mov operation causes a smaller change than either a cpy or a glu, it is also reasonable to use $c_m < c_c$. Note, however, that our algorithms do not depend on these relationships between the cost parameters. The cost of an update operation depends on the old and new values of the label being updated; that is, $c(\mathrm{upd}(n, v)) = c_u(v_0, v)$, where $v_0$ is the old label of n, and $c_u$ is a domain-dependent function that returns a nonnegative real number. Finally, the cost of an edit script E, denoted by $c(E)$, is defined as the sum of the costs of the edit operations in E.
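A minimal sketch of this cost model follows, assuming operations are encoded as tuples whose first element names the operation. The class name `CostModel`, the default constants, and the placeholder update function `unit_label_distance` are illustrative assumptions, chosen only so that $c_i = c_d$, $c_c = c_g$ and $c_m < c_c$ hold. Encoding an update as ("upd", n, v0, v), with the old label carried in the tuple, is a further simplification so that the cost can be computed without consulting the tree.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def unit_label_distance(old: str, new: str) -> float:
    """Placeholder c_u (an assumption): free if the label is unchanged, 1 otherwise."""
    return 0.0 if old == new else 1.0


@dataclass(frozen=True)
class CostModel:
    c_i: float = 1.0   # insertion
    c_d: float = 1.0   # deletion (c_d = c_i by symmetry)
    c_m: float = 1.0   # move (kept below c_c)
    c_c: float = 2.0   # copy
    c_g: float = 2.0   # glue (c_g = c_c by symmetry)
    c_u: Callable[[str, str], float] = unit_label_distance  # update cost c_u(v0, v)

    def op_cost(self, op: Tuple) -> float:
        """Cost of one operation, e.g. ("mov", n, p) or ("upd", n, v0, v)."""
        kind = op[0]
        if kind == "upd":
            _, _node, old, new = op
            return self.c_u(old, new)
        fixed = {"ins": self.c_i, "del": self.c_d, "mov": self.c_m,
                 "cpy": self.c_c, "glu": self.c_g}
        return fixed[kind]

    def script_cost(self, script: Sequence[Tuple]) -> float:
        """c(E): the sum of the costs of the operations in E."""
        return sum(self.op_cost(op) for op in script)
```

With these defaults, the script (ins(11, g, 1, {9}), mov(2, 6), cpy(7, 1)) from Example 2.1 costs 1 + 1 + 2 = 4 units. A domain-specific $c_u$ (for example, a string distance over node labels) can be passed in without touching the rest of the model.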
In symbols, the cost of an edit script is $c(E) = \sum_{e \in E} c(e)$.

Problem Statement: Given two rooted, labeled trees $T_1$ and $T_2$, find an edit script E such that E transforms $T_1$ to a tree that is isomorphic to $T_2$, and such that for every edit script $E'$ with this property, $c(E') \ge c(E)$.

3 Method Overview

In this section, we present an overview of algorithm MH-DIFF for computing a minimum-cost edit script between two trees. We present our algorithm informally using a running example; the details are deferred to later sections.

[Figure 2: The trees for the running example in Section 3]

Consider the two trees depicted in Figure 2. We would like to find a minimum-cost edit script that transforms tree $T_1$ into tree $T_2$. The reader may observe that these trees are isomorphic to the initial and final trees from Example 2.1 in Section 2. Note, however, that there is no correspondence between the node identifiers of $T_1$ and $T_2$ in Figure 2. This is because in Example 2.1 we applied a known edit script to a tree, transforming it to another tree in the process, whereas in this section, we are trying to find an edit script, given two trees with no information on the relationship between their nodes.

Therefore, our first step consists of finding a correspondence between the nodes of the two given trees. For example, consider node 8 in Figure 2. We want to find the node in $T_2$ that corresponds to this node in $T_1$. The dashed lines in Figure 2 represent some of the possibilities. Intuitively, we can see that matching node 8 to node 51 does not seem like a good idea, since not only do the labels of the two nodes differ, but the two nodes also have very different locations in their respective trees; node 8 is a leaf node, while node 51 is the root node. Similarly, we may intuitively argue that matching node 8 to node 62 seems promising, since they are both leaf nodes and their labels match. However, note that matching nodes based simply on their labels ignores the structure of the trees, and thus is not, in general, the best choice. We make this intuitive notion of correspondence between nodes more precise below.

3.1 The Induced Graph

Consider the complete bipartite graph B consisting of the nodes of $T_1$ on one side, and the nodes of $T_2$ on the other, plus the special nodes ⊕ (on $T_1$'s side) and ⊖ (on $T_2$'s side). We call B the induced graph of $T_1$ and $T_2$. The dashed lines in Figure 2 correspond to a few edges of the induced graph. Intuitively, we would like to find a subset K of the edges of B that tells us the correspondence between the nodes of $T_1$ and $T_2$. If an edge connects a node $m \in T_1$ to a node $n \in T_2$, it means that n was "derived" from m. (For example, n may be a copy of m.) We say m is matched to n. A node matched to the special node ⊕ indicates that it was inserted, and a node matched to ⊖ indicates that it was deleted. Note that this matching between nodes need not be one-to-one; a node may be matched to more than one other node. (For example, referring to Figure 2, node 7 may be matched to both node 52 and node 61.) The only restriction is that a node be matched to at least one other node.
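To make these definitions concrete, here is a small sketch of the induced graph and of the condition that K must satisfy. It assumes nodes are plain identifiers and that the special nodes are represented by the strings "⊕" and "⊖"; this encoding and the function names are choices made for the sketch, not something the paper prescribes. In the usage below, node 2 is matched to two nodes, which the definition allows.

```python
from collections.abc import Hashable
from itertools import product
from typing import Iterable, List, Tuple

PLUS, MINUS = "⊕", "⊖"  # special nodes: ⊕ on T1's side, ⊖ on T2's side
Edge = Tuple[Hashable, Hashable]


def induced_graph(n1: Iterable[Hashable], n2: Iterable[Hashable]) -> List[Edge]:
    """All edges of the complete bipartite graph between N1 ∪ {⊕} and N2 ∪ {⊖}."""
    return list(product(list(n1) + [PLUS], list(n2) + [MINUS]))


def is_edge_cover(k: Iterable[Edge], n1: Iterable[Hashable], n2: Iterable[Hashable]) -> bool:
    """True if K uses only induced-graph edges and every node, including ⊕ and ⊖,
    is incident on at least one edge of K. Matchings may be one-to-many."""
    left, right = set(n1) | {PLUS}, set(n2) | {MINUS}
    k = list(k)
    if any(m not in left or n not in right for m, n in k):
        return False
    return {m for m, _ in k} == left and {n for _, n in k} == right
```

For two trees with 2 and 3 nodes, the induced graph has (2 + 1) × (3 + 1) = 12 edges, which illustrates why the number of candidate edge covers grows so quickly.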
Thus, finding the correspondence between the nodes of two trees consists essentially of finding an edge cover² of their induced graph. The induced graph has a large number of edge covers (this number being exponential in the number of nodes). However, we may intuitively observe that most of these possible edge covers of B are undesirable. For example, an edge cover that maps all nodes in $T_1$ to ⊖, and all nodes in $T_2$ to ⊕, seems like a bad choice, since it corresponds to deleting all the nodes of $T_1$ and then inserting all the nodes of $T_2$. We will define the correspondence between an edge cover of an induced graph and an edit script for the underlying trees formally in Section 4, where we also describe how to compute an edit script corresponding to an edge cover. For now, we simply note that, given an edge cover of the induced graph, we can compute a corresponding edit script for the underlying trees. Hence, we would like to select an edge cover of the induced graph that corresponds to a minimum-cost edit script.

² An edge cover of a graph is a subset K of the edges of the graph such that every node in the graph is incident on at least one edge in K.

3.2 Pruning the Induced Graph

We noted earlier that many of the potential edge covers of the induced graph are undesirable because they correspond to expensive and undesirable edit scripts. Intuitively, we may therefore expect a substantial number of the edges of the induced graph to be extraneous. Our next step, therefore, consists of removing (pruning) as many of these extraneous edges as possible from the induced graph, by using some pruning rules. The pruning rules that we use are conservative, meaning that they remove only those edges that we can be sure are not needed by a minimum-cost edit script. We discuss pruning rules in detail in Section 5.3, presenting only a simple example here.

As an example of the action of a simple pruning rule, consider the edge e = [5, 53], representing the correspondence between nodes 5 and 53 in Figure 2. Suppose that the cost $c_u(l(5), c)$ of updating the label of node 5 to the label c of node 53 is 3 units.
Furthermore, let the cost of inserting a node and of deleting a node be 1 unit each. Then we can safely prune the edge [5, 53] because, intuitively, given any edge cover K that includes the edge e, we can generate another edge cover that excludes e, and that corresponds to an edit script that is at least as good as the one corresponding to K. As an illustration of such pruning, consider the edge cover $K_2 = (K - \{e\}) \cup \{[5, \ominus], [\oplus, 53]\}$. This edge cover corresponds to an edit script that deletes node 5 and inserts node 53. These two operations cost a total of 2 units, which is less than the cost of the update operation suggested by the edge e in edge cover K. We therefore conclude that the edge [5, 53] in our running example may safely be pruned. In Section 5.3 we present Pruning Rule 2, which is a generalization of this example.

[Figure 3: The pruned induced graph for the trees in Figure 2]

3.3 Finding an Edge Cover

By applying the pruning rules (Section 5.3) to the induced graph of our running example, say we obtain the pruned induced graph depicted in Figure 3 (ignore for the present the difference between dotted and solid lines in the figure). Although the pruned induced graph typically has far fewer edges than the original induced graph does, it may still contain more edges than needed to form an edge cover. In Section 4.2 we will see that we need only consider edge covers that are minimal; that is, edge covers that are not proper supersets of any edge cover. In other words, we would like to remove from the pruned induced graph those edges that are not needed to cover nodes. For example, in the pruned induced graph shown in Figure 3, having all four of the edges [7, 61], [7, 63], [9, 61], and [9, 63] is unnecessary; we may remove either [7, 63] and [9, 61], or [7, 61] and [9, 63]. However, it is not possible to decide a priori which of these options is the better one; that is, it is not obvious which choice would lead to an edit script of lower cost. With pruning, on the other hand, there was no doubt that certain edges could be removed.

One way to decide among these options is to enumerate all possible minimal edge covers of the pruned induced graph, compute the edit script corresponding to each one (using the method described later in Section 4.2), and pick the one with the least cost. However, given the exponentially large number of edge covers, this is obviously not an efficient algorithm.

To compute an optimal edge cover efficiently, we need to be able to determine how much each edge in the edge cover contributes to the total cost of an edit script corresponding to an edge cover containing it. That is, we need to distribute the cost of the edit script corresponding to an edge cover over the individual edges of the edge cover. Once we have a cost defined for each edge in the pruned induced graph, we can find a minimum-cost edge cover using standard techniques based on reducing the edge cover problem to a weighted matching problem [PS82, Law76]. For example, if the edges [7, 61], [7, 63], [9, 61], and [9, 63] have costs 0, 1.3, 0.2, and 2.4, respectively, then we generate an edge cover that includes [7, 61] and [9, 61], and excludes [7, 63] and [9, 63]. Note, however, that such a reduction of the edit script problem to an edge cover (and thus, weighted matching) problem cannot be exact, given the hardness of the edit script problem.³ Indeed, our method of assigning costs to edges of the induced graph (Section 5.1) is only approximate, and thus the minimum-cost edge cover is not guaranteed to produce the best solution for the edit script problem.

3.4 Generating the Edit Script

Returning to the pruned induced graph of our running example, let us assume that we have gone through the process of determining the cost of each edge, and have computed a minimum-cost edge cover according to these costs, obtaining the edge cover represented by the bold edges in Figure 3. Our next step consists of using this edge cover to compute an edit script that transforms the tree $T_1$ to the tree $T_2$. Our algorithm CtoS (Cover-to-Script) for this purpose is described in Section 4.2.
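The pruning test of Section 3.2 and the minimum-cost edge cover step of Section 3.3 can be illustrated on toy inputs. This sketch assumes nothing about how MH-DIFF assigns edge costs (that is the subject of Section 5). The hypothetical `prunable_by_replacement` encodes only the delete-plus-insert comparison from the pruning example, not the general Pruning Rule 2, and the hypothetical `min_cost_edge_cover` finds a minimum-cost edge cover by exhaustive enumeration. That search is exponential and is deliberately not the weighted-matching reduction [PS82, Law76] the text relies on; it is only meant for checking tiny cases by hand.

```python
from collections.abc import Hashable
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def prunable_by_replacement(c_update: float, c_delete: float, c_insert: float) -> bool:
    """True when deleting the old node and inserting the new one is strictly
    cheaper than the update that the edge would imply (the example's test only)."""
    return c_delete + c_insert < c_update


def min_cost_edge_cover(costs: Dict[Edge, float]) -> Tuple[float, FrozenSet[Edge]]:
    """Exhaustive minimum-cost edge cover of a small bipartite graph.

    The graph's nodes are taken to be the endpoints of the given edges; left and
    right endpoints are tagged so that equal identifiers on the two sides differ.
    """
    def endpoints(edges):
        return {("L", m) for m, _ in edges} | {("R", n) for _, n in edges}

    all_nodes = endpoints(costs)
    best: Optional[Tuple[float, FrozenSet[Edge]]] = None
    for k in range(1, len(costs) + 1):
        for subset in combinations(costs, k):
            if endpoints(subset) != all_nodes:
                continue                      # some node is left uncovered
            total = sum(costs[e] for e in subset)
            if best is None or total < best[0]:
                best = (total, frozenset(subset))
    if best is None:
        raise ValueError("the graph has no edges to cover")
    return best
```

With an update cost of 3 and unit delete and insert costs, as in the [5, 53] example, `prunable_by_replacement(3, 1, 1)` returns `True`. On the hypothetical four-edge graph used in the test below, the cheapest cover keeps the two unit-cost edges.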
Here, we briefly illustrate some of the ideas used by the algorithm by considering its action on an edge in the edge cover for our running example.

[Figure 4: Annotating edges in the edge cover of Figure 3]

Consider the edge e = [7, 52] of the edge cover depicted by the bold lines in Figure 3. In Figure 4, we depict this edge in relation to the original trees. (We also depict two other edges from the edge cover.) The edge cover edges are shown as dashed lines in Figure 4. We observe that there is one other edge in the edge cover that is incident on node 7, viz. [7, 61], suggesting that node 7 was copied either directly, or indirectly (due to one of its ancestors being copied). Furthermore, we note that the parent (node 4) of node 7 is matched to the parent (node 55) of node 61 (i.e., the edge [4, 55] exists in the edge cover), while the parent of node 52 is not matched to the parent of node 7. This matching of the parents suggests that node 61 is the original instance of node 7, while node 52 is the copy. We therefore generate a copy operation that copies the subtree rooted at node 7 to the location of node 52. A convenient way of depicting this copy operation is by annotating the corresponding edge ([7, 52] in our example) with a cpy mark; this scheme allows us to talk about edit operations without having to refer to explicit node identifiers. Edges that do not correspond to any edit operation (e.g., [6, 57] in our example) are annotated with a nil mark. In the sequel, we will use such edge annotations interchangeably with the actual edit operations that they represent.

Consider next the edges [8, 53] and [8, 62]. Although both these edge cover edges are incident on node 8, neither of them corresponds to a cpy operation, since the copy 53 of node 8 is generated "for free" when node 7 is copied. Therefore, both these edges are annotated nil.

³ Unless P = NP, since we are considering a polynomial-time reduction.
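The parent-matching reasoning applied above to [7, 52], [7, 61], [8, 53] and [8, 62] can be sketched as a small function. This covers only the copy-detection idea from the walkthrough, not CtoS itself: the name `annotate_copies` is hypothetical, every other operation (mov, ins, and so on) is ignored, and when none of a node's partners has a parent matched to the node's parent, the sketch reports every such edge as a copy, whereas the full algorithm must also choose which partner is the original. In the usage data, the edges and the parent links of nodes 7, 8, 61, 62 and 53 follow the walkthrough; the parents of nodes 4, 52 and 55 are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def annotate_copies(cover: List[Edge],
                    parent1: Dict[int, Optional[int]],
                    parent2: Dict[int, Optional[int]]) -> Dict[Edge, str]:
    """Mark each K-edge [m, n] as 'cpy' or 'nil', considering copies only.

    An edge stays 'nil' when m has a single K-edge, or when the parent of n is
    matched to the parent of m: then n is either the original instance of m or
    a copy produced "for free" because an ancestor was copied.
    """
    matched: Dict[int, Set[int]] = defaultdict(set)   # T1 node -> its T2 partners
    for m, n in cover:
        matched[m].add(n)
    labels: Dict[Edge, str] = {}
    for m, n in cover:
        if len(matched[m]) < 2:
            labels[(m, n)] = "nil"
            continue
        parent_partners = matched.get(parent1.get(m), set())
        labels[(m, n)] = "nil" if parent2.get(n) in parent_partners else "cpy"
    return labels


cover = [(1, 51), (4, 55), (7, 52), (7, 61), (8, 53), (8, 62)]
parent1 = {1: None, 4: 1, 7: 4, 8: 7}                      # parent of 4 assumed
parent2 = {51: None, 55: 51, 52: 51, 61: 55, 53: 52, 62: 61}  # parents of 52, 55 assumed
labels = annotate_copies(cover, parent1, parent2)
```

Only [7, 52] is marked cpy: node 61's parent 55 is matched to node 4, the parent of node 7, while node 53 inherits its match through the copied node 52.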
Proceeding thusly, we annotate all the edges in the edge cover of our running example, to obtain the annotated edge cover depicted in Figure 5, which, for clarity, shows only the edges with non-nil annotations. These annotations correspond to the edit script (ins(g, 1, {9}), mov(2, 6), cpy(7, 1)). We see that this edit script is identical to the one in Example 2.1, which happens to be a minimum-cost edit script for our example. Of course, the above edit operations may also be listed in the order (mov(2, 6), cpy(7, 1), ins(g, 1, {9})). Both edit scripts have the same final effect, and have the same cost. In general, all edit scripts corresponding to a set of annotated edges have the same overall effect and the same cost.

[Figure 5: Annotated edges of the edge cover of Figure 3]

For the above example MH-DIFF produces a minimum-cost edit script, but it may sometimes not find one with globally minimum cost. In Section 6 we evaluate how often this happens, and we briefly discuss how one could perform additional searching in the neighborhood of the script found by MH-DIFF.

This concludes the overview of MH-DIFF. To summarize, the process consists of constructing an induced graph from the input trees, pruning the induced graph, finding a minimum-cost edge cover of the pruned induced graph, and finally, using this edge cover to obtain an edit script. In the following sections, we describe these phases in detail. For ease of presentation, we present these phases in a different order than the order in which they are performed. In particular, in Section 4, we begin by formally defining the correspondence between an edit script and an edge cover of the induced graph. In that section, we also describe the method for generating an edit script from an edge cover of the induced graph. In Section 5, we describe how the cost of an edit script is distributed over the edges of the corresponding edge cover of the induced graph. In that section, we also describe how this cost function is approximated by deriving upper and lower bounds on the cost of an edge of the induced graph, and how these bounds are used to prune the induced graph. Since finding a minimum-cost edge cover for a bipartite graph with fixed edge costs is a problem that has been previously studied in the literature [PS82, Law76], we do not present the details in this paper.

4 Edge Covers and Edit Scripts

In this section, we describe algorithm CtoS, which generates an edit script between two trees, given an edge cover of their induced graph. Before we can describe this algorithm, we need to understand the relationship between edit scripts between two trees and edge covers of their induced graph. Therefore, we first define the edge cover induced by an edit script. That is, we describe how, given an edit script between two trees, we generate an edge cover of the induced graph. (Note that this process is the reverse of the process the algorithm CtoS performs. However, the definition of this reverse process is needed for the description of the algorithm.)

4.1 Edge Cover Induced by an Edit Script

In Section 3, we introduced the graph induced by two trees $T_1$ and $T_2$ as the complete bipartite graph $B = (U, V, U \times V)$, with $U = N_1 \cup \{\oplus\}$ and $V = N_2 \cup \{\ominus\}$ (where $N_1$ and $N_2$ are the nodes of $T_1$ and $T_2$, respectively). Let E be an edit script that transforms $T_1$ to $T_2$; that is, $T_1 \xrightarrow{E} T_2$. We now define the edge cover $K(E)$ induced by E. Intuitively, we obtain $K(E)$ as follows. Create a copy $T_3$ of $T_1$, and introduce an edge between each node in $T_1$ and its copy in $T_3$. Apply the edit script to $T_3$, moving, copying, etc. the end-points of the edges with the nodes they are attached to as nodes are moved, copied, etc. Thus, when a node $n \in T_3$ is copied, producing a node $n'$, any edge [m, n] is split to produce a new edge [m, n'].
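The intuitive construction of $K(E)$ can be simulated directly. The sketch below is a hypothetical helper, `InducedCover`, built on several assumptions: nodes are integers, $T_3$ renames each node n to n + 30 as in Example 4.1, and fresh identifiers for copied nodes are allocated in increasing order. It tracks tree structure only (labels, and therefore upd, are omitted), matches an inserted node to ⊕ and the partner of a deleted node to ⊖ as in Section 3.1, names the deletion method `delete` because `del` is a Python keyword, and leaves out glu, which would additionally require computing a subtree isomorphism.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

PLUS, MINUS = "⊕", "⊖"
Endpoint = Union[int, str]
Edge = Tuple[Endpoint, Endpoint]


class InducedCover:
    """Apply an edit script to a copy T3 of T1 while tracking the edges [m, n]
    that link each T1 node m to the T3 node(s) derived from it."""

    def __init__(self, parent1: Dict[int, Optional[int]], offset: int = 30):
        # T3 is a copy of T1 in which node n is renamed n + offset.
        self.parent: Dict[int, Optional[int]] = {
            n + offset: (None if p is None else p + offset) for n, p in parent1.items()}
        self.edges: Set[Edge] = {(n, n + offset) for n in parent1}
        self.edges.add((PLUS, MINUS))              # the initial [⊕, ⊖] edge
        self.next_id = max(self.parent) + 1        # next fresh identifier for copies

    def _subtree(self, n: int) -> List[int]:
        """Nodes of the T3 subtree rooted at n, in pre-order."""
        nodes = [n]
        for c in sorted(c for c, p in self.parent.items() if p == n):
            nodes.extend(self._subtree(c))
        return nodes

    def ins(self, n: int, p: int, C: Set[int]) -> None:
        self.parent[n] = p
        for c in C:
            self.parent[c] = n
        self.edges.add((PLUS, n))                  # the inserted node is matched to ⊕
        self.next_id = max(self.next_id, n + 1)

    def delete(self, n: int) -> None:
        p = self.parent[n]
        if p is None:
            raise ValueError("the root cannot be deleted")
        del self.parent[n]
        for c, q in list(self.parent.items()):
            if q == n:
                self.parent[c] = p                 # children move up to n's old parent
        for m, x in list(self.edges):
            if x == n:
                self.edges.discard((m, x))
                self.edges.add((m, MINUS))         # the T1 partner is now matched to ⊖

    def mov(self, n: int, p: int) -> None:
        self.parent[n] = p                         # edge endpoints travel with the node

    def cpy(self, n: int, p: int) -> None:
        fresh: Dict[int, int] = {}
        for x in self._subtree(n):
            fresh[x] = self.next_id
            self.next_id += 1
        for x, x_new in fresh.items():
            self.parent[x_new] = p if x == n else fresh[self.parent[x]]
            for m, y in list(self.edges):
                if y == x:
                    self.edges.add((m, x_new))     # split [m, x] into a new edge [m, x']

    def cover(self) -> Set[Edge]:
        """K(E): drop [⊕, ⊖] once both special nodes are covered by other edges."""
        others = self.edges - {(PLUS, MINUS)}
        if any(m == PLUS for m, _ in others) and any(x == MINUS for _, x in others):
            return others
        return set(self.edges)


ic = InducedCover({1: None, 2: 1, 3: 2, 4: 1})     # a hypothetical four-node T1
ic.ins(35, 31, {34})
ic.cpy(32, 31)
```

In this run, the copy of node 32 (the counterpart of $T_1$ node 2) receives the fresh identifier 36, so node 2 ends up matched to both 32 and 36, which is exactly the one-to-many matching that a copy produces. Because nothing was deleted, ⊖ is covered only by the initial [⊕, ⊖] edge, which therefore stays in the cover.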
The other edit operations are handled analogously. Furthermore, an edge between the special nodes ⊕ and ⊖ is added initially, and removed when it is no longer needed to cover either ⊕ or ⊖. Due to space limitations, we illustrate the definition of the edge cover induced by an edit script informally using an example; the formal definition is in [CGM97].

[Figure 6: Example 4.1: the initial edge cover. All edges [n, n + 30] exist implicitly.]

Example 4.1 Consider the edit script $E_1$ from Example 2.1, and the initial tree $T_1$ from Figure 1. As described above, our first step consists of creating a copy $T_3$ of $T_1$, and adding an edge between each node of $T_1$ and its counterpart in $T_3$. We also add the special nodes ⊕ and ⊖, along with an edge connecting them. The result of this step is depicted in Figure 6. For clarity in presentation, the edges between the nodes of $T_1$ and their counterparts in $T_3$ are not shown in Figure 6; instead, we encode these edges using the node identifiers of $T_1$ and $T_3$. That is, as indicated in the figure, imagine an edge [n, n + 30] for all $n = 1, \ldots, 10$.

[Figure 7: Example 4.1: the final edge cover. All edges [n, n + 30] exist implicitly.]

Our next step consists of applying the edit script from Example 2.1 to the tree $T_3$. To enable this application of the edit script for $T_1$ to $T_3$, we change the node identifiers in the edit script from the identifiers of the nodes of $T_1$ to those of $T_3$, obtaining the script (ins(41, g, 31, {39}), mov(32, 36), cpy(37, 31)). As a result of the ins operation, a node with identifier 41 and label g is inserted as a child of node 31, and node 39 is made its child. In addition, we add an edge [⊕, 41] to the induced edge cover. Next, consider the action of the mov operation, which moves node 32 to become a child of node 36. This operation does not add any new edges to the edge cover. (The existing edges [2, 32] and [3, 33] continue to exist.) Finally, the cpy operation creates a copy of the subtree rooted at node 37, and inserts this copy as a child of node 31.
In addition, the edges [7, 42] and [8, 43] are added to the edge cover. The result is depicted in Figure 7 (which also omits the edges [n, n + 30], for all $n = 1, \ldots, 10$, for clarity). Note that the transformed tree $T_3$ is now isomorphic to the tree $T_2$ in Example 2.1, so that, essentially, we now have an edge cover of the induced graph of $T_1$ and $T_2$.

4.2 Using Edge Covers to Generate Edit Scripts

The goal of using an edge cover is that it should capture the essential aspects of an edit script; that is, no important information should be lost in going from an edit script to the edge cover induced by it. However, there are certain edit scripts for which this property does not hold. For example, consider an edit script $E_2$ that inserts a node p as the parent of ten siblings (children of the same parent) $n_1, \ldots, n_{10}$, then moves p to another location in the tree, and finally deletes p. The node p is absent from both the initial tree and the final tree. Therefore, an edge cover of the initial and final trees contains no record of the temporary insertion of node p. Thus, we have lost some information in going from $E_2$ to the edge cover.

Is the fact that our edge covers cannot capture edit scripts like $E_2$ a problem? On the one hand, $E_2$ could be the minimum-cost edit script MH-DIFF is trying to find. For example, say that insert, delete, and move operations all cost one unit. The cost of $E_2$ would then be the cost of one insert, plus the cost of one move, plus the cost of one delete, for a total cost of 3. If we do not use the "bulk move trick" that $E_2$ uses, we need to move each of $n_1, \ldots, n_{10}$ individually, for a cost of 10. Thus, $E_2$ could be the minimum-cost edit script, and if we rule it out, then MH-DIFF would miss it.

On the other hand, scripts like $E_2$ do not represent transformations that are meaningful or intuitive to an end user. In other words, if a user saw $E_2$, he would not understand why node p was inserted, since it really has no function in his application. True, the costs provided by the user are intended to describe the desirability of edit operations, but if we use these numbers we can end up with "tricky" scripts like $E_2$ that are more confusing than helpful.

Another example of a potentially unintuitive edit script is the following. Consider an edit script $E_3$ that moves a node $n_1$ to become a child of another node $n_2$, then makes several copies of the subtree rooted at $n_2$ (thus making copies of $n_1$ as well), and finally deletes the original copy of $n_1$. This edit script moves $n_1$ to a place where it does not need to be (under $n_2$) only to generate free copies of $n_1$.

The cause of the unintuitive nature of the edit scripts described above is an interaction between different edit operations, which gives rise to a "compound" effect. For example, in the edit script $E_2$ above, the effect of the move operation is compounded because it acts on a node that was previously inserted. Similarly, in edit script $E_3$ above, the effects of the copy operations are compounded because they act on a subtree into which a node was previously moved. Our approach is to disallow such unintuitive compound effects.

A simple way of characterizing edit scripts that disallow undesirable compound effects is to require edit operations to occur in phases, and to order the phases appropriately. In the following discussion, we use the names ins, del, etc. to denote phases consisting of, respectively, ins operations, del operations, etc.
First, we require that the ins phase occur after the del phase, so that an edit script cannot first insert a node and then delete it. Next, we require the other edit phases (upd, mov, cpy, and glu) to occur after the del phase (so that nodes operated on by these phases cannot later be deleted) and before the ins phase (so that inserted nodes cannot be operated on by these phases). Furthermore, we require that the upd (respectively, mov) phase occur after the cpy phase and before the glu phase, so that an edit script cannot compound the effect of an upd (respectively, mov) operation by copying the updated (respectively, moved) node (and similarly for glues). These ordering constraints yield the following order of edit phases: del, cpy, upd, mov, glu, ins. (We chose the relative order of the upd and mov phases arbitrarily.) One additional restriction, not covered by the above ordering constraints, is the following: a node in a subtree operated on by a cpy operation cannot be operated on by a glu operation. We call edit scripts that satisfy these restrictions structured edit scripts. In the sequel, we consider only structured edit scripts.

Structured edit scripts have the following important property, which allows us to consider only minimal edge covers in the sequel. (A minimal edge cover is an edge cover that is not a proper superset of any edge cover.)

Lemma 4.1 The edge cover induced by a structured edit script is minimal.

The reader may observe that, in addition to disallowing unintuitive compound effects, the above restrictions also disallow some intuitive sequences of operations. For example, a structured edit script cannot delete a node produced as a result of a cpy operation. Therefore, a structured edit script cannot copy a subtree containing 100 nodes if only 99 of them are needed, because it would be unable to delete the unwanted copy of the 100th node. An analogous situation exists for ins and glu operations. Our algorithms [CGM97] actually do permit such deletions (called ghost deletions) after copies, and insertions (called ghost insertions) before glues.
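The strict phase order can be checked mechanically. The following is a minimal sketch that assumes an edit script is a list of tuples whose first element is the operation name. That representation, and the name `follows_phase_order`, are hypothetical and do not come from the paper. The sketch checks only the del, cpy, upd, mov, glu, ins order. It does not model the cpy/glu subtree restriction or the ghost deletions and insertions that [CGM97] permits.

```python
# Order of edit phases in a structured edit script: del, cpy, upd, mov, glu, ins.
PHASES = ("del", "cpy", "upd", "mov", "glu", "ins")
RANK = {name: i for i, name in enumerate(PHASES)}


def follows_phase_order(script):
    """Return True if the operations in `script` appear in nondecreasing phase order.

    `script` is a list of tuples (op, *args); this representation is hypothetical.
    Only the phase order is checked, not the cpy/glu subtree restriction.
    """
    ranks = [RANK[op] for op, *_ in script]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# E2 from the text: insert p, move p, delete p.
E2 = [("ins", "p"), ("mov", "p", "q"), ("del", "p")]
# E3 from the text: move n1 under n2, copy the subtree rooted at n2, delete n1.
E3 = [("mov", "n1", "n2"), ("cpy", "n2", "r"), ("del", "n1")]
print(follows_phase_order(E2), follows_phase_order(E3))  # False False
```

Both example scripts from the text are rejected because an earlier operation belongs to a later phase.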
For similar reasons, we also permit certain move operations to occur before the cpy phase. Furthermore, we allow a move or copy operation whose destination is currently unavailable (e.g., because it is produced by a copy operation) to be "paused" until the destination becomes available. Lemma 4.1 remains true under these weaker restrictions.

We now describe how, given a minimal edge cover K of the graph induced by trees T1 and T2, we compute a minimum-cost edit script corresponding to this edge cover. As explained in Section 3, we also represent the edit operations of such an edit script as annotations on the affected edges. Due to space constraints, we do not present the full details of our algorithm CtoS (cover-to-script) in this paper. Instead, we present a brief explanation of the basic ideas behind the algorithm. The detailed algorithm is presented in [CGM97].

The algorithm proceeds in phases that roughly reflect the phases of a structured edit script described above. We refer to edges belonging to the given edge cover K as K-edges. We say two nodes are matched to each other if there is a K-edge connecting them. The first phase of the algorithm is the delete phase, in which we generate an edit operation del(m) for each node m that is matched to the special node ⊖. We claim that any edit script that matches m to ⊖ must contain this del operation, due to the following observations. First, any node matched to ⊖ is absent from the final tree. Furthermore, there are only two ways in which a node can be made to disappear: either it is deleted explicitly, or it is glued to some other node. (We use here the fact that structured edit scripts cannot first glue a node to another and then delete the second node.) However, the second method will not result in m matching ⊖ in the edge cover induced by the script; instead, m will match the node to which it was glued. Therefore we can safely produce a del(m) operation for all such nodes m.

The next phase of the algorithm handles copy operations. In particular, it looks for sets of two or more K-edges incident on a common node m ∈ T1.
Note that from Lemma 4.1, and the observation that minimal edge covers cannot contain any path of length three, it follows that if e = [m, n] is such an edge, there can be no other K-edge incident on n. We call such a set of edges a flower with base m. This set of edges represents copies of the node m. However, as we have seen in Section 3, some of the copies of m could be produced as a result of some ancestor of m being copied. We call such copies free copies of m. Our algorithm considers flowers in preorder of the base nodes. As copy operations are generated for some node m, we also keep track of the number of free copies of nodes in the copied subtree. Knowing the number of available free copies allows us to determine exactly which flowers correspond to explicit copy operations and which correspond to implicit (free) copies. Furthermore, any unused free copies are nodes that need to be deleted after the copy operation is performed. These are the ghost deletions we introduced above. Finally, note that a free copy may need to be moved to its final location. This situation is easily detected by checking whether the parents of the affected nodes match.
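The first two CtoS phases can be sketched as follows, together with a minimality test for K. The sketch makes several assumptions. K is a list of (T1 node, T2 node) pairs. The special nodes ⊕ and ⊖ are represented by the hypothetical strings "(+)" and "(-)". The minimality test relies on a standard fact: an edge cover is minimal exactly when no edge has both endpoints covered more than once, which is the same as containing no path of length three. Preorder processing and free-copy bookkeeping are omitted, so this is not the algorithm of [CGM97].

```python
from collections import defaultdict

INS_NODE = "(+)"  # hypothetical stand-in for the special node of T1 (insertions)
DEL_NODE = "(-)"  # hypothetical stand-in for the special node of T2 (deletions)


def is_minimal_edge_cover(K, U, V):
    """True if K covers every regular node of U (T1) and V (T2) and no K-edge
    has both endpoints of degree >= 2 (i.e., K has no path of length three)."""
    deg1, deg2 = defaultdict(int), defaultdict(int)
    for m, n in K:
        deg1[m] += 1
        deg2[n] += 1
    if any(deg1[u] == 0 for u in U) or any(deg2[v] == 0 for v in V):
        return False
    return all(deg1[m] == 1 or deg2[n] == 1 for m, n in K)


def delete_phase(K):
    """One del(m) for every T1 node m matched to the special deletion node."""
    return [("del", m) for m, n in K if n == DEL_NODE and m != INS_NODE]


def flowers(K):
    """Group K-edges between regular nodes by their T1 endpoint; groups of
    two or more edges are flowers, keyed by their base node."""
    by_base = defaultdict(list)
    for m, n in K:
        if m != INS_NODE and n != DEL_NODE:
            by_base[m].append(n)
    return {m: ns for m, ns in by_base.items() if len(ns) >= 2}


K = [("a", "a'"), ("b", "b1'"), ("b", "b2'"), ("c", DEL_NODE), (INS_NODE, "d'")]
U, V = {"a", "b", "c"}, {"a'", "b1'", "b2'", "d'"}
print(is_minimal_edge_cover(K, U, V), delete_phase(K), flowers(K))
```

In this toy cover, node c is deleted and b is the base of a two-edge flower. Adding an edge such as [a, b1'] would create a path of length three and break minimality.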
The update phase of the algorithm is straightforward. It produces an update operation for each edge [m, n] such that the labels of m and n differ. Since we are considering only structured edit scripts, there is no way to avoid such an update; in particular, "tricks" like updating a node and then copying it are disallowed. The glue and insert phases of the algorithm are analogous to the copy and delete phases, respectively. The details are in [CGM97].

5 Finding the Edge Cover

In this section we describe how mh-diff finds a minimal edge cover of the induced graph. The resulting cover will serve as input to algorithm CtoS (Section 4). Our goal is to find not just any minimal edge cover, but one that corresponds to a minimum-cost edit script. Let us call such a minimal edge cover the target cover.

Consider an edge e in our pruned induced graph. To get to the target cover, mh-diff must decide whether e should be included in the cover. To reach this decision, it would help if mh-diff knew the "cost" of e. That is, if e remains in the target cover, then it would be annotated (by algorithm CtoS) with some operation, and we could say that the cost of this operation is the cost of e. Unfortunately, we have a chicken-and-egg problem here: CtoS cannot run until we have the target cover, and we cannot get the target cover until we know the costs it will imply. To break the impasse, our approach uses the following idea. Instead of trying to compute the actual cost of e, we compute an upper and a lower bound on this cost. These bounds can be computed without knowing which other edges are included in the target cover, and they serve two purposes. First, they allow us to design pruning rules that conservatively eliminate unnecessary edges from the induced graph. Second, after pruning, the bounds can guide our search for the target cover.

As an enhancement, we actually use a variation on the edge cost suggested above. The following example shows that simply "charging" each annotation to the edge it is on is not entirely "fair."
We are given a tree T1 containing two nodes, n1 and n2, with the same label l. Furthermore, n1 has children n11 and n12 with labels a and b, respectively, and n2 has children n21 and n22 with labels c and d, respectively. Suppose T2 is a logical copy of T1. (That is, T1 and T2 are isomorphic.) Consider an edge cover that matches each node in T1 to its copy in T2, except that it "cross matches" n1 and n2 across the trees, as shown in Figure 8. Given this edge cover, algorithm CtoS will produce a move operation for each of the nodes n11, n12, n21, and n22. However, these move operations were caused not by any mismatching of the nodes n11, n12, n21, or n22, but by the mismatching of n1 and n2. Therefore it would be intuitively fairer to charge these move operations to the edges responsible for the mismatch, viz. [n1, n'2] and [n2, n'1]. To achieve this, we use the following scheme. If e is annotated with ins, del, or upd in the target cover, we do charge e for this operation. However, if e is annotated with mov, cpy, or glu, then the parent of e, and not e, is charged. We call the edge costs computed in this fashion fair costs, and define them below.

[Figure 8: Distributing edge costs fairly]

5.1 An Edge-wise Cost Function

Let K be an annotated minimal edge cover. For an edge e ∈ K, if the annotation on e is mov, cpy, or glu, let c_x(e) denote the cost of that operation. If e is annotated with ins, del, or upd, then let c_s(e) denote the cost of the operation. Furthermore, let E(m) be the set of edges in K that are incident on m, that is, E(m) = {[m, n] ∈ K}. Let C(m) be the set of the children of m. We then define the fair cost of each edge [m, n] ∈ K as follows:

$$c_K([m,n]) = c_s(m,n) + \frac{1}{2|E(m)|}\sum_{\substack{m' \in C(m)\\ [m',n'] \in K}} c_x([m',n']) + \frac{1}{2|E(n)|}\sum_{\substack{n' \in C(n)\\ [m',n'] \in K}} c_x([m',n']) \qquad (1)$$

Note that this cost depends on K, and thus is not a function of e alone.
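Equation 1 can be evaluated directly. The sketch below encodes the Figure 8 example under two assumptions that the text does not state. First, n1 and n2 share a root m (with copy m'). Second, the move cost is one unit. The dictionary layout and the name `fair_costs` are hypothetical. The output shows the intuition above: all four mov costs land on the cross-matched edges [n1, n'2] and [n2, n'1], and the fair costs add up to the total cost of 4.

```python
from collections import Counter


def fair_costs(K, parent1, parent2, c_s, c_x):
    """Fair cost c_K(e) of every edge e in an annotated edge cover K (Equation 1).

    K       -- list of edges (m, n), m a node of T1 and n a node of T2
    parent1 -- child -> parent map of T1; parent2 -- same for T2
    c_s     -- edge -> cost of its ins/del/upd annotation (absent means 0)
    c_x     -- edge -> cost of its mov/cpy/glu annotation (absent means 0)
    """
    deg1 = Counter(m for m, _ in K)  # |E(m)| for nodes of T1
    deg2 = Counter(n for _, n in K)  # |E(n)| for nodes of T2
    cost = {e: float(c_s.get(e, 0)) for e in K}
    for child_edge in K:
        x = c_x.get(child_edge, 0)
        if not x:
            continue
        pm, pn = parent1.get(child_edge[0]), parent2.get(child_edge[1])
        for e in K:
            if pm is not None and e[0] == pm:  # second term of Equation 1
                cost[e] += x / (2 * deg1[pm])
            if pn is not None and e[1] == pn:  # third term of Equation 1
                cost[e] += x / (2 * deg2[pn])
    return cost


# Figure 8: T2 is a copy of T1 (primed names); n1 and n2 are cross-matched.
parent1 = {"n1": "m", "n2": "m", "n11": "n1", "n12": "n1", "n21": "n2", "n22": "n2"}
parent2 = {c + "'": p + "'" for c, p in parent1.items()}
K = [("m", "m'"), ("n1", "n2'"), ("n2", "n1'")] + [
    (x, x + "'") for x in ("n11", "n12", "n21", "n22")
]
moves = {(x, x + "'"): 1 for x in ("n11", "n12", "n21", "n22")}  # one mov each, c_m = 1
costs = fair_costs(K, parent1, parent2, c_s={}, c_x=moves)
print(costs[("n1", "n2'")], costs[("n2", "n1'")], sum(costs.values()))  # 2.0 2.0 4.0
```

Each grandchild move is split in half between the two parent edges that touch it, one from the T1 side and one from the T2 side.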
The following lemma, proved in [CGM97], states that the above scheme of distributing the cost of an edge cover over its component edges is sound. That is, adding up the costs edge-wise yields the overall cost of the edge cover (i.e., the cost of the corresponding edit script).

Lemma 5.1 If K is an annotated, minimal edge cover of the graph induced by two trees, then

$$c(K) = \sum_{e \in K} c_K(e)$$

5.2 Bounds on Edge Costs

Although Lemma 5.1 suggests a method of distributing the cost of an annotated edge cover (and thus of an edit script) over the component edges, the cost of each edge depends on the other edges present in the edge cover. It is therefore not directly useful for computing a minimum-cost edge cover. However, we use that distribution scheme to derive upper and lower bounds on the fair cost c_K(e) of an edge e over all minimal edge covers K.

Intuitively, given that the cost of any upd annotation on an edge is charged to that edge (by Equation 1), a simple choice for the lower bound on the cost of an edge [m, n] is the cost c_u(m, n) of updating the label of m to that of n. However, we can do a little better. In some cases, selecting an edge [m, n] (as part of the edge cover being constructed) may force some of the children m' of m to be moved to n. In particular, this happens for those children m' of m for which there is no edge that could possibly match m' to a child of n. We call such moves forced moves. In cases where we can determine that a forced move exists, the cost of a mov is added to the lower bound cost. However, according to Equation 1, not all the cost of a forced move goes to edge [m, n]. In the worst case, the number of edges incident on m, |E(m)|, is large, leaving [m, n] with an insignificant contribution. However, if |E(m)| is greater than 1, we know by Lemma 4.1 that |E(n)| = 1, so forced moves on the n side would contribute to [m, n]. Thus, we may add the minimum of the second and the third terms in Equation 1 to the lower bound function.

Formally, let E be the set of edges in the induced graph of T1 and T2.⁴ We define the forced move cost c_mf(m', n) of a node m' ∈ T1 with respect to another node n ∈ T2 as follows: c_mf(m', n) = c_m if ∄ n' ∈ C(n) such that [m', n'] ∈ E, and 0 otherwise. The cost c_mf(m, n') is defined analogously. We then define the lower bound fair cost, c_l, of an edge as follows:

$$c_l([m,n]) = c_u(m,n) + \frac{1}{2}\min\left\{\sum_{m' \in C(m)} c_{mf}(m',n),\ \sum_{n' \in C(n)} c_{mf}(m,n')\right\}$$

To help us compute the upper bound, let us now define the conditional move cost c_mc. Intuitively, c_mc(m', n) costs one mov cost unit unless there is a partner of m' that is a child of n. Formally, c_mc(m', n) = 0 if ∃ n' ∈ C(n) such that [m', n'] ∈ E, and c_m otherwise. The cost c_mc(n', m) is defined analogously. Furthermore, define

$$c_w(m,n) = \begin{cases} c_u(m,n) & \text{if } m \text{ and } n \text{ are regular nodes},\\ 0 & \text{if } (m = \oplus) \wedge (n = \ominus),\\ c_i & \text{if } (m = \oplus) \wedge (n \neq \ominus),\\ c_d & \text{if } (m \neq \oplus) \wedge (n = \ominus). \end{cases}$$

Using reasoning similar to that used for deriving the lower bound cost above, we arrive at the following definition for the upper bound fair cost, c_u, of an edge:

$$c_u([m,n]) = c_w(m,n) + \frac{1}{2}\sum_{m' \in C(m)}\bigl(c_c(|E(m')|-1) + c_{mc}(m',n)\bigr) + \frac{1}{2}\sum_{n' \in C(n)}\bigl(c_g(|E(n')|-1) + c_{mc}(n',m)\bigr)$$

Note that both c_u(e) and c_l(e) can be computed by mh-diff without knowing the target cover. Furthermore, the following lemma, proved in [CGM97], states that the above definitions of c_u(e) and c_l(e) are upper and lower bounds, respectively, on the fair cost contribution c_K(e) of edge e to any minimal edge cover K that contains e.

Lemma 5.2 Let B = (U, V, E) be the bipartite graph induced by trees T1 and T2. Let B' = (U, V, E'), where E' ⊆ E. Let 𝒦 denote the collection of all minimal edge covers of B'.
We then have the following inequalities:

$$c_l(e) \le \min_{K \in \mathcal{K}} c_K(e) \qquad\text{and}\qquad c_u(e) \ge \max_{K \in \mathcal{K}} c_K(e)$$

5.3 Pruning Rules

We now use the upper and lower bound functions for the cost of an edge, as defined above, to introduce the pruning rules that reduce the size of the induced graph of the two trees being compared. Let e1 = [m, n] be any edge in the induced graph. Let e2 be any edge incident on m, and let e3 be any edge incident on n. Intuitively, our first pruning rule removes an edge whose lower bound cost is so high that it is preferable to match each of its nodes using some other edge that has a suitably low upper bound cost.

⁴ As we will see later, although E initially includes all edges in the complete bipartite graph, the pruning of edges successively reduces the size of E.

Pruning Rule 1 Let C_t = max{c_m, c_c, c_g}. If c_l(e1) ≥ c_u(e2) + c_u(e3) + 2C_t, then prune e1.

Example 5.1 To illustrate this rule, consider a tree T1 containing, among others, two childless nodes 1 (label f) and 2 (label g). Similarly, T2 contains childless nodes 3 (label g) and 4 (label f), among others. Say the costs c_m, c_c, and c_g are one unit each, while the update costs are c_u(f, g) = 3 and c_u(f, f) = c_u(g, g) = 0. Let us now consider whether edge e1 = [1, 3] can be pruned because edges e2 = [1, 4] and e3 = [2, 3] exist. Since the nodes have no children, it is easy to compute c_l(e1) = c_u(f, g) = 3, c_u(e2) = c_u(f, f) = 0, and c_u(e3) = c_u(g, g) = 0. Since C_t = 1, we see that Pruning Rule 1 holds (3 ≥ 0 + 0 + 2), and e1 can be safely removed. The intuition is that in the worst case we can replace e1 by edges e2 and e3. Using the latter edges could introduce at most the costs c_u(e2) and c_u(e3), plus the cost of two mov, cpy, or glu operations. The last factor can arise, for instance, if node 2 ends up being matched not only to node 3 but also to another node in T2. This means that node 2 needs to be copied, which would not have been necessary if we had kept edge e1 and not used e3. Similarly, the removal of edge e1 may cause an extra glue operation for node 4.
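The bound functions of Section 5.2 and the two pruning rules can be sketched in a few lines and checked against the numbers of Example 5.1. This is a simplified sketch, not the mh-diff implementation, and it rests on three assumptions. It handles only edges between regular nodes (so c_w(m, n) = c_u(m, n)). It ignores the special nodes ⊕ and ⊖ when counting |E(m')|. The function names are hypothetical. Because c_mf and c_mc apply the same test (whether a child has a partner among the other node's children), a single sum serves both.

```python
def _partners(E):
    """Map each node to the set of nodes it shares an induced-graph edge with."""
    p1, p2 = {}, {}
    for a, b in E:
        p1.setdefault(a, set()).add(b)
        p2.setdefault(b, set()).add(a)
    return p1, p2


def edge_bounds(edge, E, kids1, kids2, upd, c_m, c_c, c_g):
    """Return (c_l, c_u) for an edge [m, n] between two regular nodes."""
    m, n = edge
    p1, p2 = _partners(E)
    cm, cn = set(kids1.get(m, ())), set(kids2.get(n, ()))
    # c_mf / c_mc: one move unless the child has a partner among the other node's children.
    move_m = sum(0 if p1.get(x, set()) & cn else c_m for x in cm)
    move_n = sum(0 if p2.get(y, set()) & cm else c_m for y in cn)
    lower = upd(m, n) + 0.5 * min(move_m, move_n)
    # Upper bound: at most |E(m')| - 1 copies (|E(n')| - 1 glues) per child.
    copies = sum(c_c * max(len(p1.get(x, ())) - 1, 0) for x in cm)
    glues = sum(c_g * max(len(p2.get(y, ())) - 1, 0) for y in cn)
    upper = upd(m, n) + 0.5 * (copies + move_m) + 0.5 * (glues + move_n)
    return lower, upper


def prune_rule_1(cl_e1, cu_e2, cu_e3, c_m, c_c, c_g):
    C_t = max(c_m, c_c, c_g)
    return cl_e1 >= cu_e2 + cu_e3 + 2 * C_t


def prune_rule_2(cl_e1, c_d_m, c_i_n):
    return cl_e1 >= c_d_m + c_i_n


# Example 5.1: childless nodes 1 (f), 2 (g) in T1 and 3 (g), 4 (f) in T2.
label = {"1": "f", "2": "g", "3": "g", "4": "f"}


def upd(a, b):
    return 0 if label[a] == label[b] else 3


E = {("1", "3"), ("1", "4"), ("2", "3"), ("2", "4")}
cl_e1, _ = edge_bounds(("1", "3"), E, {}, {}, upd, 1, 1, 1)
_, cu_e2 = edge_bounds(("1", "4"), E, {}, {}, upd, 1, 1, 1)
_, cu_e3 = edge_bounds(("2", "3"), E, {}, {}, upd, 1, 1, 1)
print(cl_e1, cu_e2, cu_e3, prune_rule_1(cl_e1, cu_e2, cu_e3, 1, 1, 1))  # 3.0 0.0 0.0 True
```

With the example's numbers, the rule compares 3 against 0 + 0 + 2, so e1 is pruned.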
However, even in this worst-case scenario, the costs would be less than the cost of updating the label of node 1 to that of node 3, so we can safely remove the edge [1, 3].

Our second pruning rule (already illustrated in Section 3) states that if it is less expensive to delete a node and insert another, we do not need to consider matching the two nodes to each other. More precisely, we state the following:

Pruning Rule 2 If c_l(e1) ≥ c_d(m) + c_i(n), then prune e1.

Note that the above pruning rules are simpler to apply if we let e2 and e3 be the minimum-cost edges incident on m and n, respectively. The following lemma, proved in [CGM97], tells us that the pruning rules are conservative:

Lemma 5.3 Let E_p be the set of edges pruned by repeated application of Pruning Rules 1 and 2. Let K1 be any minimal edge cover of the graph B. There exists a minimal edge cover K2 such that (1) K2 ∩ E_p = ∅, and (2) c(K2) ≤ c(K1).

The pruning phase of our algorithm consists of repeatedly applying Pruning Rules 1 and 2. Note that the absence of edges raises the lower bound function and lowers the upper bound function, thus possibly causing more edges to be pruned. Whenever an edge is pruned, our algorithm updates the cost bounds of the edges affected by its removal. By maintaining the appropriate data structures, such a cost-update step can be performed in O(log n) time, where n is the number of nodes in the induced graph.

5.4 Computing a Min-Cost Edge Cover

After application of the pruning rules described above, we obtain a pruned induced graph containing a (typically small) subset of the edges in the original induced graph. In favorable cases, the remaining edges contain only one minimal edge cover. Typically, however, several minimal edge covers are possible for the pruned induced graph. We now describe how we select one of them. We first approximate the fair cost of every edge e that remains after pruning by its lower bound c_l(e). (We could also have used the upper bound, or an average of both bounds, since this is only an estimate.) Then, given these constant estimated costs, we compute a minimum-cost edge cover by reducing the edge cover problem to a bipartite weighted matching problem, as suggested in [PS82]. Since the weighted matching problem can be solved using standard techniques, we do not present the details in this paper. We note only that, given a bipartite graph with n nodes and e edges, the weighted matching problem can be solved in time O(ne). For our application, e is the number of edges that remain in the induced graph after pruning.

6 Implementation and Performance

In this section, we describe our implementation of mh-diff, and discuss its analytical and empirical performance. Figure 9 depicts the overall architecture of our implementation, with rectangles representing the modules (numbered, for reference) of the program, and other shapes representing data. Given two trees T1 and T2 as input, Module 1 constructs the induced graph (Section 3.1). This induced graph is next pruned (Module 2) using the pruning rules of Section 5.3 to give the pruned induced graph. In Module 2, the update cost for each edge in the induced graph is computed using the domain-dependent comparison function for node labels (Section 2.2). The next three modules together compute a minimum-cost edge cover of the pruned induced graph using the reduction of the edge cover problem to a weighted matching problem [PS82]. That is, the pruned induced graph is first translated (by Module 3) into an instance of a weighted matching problem. This weighted matching problem is solved using a package (Module 4) [Rot] based on standard techniques [PS82].
The output of the weighted matching solver is a minimum-cost matching, which is translated by Module 5 into K0, a minimum-cost edge cover of the pruned induced graph. Next, Module 6 uses the computed minimum-cost edge cover to produce the desired edit script, using the method described in Section 4.2.

[Figure 9: System Architecture. Modules: (1) Induced Graph Builder, (2) Pruner, (3) Edge cover to wt. match Translator, (4) weighted matching solver, (5) Matching to cover translator, (6) Cover to Script. Data: T1 and T2, Induced Graph, Pruned Induced Graph, wt. matching problem, min-cost matching, K0 (min-cost edge cover), Edit Script.]

Recall that since we use a heuristic cost function to compute the minimum-cost edge cover, the edge cover produced by our program, and hence the edit script, may not be optimal. We have also implemented a simple search module that starts with the minimum-cost edge cover K0 (see Figure 9) computed by our program and explores its neighborhood of minimal edge covers in an effort to find a better solution. The search proceeds by first exploring minimal edge covers that contain only one edge not in K0. Next, we explore minimal edge covers containing two edges not in K0, and so on. The intuition is that we expect the optimal solution to be "close" to the initial solution K0. Such an exploration may be extremely time-consuming in the worst case. However, as a result of pruning edges, the search space is typically much smaller than in the worst case. Due to space constraints, we do not describe the details of this search phase in this paper.

We have used our implementation to compute the differences between query results as part of the Tsimmis and C3 projects at Stanford [CGMH+94, WU95]. These projects use the OEM data model, which is a simple labeled-object model, to represent tree-structured query results. In particular, we have run our system on the output of Tsimmis queries over a bibliographic information source that contains information about database-related publications in a format similar to BibTeX. Since the data in this information source is mainly textual, we treat all labels as strings.
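The neighborhood search described above can be illustrated by brute force on a toy graph. This sketch is not the authors' search module. It assumes each edge has a fixed, additive estimated cost, whereas the true cost in mh-diff would come from the edit script produced by CtoS. It also enumerates subsets exhaustively, so it is usable only on very small graphs. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_minimal_cover(K, U, V):
    """K covers U and V, and no edge has both endpoints covered more than once."""
    deg1, deg2 = defaultdict(int), defaultdict(int)
    for m, n in K:
        deg1[m] += 1
        deg2[n] += 1
    if any(deg1[u] == 0 for u in U) or any(deg2[v] == 0 for v in V):
        return False
    return all(deg1[m] == 1 or deg2[n] == 1 for m, n in K)


def search_near(E, K0, U, V, cost, max_new):
    """Cheapest minimal edge cover using at most `max_new` edges outside K0."""
    K0 = frozenset(K0)
    best, best_cost = K0, sum(cost[e] for e in K0)
    outside, inside = [e for e in E if e not in K0], list(K0)
    for k in range(1, max_new + 1):  # first one new edge, then two, ...
        for new in combinations(outside, k):
            for r in range(len(inside) + 1):
                for kept in combinations(inside, r):
                    K = frozenset(kept) | frozenset(new)
                    c = sum(cost[e] for e in K)
                    if c < best_cost and is_minimal_cover(K, U, V):
                        best, best_cost = K, c
    return best, best_cost


cost = {("a", "x"): 5, ("b", "y"): 5, ("a", "y"): 1, ("b", "x"): 1}
U, V, K0 = {"a", "b"}, {"x", "y"}, [("a", "x"), ("b", "y")]
print(search_near(list(cost), K0, U, V, cost, max_new=1)[1])  # 10
print(sorted(search_near(list(cost), K0, U, V, cost, max_new=2)[0]))  # [('a', 'y'), ('b', 'x')]
```

In this toy case no cover with a single new edge improves on K0, but allowing two new edges finds the cheaper cover. This illustrates why the search widens step by step.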
For the domain-dependent label-update cost function, we use a weighted character-frequency histogram difference scheme. It compares strings based on the number of occurrences of each character of the alphabet in them. For example, consider comparing the labels "foobar" and "crowbar". The character-frequency histograms are, respectively, (a:1, b:1, f:1, o:2, r:1) and (a:1, b:1, c:1, o:1, r:2, w:1). The difference histogram is (c:−1, f:1, o:1, r:−1, w:−1). Adding up the magnitudes of the differences gives us 5. We then normalize this by the total number of characters in the strings (13) and scale it by a parameter (currently 5), to get the update cost (5/13) × 5 ≈ 1.9.

Let us now analyze the running time of our program. Let n be the total number of nodes in both input trees T1 and T2. Constructing the induced graph (Module 1 in Figure 9) involves building a complete bipartite graph with O(n) nodes on each side. We also evaluate the domain-dependent label-comparison function for each pair of nodes, and store this cost on the corresponding edge. Thus, building the induced graph requires time O(kn²), where k is the cost of the domain-dependent comparison function. Next, consider the pruning phase (Module 2). By maintaining a priority queue (based on edge costs) of the edges incident on each node of the induced graph, the test to determine whether an edge may be pruned can be performed in constant time. If the edge is pruned, removing it from the induced graph requires constant time, while removing it from the priority queues at each of its nodes requires O(log n) time. When an edge [m, n] is pruned, we also record the changes to the costs c_mc(m, p(n)), c_mc(n, p(m)), c_mf(m, p(n)), and c_mf(n, p(m)), where p(x) denotes the parent of x. This can be done in constant time. Thus, pruning an edge requires O(log n) time. Since at most O(n²) edges are pruned, the total worst-case cost of the pruning phase is O(n² log n). Let e be the number of edges that remain in the induced graph after pruning. The minimum-cost edge cover is computed in time O(ne) by Modules 3, 4, and 5.
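The label comparison described above can be sketched as follows. The sketch assumes every character has weight 1, since the text calls the scheme weighted but gives no per-character weights. It uses the scale parameter 5 from the text. The function name is hypothetical.

```python
from collections import Counter


def label_update_cost(a: str, b: str, scale: float = 5.0) -> float:
    """Character-frequency histogram difference between two string labels.

    Sums |count_a(ch) - count_b(ch)| over all characters, normalizes by the
    total length of both strings, and multiplies by `scale`. All characters
    are weighted equally here, which is an assumption of this sketch.
    """
    if not a and not b:
        return 0.0
    ha, hb = Counter(a), Counter(b)
    diff = sum(abs(ha[ch] - hb[ch]) for ch in ha.keys() | hb.keys())
    return diff / (len(a) + len(b)) * scale


print(round(label_update_cost("foobar", "crowbar"), 1))  # 1.9
```

Because only character counts matter, anagrams such as "abc" and "cab" get update cost 0 under this scheme.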
The computation of the edit script from the minimum-cost edge cover can be done in O(n) time by Module 6. (Note that the number of edges


More information

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe

CSCI 104. Rafael Ferreira da Silva. Slides adapted from: Mark Redekopp and David Kempe CSCI 0 fel Ferreir d Silv rfsilv@isi.edu Slides dpted from: Mrk edekopp nd Dvid Kempe LOG STUCTUED MEGE TEES Series Summtion eview Let n = + + + + k $ = #%& #. Wht is n? n = k+ - Wht is log () + log ()

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Winter 2016 Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence Winter 2016 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl

More information

Intermediate Information Structures

Intermediate Information Structures CPSC 335 Intermedite Informtion Structures LECTURE 13 Suffix Trees Jon Rokne Computer Science University of Clgry Cnd Modified from CMSC 423 - Todd Trengen UMD upd Preprocessing Strings We will look t

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. General Tree Search. Uniform Cost. Lecture 3: A* Search 9/4/2007 CS 88: Artificil Intelligence Fll 2007 Lecture : A* Serch 9/4/2007 Dn Klein UC Berkeley Mny slides over the course dpted from either Sturt Russell or Andrew Moore Announcements Sections: New section 06:

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants

A Heuristic Approach for Discovering Reference Models by Mining Process Model Variants A Heuristic Approch for Discovering Reference Models by Mining Process Model Vrints Chen Li 1, Mnfred Reichert 2, nd Andres Wombcher 3 1 Informtion System Group, University of Twente, The Netherlnds lic@cs.utwente.nl

More information

Suffix trees, suffix arrays, BWT

Suffix trees, suffix arrays, BWT ALGORITHMES POUR LA BIO-INFORMATIQUE ET LA VISUALISATION COURS 3 Rluc Uricru Suffix trees, suffix rrys, BWT Bsed on: Suffix trees nd suffix rrys presenttion y Him Kpln Suffix trees course y Pco Gomez Liner-Time

More information

UT1553B BCRT True Dual-port Memory Interface

UT1553B BCRT True Dual-port Memory Interface UTMC APPICATION NOTE UT553B BCRT True Dul-port Memory Interfce INTRODUCTION The UTMC UT553B BCRT is monolithic CMOS integrted circuit tht provides comprehensive MI-STD- 553B Bus Controller nd Remote Terminl

More information

ON THE DEHN COMPLEX OF VIRTUAL LINKS

ON THE DEHN COMPLEX OF VIRTUAL LINKS ON THE DEHN COMPLEX OF VIRTUAL LINKS RACHEL BYRD, JENS HARLANDER Astrct. A virtul link comes with vriety of link complements. This rticle is concerned with the Dehn spce, pseudo mnifold with oundry, nd

More information

CS481: Bioinformatics Algorithms

CS481: Bioinformatics Algorithms CS481: Bioinformtics Algorithms Cn Alkn EA509 clkn@cs.ilkent.edu.tr http://www.cs.ilkent.edu.tr/~clkn/teching/cs481/ EXACT STRING MATCHING Fingerprint ide Assume: We cn compute fingerprint f(p) of P in

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

Position Heaps: A Simple and Dynamic Text Indexing Data Structure

Position Heaps: A Simple and Dynamic Text Indexing Data Structure Position Heps: A Simple nd Dynmic Text Indexing Dt Structure Andrzej Ehrenfeucht, Ross M. McConnell, Niss Osheim, Sung-Whn Woo Dept. of Computer Science, 40 UCB, University of Colordo t Boulder, Boulder,

More information

Information Retrieval and Organisation

Information Retrieval and Organisation Informtion Retrievl nd Orgnistion Suffix Trees dpted from http://www.mth.tu.c.il/~himk/seminr02/suffixtrees.ppt Dell Zhng Birkeck, University of London Trie A tree representing set of strings { } eef d

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distriuted Systems Principles nd Prdigms Chpter 11 (version April 7, 2008) Mrten vn Steen Vrije Universiteit Amsterdm, Fculty of Science Dept. Mthemtics nd Computer Science Room R4.20. Tel: (020) 598 7784

More information

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem

Announcements. CS 188: Artificial Intelligence Fall Recap: Search. Today. Example: Pancake Problem. Example: Pancake Problem Announcements Project : erch It s live! Due 9/. trt erly nd sk questions. It s longer thn most! Need prtner? Come up fter clss or try Pizz ections: cn go to ny, ut hve priority in your own C 88: Artificil

More information

Typing with Weird Keyboards Notes

Typing with Weird Keyboards Notes Typing with Weird Keyords Notes Ykov Berchenko-Kogn August 25, 2012 Astrct Consider lnguge with n lphet consisting of just four letters,,,, nd. There is spelling rule tht sys tht whenever you see n next

More information

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley

AI Adjacent Fields. This slide deck courtesy of Dan Klein at UC Berkeley AI Adjcent Fields Philosophy: Logic, methods of resoning Mind s physicl system Foundtions of lerning, lnguge, rtionlity Mthemtics Forml representtion nd proof Algorithms, computtion, (un)decidility, (in)trctility

More information

MTH 146 Conics Supplement

MTH 146 Conics Supplement 105- Review of Conics MTH 146 Conics Supplement In this section we review conics If ou ne more detils thn re present in the notes, r through section 105 of the ook Definition: A prol is the set of points

More information

ASTs, Regex, Parsing, and Pretty Printing

ASTs, Regex, Parsing, and Pretty Printing ASTs, Regex, Prsing, nd Pretty Printing CS 2112 Fll 2016 1 Algeric Expressions To strt, consider integer rithmetic. Suppose we hve the following 1. The lphet we will use is the digits {0, 1, 2, 3, 4, 5,

More information

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES

GENERATING ORTHOIMAGES FOR CLOSE-RANGE OBJECTS BY AUTOMATICALLY DETECTING BREAKLINES GENEATING OTHOIMAGES FO CLOSE-ANGE OBJECTS BY AUTOMATICALLY DETECTING BEAKLINES Efstrtios Stylinidis 1, Lzros Sechidis 1, Petros Ptis 1, Spiros Sptls 2 Aristotle University of Thessloniki 1 Deprtment of

More information

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method

A New Learning Algorithm for the MAXQ Hierarchical Reinforcement Learning Method A New Lerning Algorithm for the MAXQ Hierrchicl Reinforcement Lerning Method Frzneh Mirzzdeh 1, Bbk Behsz 2, nd Hmid Beigy 1 1 Deprtment of Computer Engineering, Shrif University of Technology, Tehrn,

More information

Efficient Algorithms For Optimizing Policy-Constrained Routing

Efficient Algorithms For Optimizing Policy-Constrained Routing Efficient Algorithms For Optimizing Policy-Constrined Routing Andrew R. Curtis curtis@cs.colostte.edu Ross M. McConnell rmm@cs.colostte.edu Dn Mssey mssey@cs.colostte.edu Astrct Routing policies ply n

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

F. R. K. Chung y. University ofpennsylvania. Philadelphia, Pennsylvania R. L. Graham. AT&T Labs - Research. March 2,1997.

F. R. K. Chung y. University ofpennsylvania. Philadelphia, Pennsylvania R. L. Graham. AT&T Labs - Research. March 2,1997. Forced convex n-gons in the plne F. R. K. Chung y University ofpennsylvni Phildelphi, Pennsylvni 19104 R. L. Grhm AT&T Ls - Reserch Murry Hill, New Jersey 07974 Mrch 2,1997 Astrct In seminl pper from 1935,

More information

Ma/CS 6b Class 1: Graph Recap

Ma/CS 6b Class 1: Graph Recap M/CS 6 Clss 1: Grph Recp By Adm Sheffer Course Detils Instructor: Adm Sheffer. TA: Cosmin Pohot. 1pm Mondys, Wednesdys, nd Fridys. http://mth.cltech.edu/~2015-16/2term/m006/ Min ook: Introduction to Grph

More information

Midterm 2 Sample solution

Midterm 2 Sample solution Nme: Instructions Midterm 2 Smple solution CMSC 430 Introduction to Compilers Fll 2012 November 28, 2012 This exm contins 9 pges, including this one. Mke sure you hve ll the pges. Write your nme on the

More information

I/O Efficient Dynamic Data Structures for Longest Prefix Queries

I/O Efficient Dynamic Data Structures for Longest Prefix Queries I/O Efficient Dynmic Dt Structures for Longest Prefix Queries Moshe Hershcovitch 1 nd Him Kpln 2 1 Fculty of Electricl Engineering, moshik1@gmil.com 2 School of Computer Science, himk@cs.tu.c.il, Tel Aviv

More information

8.2 Areas in the Plane

8.2 Areas in the Plane 39 Chpter 8 Applictions of Definite Integrls 8. Ares in the Plne Wht ou will lern out... Are Between Curves Are Enclosed Intersecting Curves Boundries with Chnging Functions Integrting with Respect to

More information

From Dependencies to Evaluation Strategies

From Dependencies to Evaluation Strategies From Dependencies to Evlution Strtegies Possile strtegies: 1 let the user define the evlution order 2 utomtic strtegy sed on the dependencies: use locl dependencies to determine which ttriutes to compute

More information

documents 1. Introduction

documents 1. Introduction www.ijcsi.org 4 Efficient structurl similrity computtion etween XML documents Ali Aïtelhdj Computer Science Deprtment, Fculty of Electricl Engineering nd Computer Science Mouloud Mmmeri University of Tizi-Ouzou

More information

CSCE 531, Spring 2017, Midterm Exam Answer Key

CSCE 531, Spring 2017, Midterm Exam Answer Key CCE 531, pring 2017, Midterm Exm Answer Key 1. (15 points) Using the method descried in the ook or in clss, convert the following regulr expression into n equivlent (nondeterministic) finite utomton: (

More information

1.1. Interval Notation and Set Notation Essential Question When is it convenient to use set-builder notation to represent a set of numbers?

1.1. Interval Notation and Set Notation Essential Question When is it convenient to use set-builder notation to represent a set of numbers? 1.1 TEXAS ESSENTIAL KNOWLEDGE AND SKILLS Prepring for 2A.6.K, 2A.7.I Intervl Nottion nd Set Nottion Essentil Question When is it convenient to use set-uilder nottion to represent set of numers? A collection

More information

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig

CS311H: Discrete Mathematics. Graph Theory IV. A Non-planar Graph. Regions of a Planar Graph. Euler s Formula. Instructor: Işıl Dillig CS311H: Discrete Mthemtics Grph Theory IV Instructor: Işıl Dillig Instructor: Işıl Dillig, CS311H: Discrete Mthemtics Grph Theory IV 1/25 A Non-plnr Grph Regions of Plnr Grph The plnr representtion of

More information

Today. Search Problems. Uninformed Search Methods. Depth-First Search Breadth-First Search Uniform-Cost Search

Today. Search Problems. Uninformed Search Methods. Depth-First Search Breadth-First Search Uniform-Cost Search Uninformed Serch [These slides were creted by Dn Klein nd Pieter Abbeel for CS188 Intro to AI t UC Berkeley. All CS188 mterils re vilble t http://i.berkeley.edu.] Tody Serch Problems Uninformed Serch Methods

More information

2014 Haskell January Test Regular Expressions and Finite Automata

2014 Haskell January Test Regular Expressions and Finite Automata 0 Hskell Jnury Test Regulr Expressions nd Finite Automt This test comprises four prts nd the mximum mrk is 5. Prts I, II nd III re worth 3 of the 5 mrks vilble. The 0 Hskell Progrmming Prize will be wrded

More information

Engineer To Engineer Note

Engineer To Engineer Note Engineer To Engineer Note EE-186 Technicl Notes on using Anlog Devices' DSP components nd development tools Contct our technicl support by phone: (800) ANALOG-D or e-mil: dsp.support@nlog.com Or visit

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

PARALLEL AND DISTRIBUTED COMPUTING

PARALLEL AND DISTRIBUTED COMPUTING PARALLEL AND DISTRIBUTED COMPUTING 2009/2010 1 st Semester Teste Jnury 9, 2010 Durtion: 2h00 - No extr mteril llowed. This includes notes, scrtch pper, clcultor, etc. - Give your nswers in the ville spce

More information

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES)

1. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) Numbers nd Opertions, Algebr, nd Functions 45. SEQUENCES INVOLVING EXPONENTIAL GROWTH (GEOMETRIC SEQUENCES) In sequence of terms involving eponentil growth, which the testing service lso clls geometric

More information

Integration. September 28, 2017

Integration. September 28, 2017 Integrtion September 8, 7 Introduction We hve lerned in previous chpter on how to do the differentition. It is conventionl in mthemtics tht we re supposed to lern bout the integrtion s well. As you my

More information

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center

The Math Learning Center PO Box 12929, Salem, Oregon Math Learning Center Resource Overview Quntile Mesure: Skill or Concept: 80Q Multiply two frctions or frction nd whole numer. (QT N ) Excerpted from: The Mth Lerning Center PO Box 99, Slem, Oregon 9709 099 www.mthlerningcenter.org

More information

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining

EECS150 - Digital Design Lecture 23 - High-level Design and Optimization 3, Parallelism and Pipelining EECS150 - Digitl Design Lecture 23 - High-level Design nd Optimiztion 3, Prllelism nd Pipelining Nov 12, 2002 John Wwrzynek Fll 2002 EECS150 - Lec23-HL3 Pge 1 Prllelism Prllelism is the ct of doing more

More information

Grade 7/8 Math Circles Geometric Arithmetic October 31, 2012

Grade 7/8 Math Circles Geometric Arithmetic October 31, 2012 Fculty of Mthemtics Wterloo, Ontrio N2L 3G1 Grde 7/8 Mth Circles Geometric Arithmetic Octoer 31, 2012 Centre for Eduction in Mthemtics nd Computing Ancient Greece hs given irth to some of the most importnt

More information

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence

Solving Problems by Searching. CS 486/686: Introduction to Artificial Intelligence Solving Prolems y Serching CS 486/686: Introduction to Artificil Intelligence 1 Introduction Serch ws one of the first topics studied in AI - Newell nd Simon (1961) Generl Prolem Solver Centrl component

More information

Agilent Mass Hunter Software

Agilent Mass Hunter Software Agilent Mss Hunter Softwre Quick Strt Guide Use this guide to get strted with the Mss Hunter softwre. Wht is Mss Hunter Softwre? Mss Hunter is n integrl prt of Agilent TOF softwre (version A.02.00). Mss

More information

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS 1 COMPUTATION & LOGIC INSTRUCTIONS TO CANDIDATES UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFORMATICS COMPUTATION & LOGIC Sturdy st April 7 : to : INSTRUCTIONS TO CANDIDATES This is tke-home exercise. It will not

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

Lecture 7: Integration Techniques

Lecture 7: Integration Techniques Lecture 7: Integrtion Techniques Antiderivtives nd Indefinite Integrls. In differentil clculus, we were interested in the derivtive of given rel-vlued function, whether it ws lgeric, eponentil or logrithmic.

More information

Misrepresentation of Preferences

Misrepresentation of Preferences Misrepresenttion of Preferences Gicomo Bonnno Deprtment of Economics, University of Cliforni, Dvis, USA gfbonnno@ucdvis.edu Socil choice functions Arrow s theorem sys tht it is not possible to extrct from

More information

File Manager Quick Reference Guide. June Prepared for the Mayo Clinic Enterprise Kahua Deployment

File Manager Quick Reference Guide. June Prepared for the Mayo Clinic Enterprise Kahua Deployment File Mnger Quick Reference Guide June 2018 Prepred for the Myo Clinic Enterprise Khu Deployment NVIGTION IN FILE MNGER To nvigte in File Mnger, users will mke use of the left pne to nvigte nd further pnes

More information

Digital Design. Chapter 6: Optimizations and Tradeoffs

Digital Design. Chapter 6: Optimizations and Tradeoffs Digitl Design Chpter 6: Optimiztions nd Trdeoffs Slides to ccompny the tetbook Digitl Design, with RTL Design, VHDL, nd Verilog, 2nd Edition, by Frnk Vhid, John Wiley nd Sons Publishers, 2. http://www.ddvhid.com

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

Epson Projector Content Manager Operation Guide

Epson Projector Content Manager Operation Guide Epson Projector Content Mnger Opertion Guide Contents 2 Introduction to the Epson Projector Content Mnger Softwre 3 Epson Projector Content Mnger Fetures... 4 Setting Up the Softwre for the First Time

More information

Inference of node replacement graph grammars

Inference of node replacement graph grammars Glley Proof 22/6/27; :6 File: id293.tex; BOKCTP/Hin p. Intelligent Dt Anlysis (27) 24 IOS Press Inference of node replcement grph grmmrs Jcek P. Kukluk, Lwrence B. Holder nd Dine J. Cook Deprtment of Computer

More information

Approximation of Two-Dimensional Rectangle Packing

Approximation of Two-Dimensional Rectangle Packing pproximtion of Two-imensionl Rectngle Pcking Pinhong hen, Yn hen, Mudit Goel, Freddy Mng S70 Project Report, Spring 1999. My 18, 1999 1 Introduction 1-d in pcking nd -d in pcking re clssic NP-complete

More information

arxiv: v1 [math.co] 18 Sep 2015

arxiv: v1 [math.co] 18 Sep 2015 Improvements on the density o miml -plnr grphs rxiv:509.05548v [mth.co] 8 Sep 05 János Brát MTA-ELTE Geometric nd Algeric Comintorics Reserch Group rt@cs.elte.hu nd Géz Tóth Alréd Rényi Institute o Mthemtics,

More information

Mobile IP route optimization method for a carrier-scale IP network

Mobile IP route optimization method for a carrier-scale IP network Moile IP route optimiztion method for crrier-scle IP network Tkeshi Ihr, Hiroyuki Ohnishi, nd Ysushi Tkgi NTT Network Service Systems Lortories 3-9-11 Midori-cho, Musshino-shi, Tokyo 180-8585, Jpn Phone:

More information

Allocator Basics. Dynamic Memory Allocation in the Heap (malloc and free) Allocator Goals: malloc/free. Internal Fragmentation

Allocator Basics. Dynamic Memory Allocation in the Heap (malloc and free) Allocator Goals: malloc/free. Internal Fragmentation Alloctor Bsics Dynmic Memory Alloction in the Hep (mlloc nd free) Pges too corse-grined for llocting individul objects. Insted: flexible-sized, word-ligned blocks. Allocted block (4 words) Free block (3

More information

PIA INQUIRY QUESTIONS LEASED DARK FIBER AND SPECIAL CONSTRUCTION

PIA INQUIRY QUESTIONS LEASED DARK FIBER AND SPECIAL CONSTRUCTION PIA INQUIRY QUESTIONS LEASED DARK FIBER AND SPECIAL CONSTRUCTION IMPORTANT: The rules for evluting the cost effectiveness of drk fier nd self provisioning options re strict, ever evolving, nd re explined

More information

Topological Queries on Graph-structured XML Data: Models and Implementations

Topological Queries on Graph-structured XML Data: Models and Implementations Topologicl Queries on Grph-structured XML Dt: Models nd Implementtions Hongzhi Wng, Jinzhong Li, nd Jizhou Luo Astrct In mny pplictions, dt is in grph structure, which cn e nturlly represented s grph-structured

More information

Tree Structured Symmetrical Systems of Linear Equations and their Graphical Solution

Tree Structured Symmetrical Systems of Linear Equations and their Graphical Solution Proceedings of the World Congress on Engineering nd Computer Science 4 Vol I WCECS 4, -4 October, 4, Sn Frncisco, USA Tree Structured Symmetricl Systems of Liner Equtions nd their Grphicl Solution Jime

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 3b Lexical Analysis Elias Athanasopoulos ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy RecogniNon of Tokens if expressions nd relnonl opertors if è if then è then else è else relop è

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information