Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees

Size: px
Start display at page:

Download "Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees"

Transcription

1 Bology 2013, 2, ; do: /bology Revew OPEN ACCESS bology ISSN Algorthms for Computng the Trplet and Quartet Dstances for Bnary and General Trees Andreas Sand 1,2, Morten K. Holt 1, Jens Johansen 1, Rolf Fagerberg 3, Gerth Støltng Brodal 1,4, Chrstan N. S. Pedersen 1,2 and Thomas Malund 2, * 1 Department of Computer Scence, Aarhus Unversty, IT-Paren, Aabogade 34, DK-8200 Aarhus N, Denmar; E-Mals: asand@brc.au.d (A.S.); the.thaw@gmal.com (M.K.H.); ens.oha@gmal.com (J.J.); gerth@cs.au.d (G.S.B.); cstorm@brc.au.d (C.N.S.P.) 2 Bonformatcs Research Centre, Aarhus Unversty, C.F. Møllers Allé 8, DK-8000 Aarhus C, Denmar 3 Department of Mathematcs and Computer Scence, Unversty of Southern Denmar, Campusve 55, DK-5230 Odense M, Denmar; E-Mal: rolf@mada.sdu.d 4 MADALGO - Center for Massve Data Algorthms, a Centre of the Dansh Natonal Research Foundaton, Aabogade 34, DK-8200 Aarhus N, Denmar * Author to whom correspondence should be addressed; E-Mal: malund@brc.au.d; Tel.: ; Fax: Receved: 15 July 2013; n revsed form: 29 August 2013 / Accepted: 13 September 2013 / Publshed: 26 September 2013 Abstract: Dstance measures between trees are useful for comparng trees n a systematc manner, and several dfferent dstance measures have been proposed. The trplet and quartet dstances, for rooted and unrooted trees, respectvely, are defned as the number of subsets of three or four leaves, respectvely, where the topologes of the nduced subtrees dffer. These dstances can trvally be computed by explctly enumeratng all sets of three or four leaves and testng f the topologes are dfferent, but ths leads to tme complextes at least of the order n 3 or n 4 ust for enumeratng the sets. The dfferent topologes can be counted mplctly, however, and n ths paper, we revew a seres of algorthmc mprovements that have been used durng the last decade to develop more effcent algorthms by explotng two dfferent strateges for ths; one based on dynamc programmng and another based on colorng leaves n one tree and updatng a herarchcal decomposton of the other. Keywords: algorthmc development; tree comparson; trplet dstance; quartet dstance

2 Bology 2013, Introducton Evolutonary relatonshps are often represented as trees; a practce not only used n bology, where trees can represent, for example, speces relatonshps or gene relatonshps n a gene famly, but also used n many other felds studyng obects related n some evolutonary fashon. Examples nclude lngustcs, where trees represent the evoluton of related languages, or archeology, where trees have been used to represent how copes of ancent manuscrpts have changed over tme. Common for most such felds s that the true tree relatonshp between obects s never observed, but must be nferred, and dependng on both the data avalable and the methods used for the nference, the nferred trees wll lely be slghtly dfferent. Tree dstances provde a formal way of quantfyng how smlar two trees are and, for example, determne f two trees are sgnfcantly smlar or no more smlar than could be expected by chance. Many dfferent dstances have been defned on trees. Most only consder the tree topology,.e., how nodes are related, but not how branch lengths mght dffer between the trees, and most of these only consder relatonshps between labeled leaves and consder nner nodes as unlabeled. Commonly used dstance measures are the Robnson-Foulds dstance [1] and the quartet dstance [2] for unrooted trees and the trplet dstance [3] for rooted trees. All three are based on the dea of enumeratng all substructures of the two trees topology and countng how often the structure s the same n the two trees and how often t s dfferent. The Robnson-Foulds dstance consders all the ways the leaf labels can be splt nto two sets and counts how often only one of the trees has an edge matchng ths splt. Informally, ths essentally means that t counts how often the two trees have the same edge and how often an edge n one tree has no counterpart n the other. Edges are arguably the smplest element of the topology of a tree, and not surprsngly, the Robnson-Foulds dstance s both the most frequently used dstance measure and the dstance measure that can be computed wth the optmal algorthmc complexty, O(n), for trees wth n leaves [4]. The trplet and quartet dstances enumerate all subsets of three or four leaves, respectvely, and count how often the topologes nduced by the three or four leaves are the same n the two trees. The trplet dstance s ntended for rooted trees, where the trplet topology s the smallest nformatve subtree (for unrooted trees, all subtrees wth three leaves have the same topology), whle the quartet dstance s ntended for unrooted trees, where the quartet topology s the smallest nformatve subtree. Whether t s possble to compute the trplet and quartet dstance n lnear tme s unnown. The fastest nown algorthms have tme complexty O(n log n) for both dstances for bnary trees, O(n log n) for the trplet dstance and O(dn log n) for the quartet dstance for general trees, where d s the largest degree of a node n the trees [5]. In ths paper, we wll revew the algorthmc development that led to these non-trval runnng tmes, n partcular, the development of algorthms for general (non-bnary) trees to whch the authors have contrbuted a number of papers. We wll frst formally defne the trplet and quartet dstance between two leaf-labeled trees. We then descrbe the state-of-the-art for bnary trees wth two dfferent approaches to computng the dstances: one based on dynamc programmng wth tme complexty O(n 2 ) for both quartet and trplet dstance and one based on colorng leaves n a tree traversal wth complexty

3 Bology 2013, O(n log n) for quartet dstances (wth very lttle wor, ths algorthm can be modfed to compute the trplet dstance, but t has never been descrbed as such n the lterature). The dynamc programmng approach was the frst algorthm that was modfed to deal wth the quartet dstance for general trees, and n the followng secton, we descrbe a number of approaches to dong ths. In the same secton, we descrbe how the tree colorng approach can be adapted to general trees, leadng to the currently best worst-case runnng tmes mentoned above. We then turn to practcal mplementatons, present expermental results for the best algorthms and show that the theoretcally fastest algorthms can also be mplemented to run effcently n practce. 2. The Trplet and Quartet Dstances Gven two trees, T 1 and T 2, each wth n leaves wth labels from the same set of n labels, the trplet dstance s defned for rooted trees, whle the quartet dstance s defned for unrooted trees. A trplet s a set, {,, }, of three leaf labels. Ths s the smallest number of leaves for whch the subtree nduced by these leaves can have dfferent topologes n two rooted trees. The possble topologcal relatonshps between leaves, and are shown n Fgure 1a. The last case s not possble for bnary trees, but t s for trees of arbtrary degrees. The trplet dstance s defned as the number of trplets whose topology dffers n the two trees. It can navely be computed by enumeratng all ( n 3) sets of three leafs and for each set comparng the nduced topologes n the two trees. A quartet s a set, {,,, l}, of four leaf labels. Ths s the smallest number of leaves for whch the subtree nduced by these leaves can have dfferent topologes n two unrooted trees. The possble topologes are shown n Fgure 1b. The last case s not possble for bnary trees (trees wth nner nodes of a degree of three), but t s for trees of arbtrary degrees. For the remanng three cases, the set, {,,, l}, s splt nto two sets of two leaves by the mddle edge. We sometmes use the notaton, l, for splts, wth ths partcular nstance referrng to the leftmost topology n Fgure 1b. Smlar to the trplet dstance, the quartet dstance s defned as the number of quartets whose topology dffer n the two trees. It can navely be computed by enumeratng all ( n 4) sets of four leafs and for each set comparng the nduced topologes n the two trees. Fgure 1. Dfferent cases for trplet and quartet topologes. (a) Trplet topologes. ` ` ` (b) Quartet topologes. ` The rghtmost cases for trplets and quartets n Fgure 1a,b are called unresolved, whle the remanng are called resolved. Both for trplets and quartets, the possble topologes n two trees, T 1 and T 2, of arbtrary degree can be parttoned nto the fve cases, A E, lsted n Fgure 2. Note that n the resolved-resolved case, the trplet/quartet topologes may ether agree (A) or dffer (B) n the two trees. In the resolved-unresolved and unresolved-resolved cases (C and D), they always dffer, and n the unresolved-unresolved case (E), they always agree. For bnary trees, only the two resolved-resolved cases are possble.

4 Bology 2013, Fgure 2. Cases for computng dfferences. T 1 T 2 Resolved Unresolved Resolved A: Agree B: D er C Unresolved D E Snce both dstances are defned as the number of dfferng topologes, they can be computed as B + C + D. However, none of our algorthms compute B, C and D explctly. Dfferent algorthms use dfferent relatonshps between the fve counters, A E, to compute the dstances faster. The quartet and trplet dstances are nown to be more robust to small changes n the trees than other dstance measures, ncludng the Robnson-Foulds dstance [6]. They also have a much larger range than many other dstance measures for trees and are, therefore, regarded as more fne-graned. The maxmal value of the normalzed quartet dstance, (B + C + D)/ ( n 4), obtaned by dvdng the dstance by the total number of quartets was shown by Bandelt and Dress to be monotoncally ncreasng wth n, and they conectured that ths rato s bounded by 2/3 [7]. Steel and Penny furthermore showed that the normalzed mean value of the dstances, E(B + C + D)/ ( n 4), tends to 2/3 as n and that the varance of (B + C + D)/ ( n 4) tends to zero as n, when all (bnary or general) trees are sampled wth equal probablty [6]. A further desrable feature of the quartet and trplet dstances s that quartets and trplets play a fundamental role n many approaches to phylogenetc tree reconstructon (see, e.g., the R* method n Dendroscope [8] or the Quartet MaxCut method [9]). The parameterzed trplet and quartet dstance s defned by Bansal et al. [10] as B + p (C + D), for some 0 p 1, and maes t possble to wegh the contrbuton of unresolved trplets/quartets to the total trplet/quartet dstance. For p = 0, unresolved trplets/quartets do not contrbute to the dstance,.e., unresolved trplets/quartets are gnored, whle for p = 1, they contrbute fully to the dstance, as s the case n the unparameterzed trplet/quartet dstance. Bansal et al. recently showed that the parameterzed trplet and quartet dstances are: ) metrcs f p [1/2, 1]; ) dstance measures, but not metrcs (the trangle-nequalty s volated) f p (0, 1/2); and ) not dstance measures f p = 0 (two non-equvalent trees can have a dstance of zero) [10]. Whether unresolved topologes should contrbute to the dstance depends on whether one consders an unresolved node a statement of nowledge of a multfurcaton or smply a lac of nowledge of the true topology. See, e.g., Pompe et al. [11] and Waler et al. [12] for a dscusson of dealng wth unresolved topologes as a lac of nowledge n lngustcs. 3. Computng the Trplet and Quartet Dstances between Bnary Trees In ths secton, we consder the bnary case,.e., where all topologes are resolved and C, D and E from Fgure 2 are zero. In ths case, the dstances are the B counter, whch can be obtaned by computng ether A or B, snce B = ( n 3) A for the trplet dstance and B = ( n 4) A for the quartet dstance. We wll descrbe two algorthms for computng the quartet dstance for bnary trees, one based on dynamc programmng and the other on colorng leaves and comparng topologes nduced by the colorng. These are also the two approaches that we have used to handle general trees, and we wll

5 Bology 2013, descrbe the extensons for general trees n the next secton. Varatons of the two approaches to compute the trplet dstance have also been developed, but n our man exposton, we focus on the quartet dstance A Dynamc Programmng Approach The frst algorthm, from Bryant et al. [13], s based on two deas: orentng edges and assocatng drected edges to quartets and, then, buldng tables of the number of leaves shared between two subtrees. We frst conceptually tae all edges n the two trees and replace them wth two orented edges. Ths way, we can unquely assgn three subtrees to each orented edge: F 1 behnd the edge and F 2 and F 3 n front of the edge (see Fgure 3a). We wll wrte ths as F e 1 (F 2, F 3 ) and say that the drected edge, F e 1 (F 2, F 3 ), clams all quartets, l, wth, F 1, F 2 and l F 3 (or l F 2 and F 3 ). If a tree contans the quartet topology, l, t wll be clamed by exactly two such edges: F e 1 (F 2, F 3 ) e wth, F 1, F 2 and l F 3 (or l F 2 and F 3 ); and G 1 (G 2, G 3 ) wth, l G 1, G 2 and G 3 (or G 2 and G 3 ) as n Fgure 3b. Fgure 3. Any quartet s clamed by exactly two edges. F 2 G 2 F 1 e e G 1 l F 3 l G 3 (a) F 1 e! (F 2,F 3 ) (b) G 1 e 0! (G 2,G 3 ) We wll consder these two orented quartets, l and l, where e clams l and e clams l. Any shared quartet topology, l, corresponds to exactly two orented quartet topologes, so we can compute A by countng the number of shared orented quartet topologes and dvde by two. e Gven two trees, T 1 and T 2, the algorthm now smply terates through all edges, F 1 1 (F2, F 3 ) n T 1 e and G 2 1 (G2, G 3 ) n T 2, and counts how many orented quartets, l, are clamed by both e 1 and e 2. Ths number, denoted A(e 1, e 2 ), can be computed as: ( ) F1 G 1 A(e 1, e 2 ) = ( F 2 G 2 F 3 G 3 + F 2 G 3 F 3 G 2 ) 2 where F G denotes the number of leaves n two subtrees, F and G, wth the same labels. Ths expresson can be computed n constant tme f we have tables contanng the count, F G, for all subtrees, F of T 1 and G of T 2. The number of shared quartets (and by extenson, the quartet dstance) can then be computed as: A = 1 A(e 1, e 2 ) 2 e 2 T 2 e 1 T 1 n tme O(n 2 ). Tables for F G can be computed recursvely n tme O(n 2 ): f F contans subtrees, F 1 and F 2, and G contans subtrees, G 1 and G 2, then F G = F 1 G 1 + F 1 G 2 + F 2 G 1 + F 2 G 2. If F or G (or both) are leaves, smlar, but smpler, cases are used.

6 Bology 2013, By modfyng ths algorthm slghtly, t can also be used to compute the trplet dstance between rooted bnary trees n O(n 2 ) tme Tree Colorng Brodal et al. s O(n log 2 n) and O(n log n) algorthms [14,15] tae a dfferent approach than dynamc programmng, but they are also based on orented quartets. Rather than assocatng orented quartets to edges, however, the algorthms assocate orented quartets to nner nodes. Gven an nner node, v, wth three ncdent subtrees, F, G and H, we say that v clams the orented quartets, l, where and are n two dfferent subtrees of v and and l both are n the remanng subtree. Node v thus clams ( ) ( F 2 G H + F G ) ( 2 H + F G H ) 2 orented quartets, and smlar to before, each quartet s assocated wth exactly two nner nodes. Furthermore, smlar to before, the dea s to compute the double sum A = 1 2 v 1 T 1 v 2 T 2 A(v 1, v 2 ), where A(v 1, v 2 ) now denotes the number of orented quartets (n the remander of ths secton, we wll use quartet and orented quartet nterchangeably) assocated wth nodes, v 1 and v 2, whch nduce the same quartet topology n both T 1 and T 2. The two algorthms, however, only explctly terate over the nodes n T 1, whle for each v 1 T 1, they compute v 2 T 2 A(v 1, v 2 ) mplctly usng a data structure called the herarchcal decomposton tree of T 2. To realze ths strategy, both algorthms use a colorng procedure n whch the leaves of the two trees are colored usng the three colors, A, B and C. For an nternal node, v 1 T 1, we say that T 1 s colored accordng to v 1 f the leaves n one of the subtrees all have the color, A, the leaves n another of the subtrees all have the color, B, and the leaves n the remanng subtree all have the color, C. Addtonally, we say that a quartet, l, s compatble wth a colorng f and have dfferent colors and and l both have the remanng color. From ths setup, we mmedately get that f T 1 s colored accordng to a node, v 1 T 1, then the set of quartets n T 1 that are compatble wth ths colorng s exactly the set of quartets assocated wth v 1 and, furthermore, f we color the leaves of T 2 n the same way as n T 1, that the set of quartets, S, n T 2 that are compatble wth ths colorng s exactly the set of quartets that are assocated wth v 1 and nduce the same topology n both T 1 and T 2 ; thus, v 2 T 2 A(v 1, v 2 ) = S. Navely colorng the leaves of the two trees accordng to each nner node n T 1 and countng the compatble quartets n T 2 explctly would tae tme O(n 2 ), as we would run through n 2 dfferent colorngs and perform O(n) wor for each. By handlng the colorng of T 1 recursvely and usng the smaller-half trc, we can ensure that each leaf changes color only O(log n) tmes, gvng O(n log n) color changes n total. To compute the number of compatble quartets n T 2 for each colorng, we use a herarchcal decomposton of T 2. The man feature of ths data structure s that we can decorate t n a way such that t can return the number of quartets n T 2 compatble wth the current colorng n constant tme. To acheve ths, the herarchcal decomposton uses O(log n) tme to update ts decoraton when a sngle leaf changes color. Thus, n total, we use O(n log 2 n) tme [14]. The algorthm for computng the trplet dstance between bnary trees n tme O(n log 2 n) from [16] also uses ths strategy, although the decoraton of the herarchcal decomposton dffers sgnfcantly. In the followng two subsectons, we descrbe the colorng procedure and the herarchcal decomposton tree data structure used n the quartet dstance algorthm. Fnally, n Secton , we descrbe how the algorthm was tweaed n [15] to obtan tme complexty O(n log n).

7 Bology 2013, Colorng Leaves n T 1 Usng the Smaller-Half Trc The colorng procedure starts by rootng T 1 n an arbtrary leaf. It then for each nner node, v 1 T 1, calculates the number of leaves, v 1, n the subtree rooted at v 1. Ths s done n a postorder traversal startng n the new (desgnated) root of T 1, and the nformaton s stored n the nodes. In ths traversal, all leaves are also colored by the color, A, and the new root s colored by the color, C. The procedure then runs through all colorngs of T 1 recursvely, as descrbed and llustrated n Fgure 4, startng at the sngle chld of the root of T 1. In Fgure 4, large(v 1 ), respectvely small(v 1 ), denotes the chld of v 1 that consttutes the root of the largest, respectvely smallest, of the two subtrees under v 1, measured on the number of leaves n the subtrees, and get shared() s a call to the herarchcal decomposton of T 2 to retreve the number of quartets n T 2 that are compatble wth the current colorng. Durng the traversal, the followng two nvarants are mantaned for all nodes, v 1 T 1 : (1) when Count(v 1 ) s called, all leaves n the subtree rooted at v 1 have the color, A, and all other leaves have the color, C; and (2) when Count(v 1 ) returns, all leaves have the color, C. As llustrated n Fgure 4, these nvarants mply that when get shared() s called as a subprocedure of Count(v 1 ), then T 1 s colored accordng to v 1. Hence, the correctness of the algorthm follows, assumng that get shared() returns the number of orented quartets n T 2 compatble wth the current colorng of the leaves. Fgure 4. Traversng T 1 usng the smaller-half trc to ensure that each leaf changes color at most O(log n) tmes. v1 v1 1 Procedure Count(v 1 ): 2 f v 1 s a leaf : 3 color v 1 C 4 return 0 5 else : 6 ColorLeaves(small(v 1 ), B ) 7 shared = get shared () 8 ColorLeaves(small(v 1 ), C ) 9 shared += Count(large(v 1 )) 10 ColorLeaves(small(v 1 ), A) 11 shared += Count(small(v 1 )) 12 return shared (1) At lne 1 v1 (3) After lne 8 (2) After lne 6 v1 (4) After lne 9 v1 v1 (5) After lne 10 (6) After lne 11 Note that the color of a specfc leaf s only changed whenever t s n the smallest subtree of one of ts ancestors. Snce the sze of the smallest of the two subtrees of a node, v 1, s at most half the sze of the entre subtree rooted at v 1, ths can only happen at most log n tmes for each leaf, leadng to O(n log n) color changes n total.

8 Bology 2013, The Herarchcal Decomposton Tree The herarchcal decomposton tree of T 2, HDT (T 2 ), s a representaton of T 2 as a bnary tree data structure wth logarthmc heght. The nodes n HDT (T 2 ) are called components, and each of them s of one of the sx types llustrated n Fgure 5. The nodes n T 2 (ncludng leaves) consttute the leaves of HDT (T 2 ): each leaf n T 2 s contaned n a component of type (), and each nner node of T 2 s contaned n a component of type (). An nner node/component of HDT (T 2 ) represents a connected subset of nodes n T 2, and t s formed as the unon of two adacent components usng one of the compostons n Fgure 5() (v), where two components are adacent f there s an edge n T 2 connectng the two subsets of nodes they represent. The root of HDT (T 2 ) s formed usng the composton n Fgure 5(v), and t represents the entre set of all nodes (ncludng leaves) n T 2. Fgure 5. The sx dfferent types of components n a herarchcal decomposton tree. Type ()and () consst of, respectvely, a sngle leaf or an nner node n the orgnal bnary tree. Type () (v) are formed as unons of two adacent components. Type (v) occurs only at the root of the herarchcal decomposton tree. () () () (v) (v) (v) To buld HDT (T 2 ), we frst mae a component of type () or () for each node n T 2 and, then, greedly combne and replace pars of components usng the compostons n Fgure 5() (v) n subsequent teratons, untl we only have the sngle root component left. Ths greedy strategy requres O(log n) teratons usng O(n) tme n total, and the result s an HDT of heght O(log n) (see [14,15] for detals). To be able to retreve the number of quartets n T 2 from HDT (T 2 ) n constant tme, we decorate t n the followng way: For each component, C, n HDT (T 2 ), we store a tuple (a, b, c) of ntegers and a functon, F. The ntegers, a, b and c, denote the number of leaves wth colors A, B and C, respectvely, n the connected component of T 2 represented by C. The functon F taes three parameters for each external edge of C (all components have between zero and three external edges, dependng on ther type, e.g., type () components have one external edge, whle type () components have two). If C has external edges, we number these arbtrarly from one to and denote the three parameters for edge by a, b and c. Of the two ends of an external edge, one s n the component and the other s not. The parameters, a, b and c, denote the number of leaves wth colors A, B and C, respectvely, whch are n the subtree of T 2 rooted at the endpont of edge that s not n C. Fnally, F states, as a functon of these parameters, the number of elements of the quartets, whch are both assocated wth nodes contaned n C and compatble wth the current colorng of the leaves. It turns out that F s a polynomal of a total degree of at most four. It s beyond the scope of ths paper to descrbe the detals of how ths decoraton of HDT (T 2 ) s ntalzed and updated, but the crucal pont s that for a component, C, whch s the composton of two components, C and C, the decoraton of C can be computed n constant tme, provded that the

9 Bology 2013, decoratons of C and C are nown. Ths means that the decoraton can be ntalzed n O(n) tme n a postorder traversal and that t can be updated n O(log n) tme when a sngle leaf changes color by updatng the path from the type () component contanng the leaf to the root of HDT (T 2 ) Tweang the Algorthm to Obtan Tme Complexty O(n log n) To further decrease the tme complexty of our algorthm, we need a crucal lemma, whch was hnted at n exercse 35 of [17] and stated as Lemma 7 n [15]. Ths lemma, whch was named the extended smaller-half trc states that f T s a bnary tree wth n leaves and we have a cost of v c v = small(v) log for each nner node v and c small(v) v = 0 for each leaf v, then we have a total cost of v T c v n log n. Ths means that f we can decrease the tme used per nner node v 1 n T 1 from v small(v 1 ) log n, as n the prevous two subsectons, to small(v 1 ) log 1 small( v 1, the total tme used wll ) be reduced from O(n log 2 n) to O(n log n). The frst step n dong ths s to use Lemma 3 from [15], whch tells us that f we are gven an unrooted tree, T, wth n nodes of a degree of at most three, where leaves have been mared as non-contractble, then we can contract T n O(n) tme nto a contracton wth at most 4 5 nodes, such that each non-contractble leaf s a node by tself. Ths means that f we, pror to vstng a node, v 1 T 1, mar all leaves n the subtree of T 1 rooted at v 1 as non-contractble and contract T 2 accordng to ths marng, then we get a contracton, T (v 1) 2, wth at most 4 v 1 5 nodes, and thus, we can buld a herarchcal decomposton tree, HDT (T (v 1) 2 ), from ths contracton wth heght O(log v 1 ) n tme O( v 1 ). Hence, f we use HDT (T (v 1) 2 ) nstead of HDT (T 2 ) when vstng v 1 T 1, the tme needed for vstng v 1, dsregardng the tme spent on buldng T (v) 2, s now O( small(v 1 ) log v 1 ). However, ths s stll too much to use the extended smaller-half trc to obtan our goal. v To get all the way down to O( small(v 1 ) log 1 small(v 1 ), we frst observe that a herarchcal ) decomposton tree s a locally-balanced bnary rooted tree. A bnary rooted tree s c-locally-balanced f for all nodes v n the tree, the heght of the subtree rooted at v s at most c(1 + log v ). To be exact, our herarchcal decomposton trees are 1/ log( 12 )-locally-balanced (see [15] for the proof of 11 ths). A locally-balanced tree wth n nodes has the property that the unon of dfferent leaf-to-root paths contans ust O( log n) nodes (see Lemma 4 n [15]). Thus, snce each v 1 T 1 HDT (T (v 1) 2 ) has O( v 1 ) nodes, the small(v 1 ) leaf-to-root paths n HDT (T (v) 2 ) that need to be updated when vstng v v 1 contan O( small(v 1 ) log 1 small(v 1 ) nodes n total. Ths means that f we spend only constant tme ) to update each of these, we get a total tme for colorng and countng, stll dsregardng the tme used to buld contractons, of O(n log n) by the extended smaller-half trc. To spend only constant tme for v each of the O( small(v 1 ) log 1 ) nodes, we splt the update procedure for HDT (T (v 1) small(v 1 ) 2 ) after the small(v 1 ) color changes n two n the followng way: (1) We frst mar all nternal nodes n HDT (T (v 1) 2 ) on paths from the small(v 1 ) type () components to the root by marng bottom-up from each of the type () components, untl we fnd the frst already mared component. (2) We then update all the mared nodes recursvely n a postorder traversal startng at the root of HDT (T (v 1) 2 ). We now use O(n log n) tme for the colorng and countng, but f we bult T (v 1) 2 from scratch for every nner node, v 1 n T 1, we would spend O(n 2 ) tme ust dong ths. To fx ths problem, we wll only contract T 2 whenever a constant fracton of the leaves have been colored C, and we wll not do t from scratch every tme. Fgure 6 shows how ths s acheved usng the subroutnes, Contract and

10 Bology 2013, Extract. In ths pseudo-code, T 2 and T 2 are used to refer to the tree, T 2, as well as ther assocated herarchcal decomposton trees. Fgure 6. Tree colorng algorthm. 1 Procedure FastCount(v, T 2 ): 2 local var T2 0 3 f v s a leaf : 4 color v C 5 return 0 6 else : 7 ColorLeaves( small (v), B, T 2 ) 8 shared = get shared () 9 T2 0 = Contract(B, Extract(small(v), T 2 )) 10 ColorLeaves( small (v), C, T 2 ) 11 f T 2 > 5 large(v) : 12 T 2 = Contract(A, T 2 ) 13 shared += FastCount( large (v), T 2 ) 14 ColorLeaves( small (v), A, T2 0 ) 15 shared += FastCount(small(v), 16 return shared T2 0 ) The routne, Contract(X, T 2 ), constructs a decomposton of T 2 of sze O( X ), where each leaf wth the color, X, has been regarded as non-contractble and where X s the number of leaves wth the color, X. Ths s done n O( T 2 ) tme, where T 2 s the number of nodes n the current verson of T 2. Note that T 2 s not statc, but s a parameter to FastCount and s changed n specfc recursve calls. See [15] for the detals of Contract. The routne, Extract(small(v 1 ), T 2 ), uses the herarchcal decomposton of T 2 to extract a copy of T 2 at the pont n the algorthm where T 1 s colored accordng to v 1. The extracted tree s a copy of T 2 where all leaves n the subtree rooted at small(v 1 ) stll have the color, B, but all other leaves have the color C. The frst call to Contract n lne 9 bulds a contracton, T 2, of ths copy, and ths contracton s used as T 2 n the recursve call on small(v 1 ) n lne 15. The second call to Contract n lne 12 s only executed when T 2 > 5 large(v 1 ) (.e., when more than 4/5 of the leaves have the color, C). Ths contracton s used n the recursve call on large(v 1 ) n lne 13. T Lne 9 taes O( small(v 1 ) log 2 ) (see [15]); thus, snce T small(v 1 ) 2 = v 1 when we perform t, the total tme used on ths lne s O(n log n) by the extended smaller-half trc. We now consder the tme spent on contractng T 2 n lne 12. We perform ths lne whenever T 2 > 5 large(v 1 ). Snce all leaves n large(v 1 ) are colored A when we contract, the sze of the new T 2 has at most 4 large(v 1 ) 5 nodes. Hence, the sze of T 2 s reduced by a factor of 4/5. Ths mples that the sequence of contractons appled to a herarchcal decomposton results n a sequence of data structures of geometrcally decayng szes. Snce a contracton taes tme O( T 2 ), the total tme spent on lne 12 s lnear n the ntal sze of T 2,.e., t s domnated by the tme for constructng the ntal herarchcal decomposton tree, whch s O(n). In total, we therefore use O(n log n) tme for contractng and extractng T Dealng wth General Trees The man focus on algorthms for tree comparson has been on bnary trees. Ths s understandable, snce most algorthms for constructng trees wll always create fully-resolved trees, even when some

11 Bology 2013, edges have very lttle support from the data (but, see Buneman trees or refned Buneman trees for exceptons to ths [18,19]). Trees that are not fully resolved do occur n studes, however, and ths creates a problem when the algorthms for computng tree dstances do not generalze to general trees. In our research, we have developed a number of approaches for adaptng the algorthms from the prevous secton to wor on general trees, and we revew these approaches n ths secton Dynamc Programmng Approaches Problems wth generalzng the dynamc programmng algorthm for bnary trees to general trees are two-fold. Frst, usng edges to clam quartets s only meanngful for resolved quartets, and secondly, the resolved quartets clamed by an edge can be dstrbuted over a large number of trees. The frst problem can possbly be dealt wth by usng nodes to clam unresolved quartets, but the second problem s more serous. We can stll compute the tables of the ntersectons of trees, F G. If nodes can have a degree up to d, then n the recurson for buldng tables F G, we need to combne counts for O(d 2 ) pars of trees, but the total sum of degrees for each tree s bounded by O(n). Therefore, the product of the pars of degrees s bounded by O(n 2 ). Worse, however, s computng A(e 1, e 2 ) where we need to consder all ways of pcng trees F 2 and F 3 and parng them wth choces G 2 and G 3, wth a worst-case performance of O(d 4 ) for each par of edges. In a seres of papers, we have developed dfferent algorthms for computng the quartet dstance between general trees effcently by avodng explctly havng to deal wth choosng pars of trees for nner nodes. Common for these s that we also avod explctly handlng unresolved quartets, but only consder the resolved topologes and handle the unresolved quartets mplctly. Chrstansen et al. [20] developed two algorthms for computng the quartet dstance between general trees. The dea n the frst algorthm, whch runs n tme O(n 3 ), s to consder all trplets, {,, }, and count for each fourth leaf-label, x, how many of the quartets, {,,, x}, have the same topology n the two trees; whether ths be resolved topologes, x, x or x countng A, or unresolved, (x) countng E. Countng ths way, each shared quartet topology wll be counted twelve tmes; so, the algorthm actually computes 12 (A + E), and we must dvde by twelve at the end. We explctly terate through all O(n 3 ) trplets, and the crux of the algorthm s countng the shared quartet topologes n constant tme. The approach s as follows: For each trplet, {,, }, we dentfy the center, C, of the nduced topology n both trees. Let F, F and F be the trees n T 1 connected to C contanng leaves, and, respectvely, and let G, G and G be the correspondng trees n T 2. A resolved quartet, x, s shared between the two trees f x s n F and n G (see Fgure 7), so the number of such quartets s F G 1 (mnus one because we otherwse would also count ). Therefore, gven, and, the number of shared resolved quartets s: F G + F G + F G 3 whch can be computed n constant tme from the tables we buld n preprocessng usng dynamc programmng, assumng we now the centers for the trplet.

12 Bology 2013, Fgure 7. Computng quartets between hgh-degree nodes. T 1 T 2 F G C C x F F G x G Now, let F denote all the leaves n T 1, except those n F, F and F, and smlarly, let G denote the set of leaves n T 2 not n G, G and G. The number of unresolved quartets (x) s then gven by F G. We cannot drectly buld a table of all F G n O(n 2 ), snce ust from the number of ndces (, and ), we would need a table of sze O(n 3 ). Instead, we can compute: F G = G ( G F + G F + G F ) where: G = n ( G + G + G ) (we can tabulate G l for all l n O(n) n preprocessng) and: G F l = F l ( F l G + F l G + F l G ) for l =,, ; so, we can also count how many unresolved topologes we have for (x) f we have bult tables n preprocessng and we now the center nodes. To get the runnng tme of O(n 3 ), we thus only need to fnd the center nodes n constant tme for each trplet, {,, }. We acheve ths by fndng a lnear number of centers, n lnear tme, for a lnear number of trplets. For pars, {, }, we terate through all and ther correspondng centers n lnear tme. The dea s as follows: for each par, and, we dentfy the path between them, whch s easly done n O(n). Along ths path, we then consder all nner nodes and the trees branchng off. We can explctly terate through all n such a subtree, and the correspondng center node s the node where the subtree branches off the path from to (see Fgure 8). Ths way, for each par, {, } of whch there are O(n 2 ) we terate through leaves n O(n) and count the number of shared quartets, {,,, x}, gvng us a total runnng tme of O(n 3 ). The second algorthm n Chrstansen et al. [20] completely avods dealng wth unresolved quartet topologes by only countng the resolved topologes that are ether shared, A, or that dffer, B, between the two trees. Recall that the quartet dstance between trees T 1 and T 2 are gven by B+C +D n Fgure 2. If we have functons A(T 1, T 2 ) and B(T 1, T 2 ) that compute A and B for the two trees, respectvely, then we can compute the quartet dstance wthout consderng unresolved topologes explctly, snce A + B + C = A(T 1, T 1 ), A + B + D = A(T 2, T 2 ), and: qdst(t 1, T 2 ) = B + C + D = (A + B + C) + (A + B + D) 2 A B = A(T 1, T 1 ) + A(T 2, T 2 ) 2 A(T 1, T 2 ) B(T 1, T 2 )

13 Bology 2013, Computng the row and column sums, A + B + C and A + B + D, can also be done faster than usng the A functon; n fact, t can be computed n lnear tme for both trplets and quartets usng dynamc programmng (see ether Chrstansen et al. [21] or Brodal et al. [5] for detals). Fgure 8. Handlng all trees hangng off the path from to. C Should one want to compute the parameterzed quartet dstance nstead, t can be done a lttle more cumbersomely, but stll usng only the A and B counts: pqdst(t 1, T 2, p) = B + p (C + D) = B + p [(A + B + C) + (A + B + D) 2 A 2 B] = B(T 1, T 2 ) + p [A(T 1, T 1 ) + A(T 2, T 2 ) 2 A(T 1, T 2 ) 2 B(T 1, T 2 )] although that s not lely to be an effcent way of dong ths. We wll not consder t n more detal for now, however. The second algorthm from Chrstansen et al. [20] s based on countng the resolved quartets, whether case A or B, usng orented edges, F e 1 (F 2, F 3 ), and the quartets they clam. Consder orented edges e 1 and e 2 n T 1 and T 2, respectvely, and let v 1 and v 2 denote the destnaton nodes of the edges. If v 1 e or v 2 are hgh-degree nodes, then the edges do not correspond to unque clams, F 1 1 (F2, F 3 ) and e G 2 1 (G2, G 3 ), snce the trees, F 2, F 3, G 2 and G 3, are not unquely defned (see Fgure 9a). To get around ths, the algorthm transforms the trees by expandng (arbtrarly) each hgh-degree node nto a bnary tree, taggng edges, so we can dstngush between orgnal edges and edges caused by the expanson (see Fgure 9b). We then defne extended clams, F 1 e 1 e 1 (F 2, F 3 ), consstng of two edges, e 1 and e 1, such that e 1 s one of the orgnal edges, e 1 s a result of expandng nodes and such that the orented path from e 1 to e 1 only goes through expanson edges. The extended clams play the role that clams do n the orgnal algorthm, and each resolved quartet n the orgnal tree s clamed by exactly two extended clams n the expanded tree. The algorthm then mplements functons A and B by explctly teratng through all pars of extended clams. Each of the orgnal O(n) edges are expanded nto at most O(nd) extended clams, where d s the maxmal degree of the trees, and by countng the equal or dfferent quartet topologes n constant tme for each par of extended clams, the total runnng tme s O(n 2 d 2 ).

14 Bology 2013, Fgure 9. Expandng hgh-degree nodes. F 2 F 2 F 3 v 1 F 3 F 1 e1 v 1 F 1 e 1 F 4 F 4 F 5 F 5 (a) The edge e 1 s not part of a unque clam, but part of all clams F 1 e 1! (F,F ) for, =2, 3, 4, 5, 6=. (b) The node v 1 n (a) s expanded to a bnary tree such that every trple F 1,F,F can be clamed by a unque e extended clam F 1,... 1! (F,F ) for all, =2, 3, 4, 5, 6=. The A functon counts the number of equal topologes from the table of F G counts exactly as the algorthm for bnary trees: ( ) A(e 1 e 1, e 2 e F1 G 1 2) = ( F 2 G 2 F 3 G 3 + F 2 G 3 F 3 G 2 ) 2 and: A(T 1, T 2 ) = 1 2 A(e 1 e 1, e 2 e 2) e 1 e 1 T 1 e 2 e 2 T 2 The B functon, countng the number of resolved quartets, {,,, l}, wth dfferent topologes, also uses the tables, but wth a slghtly dfferent expresson. Snce we have resolved, but dfferent, quartet e 1 e topologes whenever F 1 1 (F 2, F 3 ) clams l (.e.,, F 1, F 2 and l F 3 ) and e 2 e G 2 1 (G 2, G 3 ) clams l, l, l or l, we can count as: B(e 1 e 1, e 2 e 2) = F 1 G 1 F 1 G 3 F 2 G 2 F 3 G 1 + F 1 G 1 F 1 G 2 F 2 G 3 F 3 G 1 + F 1 G 1 F 1 G 2 F 2 G 1 F 3 G 3 + F 1 G 1 F 1 G 3 F 2 G 1 F 3 G 2 Because of symmetres and countng each quartet n two clams n each tree, ths wll over-count by a factor of four, so the number of resolved, but dfferent, quartet topologes between the two trees s counted as: B(T 1, T 2 ) = 1 B(e 1 e 4 1, e 2 e 2) e 1 e 1 T 1 e 2 e 2 T 2 Chrstansen et al. [21] mproved the runnng tme to O(n 2 d) by changng the countng schemes for the A and B functons. The gst of the deas n the mproved countng scheme s to extend the table of shared leaf counts, F G, wth values F G, F Ḡ and F Ḡ, where F denotes the set of leaves not n subtree F. Usng these extra counts, the algorthm bulds addtonal tables, over-countng

15 Bology 2013, the number of quartets clamed and, then, adustng the over-count. For detals on ths, rather nvolved, countng scheme, we refer to the orgnal paper [21]. Usng the same basc dea, but wth yet another countng scheme and another set of tables countng sets of shared leaves, Nelsen et al. [22] developed an O(n 2+α ) algorthm, where α = (ω 1)/2 and O(n ω ) s the tme t taes to multply two n n matrces. Usng the Coppersmth-Wnograd algorthm [23], where ω = 2.376, ths yelds a runnng tme of O(n ) and was the frst guaranteed sub-cubc tme algorthm for computng the quartet dstance between general trees. Here, the underlyng dea s to reduce an explct teraton over O(d 2 ) pars of clams to a matrx-matrx multplcaton as part of the countng teraton. Agan, we refer to the orgnal paper for detals [22] Tree Colorng The colorng approach of Brodal et al. [14,15] for bnary trees, descrbed n Secton 3.2, has been extended to general trees, frst by Stssng et al. [24] and recently by Brodal et al. [5]. The result of [24] s an O(d 9 n log n) tme algorthm for computng the quartet dstance between two trees, T 1 and T 2, where d = max{d 1, d 2 } for d, the maxmal degree of a node n tree T. Ths was the frst algorthm for general trees allowng for sub-quadratc tme. The approach of [24] stays rather close to that of [14,15], but wth a decoraton scheme adapted to general trees. The dependency, d 9, on d stems from the HDT data structure havng a d factor n ts balance bound when appled to general trees and from the appled decoraton scheme for the HDT requrng O(d 8 ) tme for the update of the decoraton of a node based on ts chldren after a color change. The results of [5] are an O(n log n) tme algorthm for computng the trplet dstance and an O(dn log n) tme algorthm for computng the quartet dstance, both for general trees, wth d defned as above. The mproved dependency on d stems from a new defnton of the HDT data structure achevng good balance for general trees and from a much more elaborate decoraton scheme, wth a large system of auxlary counters assstng n the calculaton of the man counters. Johansen and Holt [25] have later shown how to mprove the algorthm, such that d = mn{d 1, d 2 }, whch s of sgnfcance f only one of the trees has a hgh degree. The new HDT defnton s based on the four types of components shown n Fgure 10. Here, the base components, consttutng the leaves of the HDT, are L components contanng a leaf from T 2 and I components contanng a sngle nternal node from T 2. The nternal nodes of the HDT are formed by C components contanng a connected set of nodes wth at most two external edges to other components and G components contanng the subtrees of some of the sblngs of a node n T 2. Durng constructon of the HDT, C and G components are created by the transformatons and compostons shown n Fgure 11. The constructon proceeds n rounds, wth each round performng a set of non-overlappng compostons. It s shown n [5] that n O(log n) rounds and O(n) tme, a 1/ log(10/9)-locally balanced HDT tree s formed. Ths plays the role of the HDT descrbed n Secton 3.2, but now, for general trees.

16 Bology 2013, Fgure 10. The four dfferent types of components. C L I G Fgure 11. The two types of transformatons (top) and the three types of compostons (bottom). C L C G L!C C!G C C 2 C 1 C G 1 G 2 G G I CC!C GG!G IG!C Addtonal changes nclude the use of d+1 colors, denoted 0, 1, 2,...,d, and new defntons of whch node a trplet/quartet s assocated wth. The node n queston s denoted the anchor, and the choce of anchors can be seen n Fgure 12. Ths choce forms the bass for the defnton of the decoraton system, whch, as sad, s qute nvolved and for whch we refer to [5] for detals (as well as to [25] for mnor correctons and for a trmmed varant of the system). Fgure 12. The anchors (whte nodes) of resolved and unresolved trplets and quartets. Edges n the fgures represents paths n the tree. Trplets Resolved Unresolved Quartets Resolved Unresolved Type Type Type ` ` ` ` ` The remanng parts follow the lnes of [14,15] (see Secton 3.2). The basc algorthm, correspondng to Fgure 4 for bnary trees, for traversng T 1 recursvely can be seen n Fgure 13. It mantans the

17 Bology 2013, followng nvarants: (1) When enterng a node, v, all leaves n the subtree of v have the color, one, and all leaves not n the subtree of v have the color, zero; (2) When extng v, all leaves n T 1 have the color, zero. To ntalze Invarant (1), all leaves are colored one at the start. As n Secton 3.2, the operatons, Contract and Extract, are then added (analogously to Fgure 6) for explotng a varant of the extended smaller-half trc, n order to arrve at the stated runnng tmes. Fgure 13. The man algorthm performng a recursve traversal of T 1. Count(v) f v s a leaf Color v by the color 0. else Let c 1 be the chld of v wth largest subtree, and let c 2,...c be ts remanng chldren. for =2to Color the leaves n the subtree of c by the color. // Leaves are now colored accordng to v Query the HDT for the number of trplets/quartets n T 2 compatble wth the colorng. Add that number to the global count. for =2to Color the leaves n the subtree of c by the color 0. Count(c 1 ) for =2to Color the leaves n the subtree of c by the color 1. Count(c ) 5. Expermental Results Most of the algorthms we have descrbed n ths revew paper have been mplemented n dfferent software tools, and n ths secton, we expermentally compare ther runtme performance. All experments nvolved compare randomly generated balanced trees. We note that two randomly generated trees are expected to have a large dstance, whch nfluences the runnng tme for the colorng algorthms. Smlar trees requre, overall, less updatng n the herarchcal decomposton, so random trees are a worst-case stuaton for these. Experments, not shown here, demonstrate that comparng smlar trees can be sgnfcantly faster [25]. The shape of the trees also affects the runnng tme for the colorng algorthms through the smaller-half trc, wth faster runnng tmes for very sewed trees (for perfectly balanced trees, the number of color changes requred s Θ(n log n), whle for the other extreme, caterpllar trees, the number s Θ(n)). All experments n ths secton were conducted on an Ubuntu Lnux Server 12.04, 3.4 GHz 64-bt Intel Core (quad-core) wth 32 GB of RAM. For the quartet dstance between bnary trees, we performed experments wth three algorthms, the O(n log 2 n) tme algorthm mplemented n QDst [14,26], the O(n ) tme algorthms from Nelsen et al. [22] and the O(dn log n) tme algorthm from Brodal et al. [5]. For the quartet dstance between non-bnary trees, we consdered trees wth a degree of eght and 128, usng only the last two of the three algorthms, snce the QDst algorthm only wors on bnary trees. Note, however, that

18 Bology 2013, the mplementaton from [5] s not strctly the varaton descrbed n the paper. It contans algorthmc optmzatons mprovng both the asymptotc and practcal runtme as descrbed n [25]. Fgure 14 shows the runtmes of the three mplementatons of the quartet dstance calculaton algorthms runnng on bnary trees. All three mplementatons can operate on trees of a sze of up to 10,000; only the algorthm from [5] s shown for up to one mllon leaves. For nputs of less than approxmately 4,000 leaves, the mplementaton of the O(n ) tme algorthm s faster than the mplementaton of the O(n log 2 n) tme algorthm. For all nput szes depcted, however, the mplementaton of the O(dn log n) tme algorthm s the fastest, computng the dstance between two trees wth one mllon leaves n well under two mnutes. Fgure 14. Quartet dstance runnng tme on bnary trees. Runnng tme n seconds O(d n log n), [5] O(n ), [18] O(n log 2 n), [22] Runnng tme n seconds O(d n log n), [5] Number of leaves (a) Number of leaves (b) Fgure 15a compares the two general quartet dstance algorthms on trees wth a degree of eght, whle Fgure 15b compares the algorthms for trees wth a degree of 128. Both algorthms are faster on these hgher-degree trees than on bnary trees. The O(n ) tme algorthm s faster on small trees (where the exact pont where the other algorthm s faster depends on the degree). Fgure 15. Quartet dstance runnng tme on non-bnary balanced trees O(d n log n), [5] O(n ), [18] O(d n log n), [5] O(n ), [18] Runnng tme n seconds Number of leaves (a) Trees wth degree d =8. Runnng tme n seconds Number of leaves (b) Trees wth degree d =128.

19 Bology 2013, We note, however, that the runtme of the algorthm from [5], for d somewhere between 256 and 512, n fact, ncreases beyond that of the bnary case. For d = 1, 024, the runtme of the algorthm has almost doubled compared to the bnary case, and for d = 2, 048, the runtme has more than trpled compared to the bnary case (results not shown). For the trplet dstance between bnary trees, we compared the general algorthm from [5] wth the algorthm from [16] for bnary trees. For the general algorthm, we also performed experments on trees wth a degree of 8 and 128. Fgure 16 shows the measured runnng tmes. These experments are summarzed n Fgure 16. For bnary trees, the O(n log 2 n) tme algorthm [16] s faster than the O(n log n) tme algorthm [5] for all measured szes. For the general algorthm, we agan observe that t performs faster on hgh-degree trees (where, snce the number of leaves s fxed, we have fewer nternal nodes to handle). Fgure 16. Trplet dstance runnng tme. For the O(n log 2 n) tme algorthm, whch only handles bnary trees, results are only shown for a degree of d = 2, whle for the general algorthm, results are also shown for d = 8 and d = 128. Runnng tme n seconds O(n logn), [5], d = 2 O(n logn), [5], d = 8 O(n logn), [5], d = 128 O(n log 2 n), [16], d = Number of leaves 6. Conclusons We have presented a seres of algorthmc mprovements for computng the trplet and quartet dstance between two general trees that we have developed over the last decade. Our wor has followed two man approaches, one based on countng shared topologes usng tables of the ntersectons of subtrees and one based on colorng labels and countng compatble topologes usng a herarchcal decomposton data structure. The second approach has resulted n the currently best worst-case runnng tme of O(n log n) for computng the trplet dstance and O(dn log n) for computng the quartet dstance. Whether ths can be mproved further s currently unnown. Whle the theoretcal fastest algorthms nvolve rather complex booeepng for countng topologes, we have shown that they can be mplemented to be effcent n practce, as well, computng the dstance between two trees wth a mllon leaves n a few mnutes. Wth more typcal phylogenetc tree szes, wth the number of leaves n the hundreds or low thousands, the dstance can be computed n less than a second.

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE

ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Інформаційні технології в освіті ON SOME ENTERTAINING APPLICATIONS OF THE CONCEPT OF SET IN COMPUTER SCIENCE COURSE Yordzhev K., Kostadnova H. Some aspects of programmng educaton

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array

Insertion Sort. Divide and Conquer Sorting. Divide and Conquer. Mergesort. Mergesort Example. Auxiliary Array Inserton Sort Dvde and Conquer Sortng CSE 6 Data Structures Lecture 18 What f frst k elements of array are already sorted? 4, 7, 1, 5, 1, 16 We can shft the tal of the sorted elements lst down and then

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Chapter 6 Programmng the fnte element method Inow turn to the man subject of ths book: The mplementaton of the fnte element algorthm n computer programs. In order to make my dscusson as straghtforward

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements

2x x l. Module 3: Element Properties Lecture 4: Lagrange and Serendipity Elements Module 3: Element Propertes Lecture : Lagrange and Serendpty Elements 5 In last lecture note, the nterpolaton functons are derved on the bass of assumed polynomal from Pascal s trangle for the fled varable.

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Private Information Retrieval (PIR)

Private Information Retrieval (PIR) 2 Levente Buttyán Problem formulaton Alce wants to obtan nformaton from a database, but she does not want the database to learn whch nformaton she wanted e.g., Alce s an nvestor queryng a stock-market

More information

Optimal Workload-based Weighted Wavelet Synopses

Optimal Workload-based Weighted Wavelet Synopses Optmal Workload-based Weghted Wavelet Synopses Yoss Matas School of Computer Scence Tel Avv Unversty Tel Avv 69978, Israel matas@tau.ac.l Danel Urel School of Computer Scence Tel Avv Unversty Tel Avv 69978,

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

Ramsey numbers of cubes versus cliques

Ramsey numbers of cubes versus cliques Ramsey numbers of cubes versus clques Davd Conlon Jacob Fox Choongbum Lee Benny Sudakov Abstract The cube graph Q n s the skeleton of the n-dmensonal cube. It s an n-regular graph on 2 n vertces. The Ramsey

More information

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky

Improving Low Density Parity Check Codes Over the Erasure Channel. The Nelder Mead Downhill Simplex Method. Scott Stransky Improvng Low Densty Party Check Codes Over the Erasure Channel The Nelder Mead Downhll Smplex Method Scott Stransky Programmng n conjuncton wth: Bors Cukalovc 18.413 Fnal Project Sprng 2004 Page 1 Abstract

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Bran Curless Sprng 2008 Announcements (5/14/08) Homework due at begnnng of class on Frday. Secton tomorrow: Graded homeworks returned More dscusson

More information

Sorting. Sorting. Why Sort? Consistent Ordering

Sorting. Sorting. Why Sort? Consistent Ordering Sortng CSE 6 Data Structures Unt 15 Readng: Sectons.1-. Bubble and Insert sort,.5 Heap sort, Secton..6 Radx sort, Secton.6 Mergesort, Secton. Qucksort, Secton.8 Lower bound Sortng Input an array A of data

More information

Sorting: The Big Picture. The steps of QuickSort. QuickSort Example. QuickSort Example. QuickSort Example. Recursive Quicksort

Sorting: The Big Picture. The steps of QuickSort. QuickSort Example. QuickSort Example. QuickSort Example. Recursive Quicksort Sortng: The Bg Pcture Gven n comparable elements n an array, sort them n an ncreasng (or decreasng) order. Smple algorthms: O(n ) Inserton sort Selecton sort Bubble sort Shell sort Fancer algorthms: O(n

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

CHAPTER 2 DECOMPOSITION OF GRAPHS

CHAPTER 2 DECOMPOSITION OF GRAPHS CHAPTER DECOMPOSITION OF GRAPHS. INTRODUCTION A graph H s called a Supersubdvson of a graph G f H s obtaned from G by replacng every edge uv of G by a bpartte graph,m (m may vary for each edge by dentfyng

More information

Related-Mode Attacks on CTR Encryption Mode

Related-Mode Attacks on CTR Encryption Mode Internatonal Journal of Network Securty, Vol.4, No.3, PP.282 287, May 2007 282 Related-Mode Attacks on CTR Encrypton Mode Dayn Wang, Dongda Ln, and Wenlng Wu (Correspondng author: Dayn Wang) Key Laboratory

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

Priority queues and heaps Professors Clark F. Olson and Carol Zander

Priority queues and heaps Professors Clark F. Olson and Carol Zander Prorty queues and eaps Professors Clark F. Olson and Carol Zander Prorty queues A common abstract data type (ADT) n computer scence s te prorty queue. As you mgt expect from te name, eac tem n te prorty

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

1 Dynamic Connectivity

1 Dynamic Connectivity 15-850: Advanced Algorthms CMU, Sprng 2017 Lecture #3: Dynamc Graph Connectvty algorthms 01/30/17 Lecturer: Anupam Gupta Scrbe: Hu Han Chn, Jacob Imola Dynamc graph algorthms s the study of standard graph

More information

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming

Kent State University CS 4/ Design and Analysis of Algorithms. Dept. of Math & Computer Science LECT-16. Dynamic Programming CS 4/560 Desgn and Analyss of Algorthms Kent State Unversty Dept. of Math & Computer Scence LECT-6 Dynamc Programmng 2 Dynamc Programmng Dynamc Programmng, lke the dvde-and-conquer method, solves problems

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

The Shortest Path of Touring Lines given in the Plane

The Shortest Path of Touring Lines given in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 262 The Open Cybernetcs & Systemcs Journal, 2015, 9, 262-267 The Shortest Path of Tourng Lnes gven n the Plane Open Access Ljuan Wang 1,2, Dandan He

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

Fast Computation of Shortest Path for Visiting Segments in the Plane

Fast Computation of Shortest Path for Visiting Segments in the Plane Send Orders for Reprnts to reprnts@benthamscence.ae 4 The Open Cybernetcs & Systemcs Journal, 04, 8, 4-9 Open Access Fast Computaton of Shortest Path for Vstng Segments n the Plane Ljuan Wang,, Bo Jang

More information

CS221: Algorithms and Data Structures. Priority Queues and Heaps. Alan J. Hu (Borrowing slides from Steve Wolfman)

CS221: Algorithms and Data Structures. Priority Queues and Heaps. Alan J. Hu (Borrowing slides from Steve Wolfman) CS: Algorthms and Data Structures Prorty Queues and Heaps Alan J. Hu (Borrowng sldes from Steve Wolfman) Learnng Goals After ths unt, you should be able to: Provde examples of approprate applcatons for

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

CHAPTER 10: ALGORITHM DESIGN TECHNIQUES

CHAPTER 10: ALGORITHM DESIGN TECHNIQUES CHAPTER 10: ALGORITHM DESIGN TECHNIQUES So far, we have been concerned wth the effcent mplementaton of algorthms. We have seen that when an algorthm s gven, the actual data structures need not be specfed.

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law)

Machine Learning. Support Vector Machines. (contains material adapted from talks by Constantin F. Aliferis & Ioannis Tsamardinos, and Martin Law) Machne Learnng Support Vector Machnes (contans materal adapted from talks by Constantn F. Alfers & Ioanns Tsamardnos, and Martn Law) Bryan Pardo, Machne Learnng: EECS 349 Fall 2014 Support Vector Machnes

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Discrete Applied Mathematics. Shortest paths in linear time on minor-closed graph classes, with an application to Steiner tree approximation

Discrete Applied Mathematics. Shortest paths in linear time on minor-closed graph classes, with an application to Steiner tree approximation Dscrete Appled Mathematcs 7 (9) 67 684 Contents lsts avalable at ScenceDrect Dscrete Appled Mathematcs journal homepage: www.elsever.com/locate/dam Shortest paths n lnear tme on mnor-closed graph classes,

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desgn and Analyss of Algorthms Heaps and Heapsort Reference: CLRS Chapter 6 Topcs: Heaps Heapsort Prorty queue Huo Hongwe Recap and overvew The story so far... Inserton sort runnng tme of Θ(n 2 ); sorts

More information

5 The Primal-Dual Method

5 The Primal-Dual Method 5 The Prmal-Dual Method Orgnally desgned as a method for solvng lnear programs, where t reduces weghted optmzaton problems to smpler combnatoral ones, the prmal-dual method (PDM) has receved much attenton

More information

Report on On-line Graph Coloring

Report on On-line Graph Coloring 2003 Fall Semester Comp 670K Onlne Algorthm Report on LO Yuet Me (00086365) cndylo@ust.hk Abstract Onlne algorthm deals wth data that has no future nformaton. Lots of examples demonstrate that onlne algorthm

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

F Geometric Mean Graphs

F Geometric Mean Graphs Avalable at http://pvamu.edu/aam Appl. Appl. Math. ISSN: 1932-9466 Vol. 10, Issue 2 (December 2015), pp. 937-952 Applcatons and Appled Mathematcs: An Internatonal Journal (AAM) F Geometrc Mean Graphs A.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Notes on Organizing Java Code: Packages, Visibility, and Scope

Notes on Organizing Java Code: Packages, Visibility, and Scope Notes on Organzng Java Code: Packages, Vsblty, and Scope CS 112 Wayne Snyder Java programmng n large measure s a process of defnng enttes (.e., packages, classes, methods, or felds) by name and then usng

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Algorithm To Convert A Decimal To A Fraction

Algorithm To Convert A Decimal To A Fraction Algorthm To Convert A ecmal To A Fracton by John Kennedy Mathematcs epartment Santa Monca College 1900 Pco Blvd. Santa Monca, CA 90405 jrkennedy6@gmal.com Except for ths comment explanng that t s blank

More information

Faster Shortest Paths in Dense Distance Graphs, with Applications

Faster Shortest Paths in Dense Distance Graphs, with Applications Faster Shortest Paths n Dense Dstance Graphs, wth Applcatons Shay Mozes IDC Herzlya smozes@dc.ac.l Yahav Nussbaum Unversty of Hafa yahav.nussbaum@cs.tau.ac.l Oren Wemann Unversty of Hafa oren@cs.hafa.ac.l

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Intro. Iterators. 1. Access

Intro. Iterators. 1. Access Intro Ths mornng I d lke to talk a lttle bt about s and s. We wll start out wth smlartes and dfferences, then we wll see how to draw them n envronment dagrams, and we wll fnsh wth some examples. Happy

More information

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching

A Fast Visual Tracking Algorithm Based on Circle Pixels Matching A Fast Vsual Trackng Algorthm Based on Crcle Pxels Matchng Zhqang Hou hou_zhq@sohu.com Chongzhao Han czhan@mal.xjtu.edu.cn Ln Zheng Abstract: A fast vsual trackng algorthm based on crcle pxels matchng

More information

Greedy Technique - Definition

Greedy Technique - Definition Greedy Technque Greedy Technque - Defnton The greedy method s a general algorthm desgn paradgm, bult on the follong elements: confguratons: dfferent choces, collectons, or values to fnd objectve functon:

More information

Distributed Degree Splitting, Edge Coloring, and Orientations

Distributed Degree Splitting, Edge Coloring, and Orientations Abstract Dstrbuted Degree Splttng, Edge Colorng, and Orentatons Mohsen Ghaffar MIT ghaffar@mt.edu We study a famly of closely-related dstrbuted graph problems, whch we call degree splttng, where roughly

More information

An Approach in Coloring Semi-Regular Tilings on the Hyperbolic Plane

An Approach in Coloring Semi-Regular Tilings on the Hyperbolic Plane An Approach n Colorng Sem-Regular Tlngs on the Hyperbolc Plane Ma Louse Antonette N De Las Peñas, mlp@mathscmathadmueduph Glenn R Lago, glago@yahoocom Math Department, Ateneo de Manla Unversty, Loyola

More information

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University

All-Pairs Shortest Paths. Approximate All-Pairs shortest paths Approximate distance oracles Spanners and Emulators. Uri Zwick Tel Aviv University Approxmate All-Pars shortest paths Approxmate dstance oracles Spanners and Emulators Ur Zwck Tel Avv Unversty Summer School on Shortest Paths (PATH05 DIKU, Unversty of Copenhagen All-Pars Shortest Paths

More information

Lecture #15 Lecture Notes

Lecture #15 Lecture Notes Lecture #15 Lecture Notes The ocean water column s very much a 3-D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation

Quality Improvement Algorithm for Tetrahedral Mesh Based on Optimal Delaunay Triangulation Intellgent Informaton Management, 013, 5, 191-195 Publshed Onlne November 013 (http://www.scrp.org/journal/m) http://dx.do.org/10.36/m.013.5601 Qualty Improvement Algorthm for Tetrahedral Mesh Based on

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

O n processors in CRCW PRAM

O n processors in CRCW PRAM PARALLEL COMPLEXITY OF SINGLE SOURCE SHORTEST PATH ALGORITHMS Mshra, P. K. Department o Appled Mathematcs Brla Insttute o Technology, Mesra Ranch-8355 (Inda) & Dept. o Electroncs & Electrcal Communcaton

More information

The Erdős Pósa property for vertex- and edge-disjoint odd cycles in graphs on orientable surfaces

The Erdős Pósa property for vertex- and edge-disjoint odd cycles in graphs on orientable surfaces Dscrete Mathematcs 307 (2007) 764 768 www.elsever.com/locate/dsc Note The Erdős Pósa property for vertex- and edge-dsjont odd cycles n graphs on orentable surfaces Ken-Ich Kawarabayash a, Atsuhro Nakamoto

More information