Study of a Simple Pruning Strategy with Day's Algorithm


Thomas G. Kristensen

Abstract

We wish to calculate all pairwise Robinson-Foulds distances in a set of trees. Traditional algorithms for doing this seldom take into account the fact that they are to be used on a set of very similar trees. Some randomized algorithms [2] actually become slower the more similar the trees are. In this paper we present an augmentation of Day's algorithm that runs faster the more similar the trees are. Our studies indicate that (1) a careful implementation of Day's algorithm is faster than recent randomized approaches, (2) our pruning strategy improves the running time of Day's algorithm and (3) augmenting Day's algorithm with more sophisticated pruning strategies that remove larger topologies is most likely futile.

1 Introduction

When researching the phylogenetic relationship between a set of species, different methods will often disagree on the result. The need therefore arises for a measure of similarity between every pair of suggestions in a set of results. If the suggestions are represented as phylogenetic trees we can use the Robinson-Foulds distance. Several algorithms exist that, given a set of t phylogenetic trees over the same set of n species, calculate the Robinson-Foulds distance between every pair of these. Day's algorithm is a deterministic algorithm that computes the distance between two trees in time O(n) ([1]), which is optimal when studying exactly two trees. The algorithm can be used to calculate all pairwise distances in a set of trees in time $O(t^2 n)$, which is suboptimal as the size of the input is O(tn) and the size of the output is $O(t^2)$. Several randomized algorithms exist, but they are only fast when the trees are very dissimilar, which is not the case in most real-life applications ([2]). A part of our study is therefore focused on comparing a randomized approach to the classical algorithm by Day.
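As a point of reference, the all-pairs computation described above can be sketched directly from bipartition sets. The following is a minimal Python sketch, not Day's algorithm and not the implementation studied in this paper. It assumes that trees are given as nested tuples of taxon names (a hypothetical representation), takes the Robinson-Foulds distance to be half the number of nontrivial bipartitions found in exactly one of the two trees, and spends more than O(n) time per pair. All function names and the example trees are made up for illustration.

```python
from itertools import combinations


def leaf_sets(tree):
    """Return (leaves of tree, leaf sets of all subtrees hanging below its top node)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, clusters = frozenset(), []
    for child in tree:
        child_leaves, child_clusters = leaf_sets(child)
        leaves |= child_leaves
        clusters += child_clusters + [child_leaves]
    return leaves, clusters


def bipartitions(tree):
    """Nontrivial bipartitions, each stored as the side that lacks a fixed reference taxon."""
    taxa, clusters = leaf_sets(tree)
    reference = min(taxa)  # assumption: any fixed taxon works as reference
    splits = set()
    for side in clusters:
        if reference in side:
            side = taxa - side
        # A bipartition is nontrivial when both sides hold at least two taxa.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
    return splits


def rf_distance(a, b):
    """Half the number of bipartitions present in exactly one of the two trees."""
    sa, sb = bipartitions(a), bipartitions(b)
    return (len(sa - sb) + len(sb - sa)) / 2


def rf_matrix(trees):
    """All pairwise distances; each unordered pair is compared once."""
    splits = [bipartitions(tree) for tree in trees]
    dist = [[0.0] * len(trees) for _ in trees]
    for i, j in combinations(range(len(trees)), 2):
        dist[i][j] = dist[j][i] = len(splits[i] ^ splits[j]) / 2
    return dist


# Hypothetical example trees over the taxa A..E.
t1 = ("A", "B", ("C", ("D", "E")))
t2 = ("A", "C", ("B", ("D", "E")))
t3 = (("A", "B"), ("C", ("D", "E")))  # same splits as t1, drawn with a root
print(rf_matrix([t1, t2, t3]))
# prints [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

With t trees the pair loop visits t(t-1)/2 pairs, which is where the $t^2$ factor in the running time comes from.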
In this paper we investigate an extension of Day's algorithm that removes small topologies that are shared among the trees, using a pruning process that does not alter the asymptotic upper bound of the original algorithm. A goal of our study is to investigate whether the extra time used for detecting and removing these topologies pays off, and whether or not more complicated extensions of Day's algorithm are advisable.

This paper is organized as follows: we first present the background of this study, including phylogenetic trees and the Robinson-Foulds distance. Next we present Day's algorithm and our pruning strategy. We then describe our experimental setup, including implementation details and choice of data set. We finish with an analysis and discussion of our results and a short conclusion.

2 Background

2.1 Phylogenetic Trees

A phylogenetic tree is a way of representing the evolutionary relationship between a set of species. Each leaf in a phylogenetic tree represents a species, and each inner node represents a speciation event. The tree can be rooted or unrooted. Given n species (or taxa), different methods exist to infer a phylogenetic tree with n leaves. These methods are based on different background data and philosophies, and will therefore often disagree on the topology of the underlying tree. Sometimes one method might even produce several suggestions based on the same data. It is therefore useful to have a measure of agreement between t trees over the same taxa.

Figure 1: A simple example of a phylogenetic tree.

2.2 Robinson-Foulds Distance

If we remove an edge from a tree, we split the set of leaves into two disjoint sets, called a bipartition. In Figure 1, the dashed edge separates A and B from C, D and E, giving rise to the bipartition AB|CDE. The edges that connect the leaves A, B, C, D and E to the rest of the tree are trivial, as the bipartition they define is present in every phylogenetic tree over the taxa. We will therefore ignore them, and focus on the set of nontrivial bipartitions.

Let $B(T)$ denote the set of nontrivial bipartitions in a tree T. In Figure 2, $B(T_1)$ and $B(T_2)$ each consist of two bipartitions. Given a pair of trees, $T_1$ and $T_2$, we can count the number of bipartitions in $B(T_1)$ not found in $B(T_2)$ as $|B(T_1) \setminus B(T_2)|$. In our example, $B(T_1) \setminus B(T_2)$ contains a single bipartition, as the two trees share one split. The Robinson-Foulds distance $d_{RF}(T_1, T_2)$ is

$$d_{RF}(T_1, T_2) = \frac{1}{2}\left( |B(T_1) \setminus B(T_2)| + |B(T_2) \setminus B(T_1)| \right)$$

Figure 2: Two phylogenetic trees, $T_1$ and $T_2$, that share exactly one bipartition.

Given t trees $T_1, \ldots, T_t$, we wish to calculate the Robinson-Foulds distance between every pair of these. As the distance is symmetric, we only need to compare $T_i$ to $T_{i+1}, \ldots, T_t$.

3 Day's Algorithm

3.1 Original Algorithm

Day's algorithm was first presented in [1]. The main idea is to represent the bipartitions by intervals instead of numbers. First, we root the two trees $T_1$ and $T_2$ in a taxon, e.g. A (see Figure 3). Next, we perform a depth-first traversal of $T_1$, remembering the order in which the leaves are visited. This order defines a map from taxon to a number. We apply this map to the leaves of $T_2$. Each split in $T_1$ now has a well-defined interval associated with it, namely the interval from its leftmost child to its rightmost. Similarly, some of the nontrivial splits in $T_2$ have a well-defined interval associated with them, based on the numbers on the leaves in their subtree, even though they might not be sorted (see Figure 3). The intervals can be collected in a depth-first search by comparing the number of leaves to the smallest and largest leaf in the subtrees. If an interval is shared between the two trees, this corresponds to a shared bipartition.

Figure 3: Day's algorithm illustrated.

We can examine which intervals are shared by treating them as lists of tuples. The lists can be sorted using radix sort in O(n) and compared for duplicates in O(n). Instead of sorting, Day uses an O(n) table where lookups are performed using a bijective map from the inner nodes of $T_1$ to {2, ..., n}. In our study we use a bijective map into $O(n^2)$, which speeds up the computation by removing some bookkeeping in Day's algorithm.

We can use the leaf map from $T_1$ on the trees $T_2, \ldots, T_t$, calculating the Robinson-Foulds distance between $T_1$ and all of these; repeating this for every source tree yields $O(t^2 n)$ in total. Again, notice that we only have to compare the tree $T_i$ with the trees $T_j$ where j > i. This fact does not reduce the asymptotic running time of the algorithm, but of course improves the running time in practice.

3.2 Pruning Strategy

If we can identify a topology that is shared among all the trees, we can replace it with a leaf without altering the result, but with an improvement in execution time. When we reach the tree $T_i$ in Day's algorithm, the topologies need only be shared among the remaining t - i trees $T_{i+1}, \ldots, T_t$ to be replaced. The problem is, of course, that we must be able to identify and remove the topologies fast.

In our experiments, we accomplish this by only considering shared topologies with exactly two leaves, called cherries. These are the topologies identified as intervals of size two in Day's algorithm, such as the interval [3, 4] in the previous example. Such an interval can easily be replaced by one of its leaves, reducing the size of the trees without altering the distance between them.

Identifying the intervals that are shared among a set of O(t) trees can be done by keeping a table that tracks how many times each of the O(n) cherries has been seen in the depth-first traversal of the trees. Updating the table does not alter the asymptotic running time of Day's algorithm. If we can map from interval to node position in each tree in constant time, we can also remove cherries in constant time. Such a map can be maintained in the depth-first traversal of the tree without any further asymptotic cost, and as we remove at most tn cherries in our algorithm, this step will not alter the asymptotic execution time.

If the trees are very similar, we expect the trees to be pruned very fast, resulting in very small trees. However, if the trees are very dissimilar, we expect the alterations to slow down the program significantly. Therefore, examining the two versions of Day's algorithm will be one of our main goals.

4 Experimental Setup

We have implemented the two algorithms in C++. The implementations are available on... They share the same code for parsing the files and printing the output, along with most of the code for traversing the trees.

We have tested our implementations on two sets of realistic binary trees from the article [2]. The trees are generated by the Recursive-Iterative DCM3 (Rec-I-DCM3) algorithm on (1) a set of 500 aligned rbcL DNA sequences and (2) a set of 1,127 aligned large subunit ribosomal RNA sequences. In the rest of this article, these will be referred to as DNA and RNA. Both sets consist of 1,000 trees.

Experiments have been performed on a varying number of trees by running the implementations on the first t trees in the DNA data set. We also test for varying sizes of trees by removing leaves from this data set before performing our experiments; we believe that the resulting trees still represent realistic trees. All timing tests were performed on a MacBook with a 2.16 GHz processor and 2 GB of RAM. For each experiment we have performed five runs and present the average.

5 Results

5.1 Rec-I-DCM3 Trees

Table 1: Execution time of our algorithm on trees from [2] (rows: Hash-RF, Day's algorithm, Pruning; columns: DNA, RNA).

We want to compare our implementations to randomized approaches, and have therefore run our programs on the data sets DNA and RNA from the article [2].
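As a reminder of what the timed programs compute for each pair of trees, the interval idea described under "Original Algorithm" can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the C++ implementation used in the experiments: each tree is given as the nested tuple of taxon names hanging below the common root taxon (a hypothetical representation), a Python set of (lo, hi) pairs stands in for Day's O(n) table, and all function names and example trees are made up for illustration.

```python
def number_leaves(tree, order=None):
    """Map each taxon to its 1-based position in a depth-first traversal."""
    if order is None:
        order = {}
    if isinstance(tree, str):
        order[tree] = len(order) + 1
    else:
        for child in tree:
            number_leaves(child, order)
    return order


def inner_nodes(tree, order, out, top=True):
    """Return (min, max, size) of the leaf numbers below `tree`.

    One (min, max, size) triple is appended to `out` for every inner node
    except the top node, whose bipartition (root taxon versus the rest) is trivial.
    """
    if isinstance(tree, str):
        k = order[tree]
        return k, k, 1
    lo, hi, size = len(order) + 1, 0, 0
    for child in tree:
        c_lo, c_hi, c_size = inner_nodes(child, order, out, top=False)
        lo, hi, size = min(lo, c_lo), max(hi, c_hi), size + c_size
    if not top:
        out.append((lo, hi, size))
    return lo, hi, size


def day_rf(t1, t2):
    """Robinson-Foulds distance via intervals numbered by the leaf order of t1."""
    order = number_leaves(t1)
    nodes1, nodes2 = [], []
    inner_nodes(t1, order, nodes1)
    inner_nodes(t2, order, nodes2)
    table = {(lo, hi) for lo, hi, _ in nodes1}  # stand-in for Day's table
    # A node of t2 is a candidate only if its leaf numbers are contiguous.
    shared = sum(1 for lo, hi, size in nodes2
                 if hi - lo + 1 == size and (lo, hi) in table)
    return ((len(nodes1) - shared) + (len(nodes2) - shared)) / 2


# Hypothetical trees over A..E, rooted in A; only the part below A is written.
t1 = ("B", ("C", ("D", "E")))    # intervals [2, 4] and [3, 4]
t2 = ("C", ("B", ("D", "E")))    # only [3, 4] is a shared interval
t3 = (("B", "C"), ("D", "E"))    # [1, 2] is contiguous but absent from t1
print(day_rf(t1, t2), day_rf(t1, t3), day_rf(t1, t1))
# prints 1.0 1.0 0.0
```

Each call performs one traversal per tree and one table lookup per inner node, so the cost per pair stays linear in the number of taxa as long as the set lookups behave like constant-time operations.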
Each algorithm was run five times and the average is presented in Table 1 along with the running time of the Hash-RF program from [2]. As can be seen from Table 1, our algorithms outperform the randomized approach. This is particularly encouraging as the observations of Hash-RF were made on a 3 GHz processor as opposed to the 2.16 GHz machine used for Day's algorithm and the pruning strategy. It is less encouraging to see that the pruning strategy seems to be slower on the RNA data set.

To examine what the best obtainable time is, we ran the two programs on 1,000 copies of the first tree in the DNA and RNA data sets. Day's ordinary algorithm used the same time on the two trees as it did on the 1,000 different trees. The pruning strategy used 3.0 seconds on the DNA tree and 6.6 seconds on the RNA tree, which is less than a third of the time used by the original algorithm.

5.2 Running Time Experiments

Figure 4: Running time in seconds s as a function of the number of taxa n, for Day's algorithm and the pruning strategy. The number of trees is kept at 1,000.

The results of running our implementations on trees of varying size are illustrated in Figure 4, where it can be seen that the running time is indeed linear in the number of taxa n. As can be seen, the extra work associated with detecting and removing the shared topologies pays off: our algorithm is faster than Day's algorithm, even for as few as 100 taxa.

We have also tested our implementations on a varying number of trees. The results are presented in Figure 5, where it can be seen that both algorithms are quadratic in the number of trees t. The pruning strategy is, however, a bit faster once the number of trees grows beyond a certain point.

Figure 5: Running time in seconds s as a function of the number of trees t, for Day's algorithm and the pruning strategy. The number of taxa is kept at 500.

5.3 Close Examination of Pruning Strategy

As previously described, our pruning strategy only removes shared topologies of size two. Removing entire topologies in binary trees can be done by repeatedly removing topologies of size two until there are none left. We have examined how much we can prune the trees in terms of how many taxa we can remove. In Figure 6 we have plotted the total number of removed taxa as a function of how many trees we have had as source $T_i$. As can be seen, our strategy identifies and removes shared topologies within very few iterations, and by far the most iterations are performed on trees that share no common topologies.

This process is particularly active in the beginning of the algorithm. In Figure 6(a) we have focused on the first ten iterations, where within eight iterations the remaining trees share no topologies. As a shared topology (a subtree rooted at an inner node) will always contain at least one topology of size two, the number of taxa will be reduced by at least one in each iteration of our algorithm (given that the trees share a common topology). An example of this can be seen in Figure 6(b), where a typical part of Figure 6 is magnified. The algorithm removes one taxon in each iteration, quickly reaching the optimal number of removed taxa.
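The claim that repeatedly removing shared topologies of size two eventually removes an entire shared topology can be sketched in a few lines. The following is a minimal Python sketch under stated assumptions, not the table-based pruning integrated into Day's algorithm in our implementation: trees are nested tuples of taxon names (a hypothetical representation), every tree is rescanned in each round instead of updating a counting table, and all function names and example trees are made up for illustration.

```python
def cherries(tree):
    """Return the cherries of a tree, each as a frozenset of its two taxa."""
    if isinstance(tree, str):
        return set()
    if len(tree) == 2 and all(isinstance(child, str) for child in tree):
        return {frozenset(tree)}
    found = set()
    for child in tree:
        found |= cherries(child)
    return found


def collapse(tree, cherry, keep):
    """Return a copy of `tree` in which `cherry` is replaced by the leaf `keep`."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 2 and frozenset(tree) == cherry:
        return keep
    return tuple(collapse(child, cherry, keep) for child in tree)


def prune_shared_cherries(trees):
    """Collapse cherries shared by all trees until none is left.

    Returns the pruned trees and the number of removed taxa.
    """
    trees = list(trees)
    removed = 0
    while True:
        shared = set.intersection(*(cherries(tree) for tree in trees))
        if not shared:
            return trees, removed
        for cherry in shared:
            keep = min(cherry)  # assumption: either leaf may represent the cherry
            trees = [collapse(tree, cherry, keep) for tree in trees]
            removed += 1


# Hypothetical trees that share the subtree (C, (D, E)) but differ elsewhere.
t1 = (("B", "F"), ("C", ("D", "E")))
t2 = ("B", ("F", ("C", ("D", "E"))))
pruned, removed = prune_shared_cherries([t1, t2])
print(pruned, removed)
# prints [(('B', 'F'), 'C'), ('B', ('F', 'C'))] 2
```

In this example the shared subtree over C, D and E disappears in two rounds, one taxon per round: first the cherry {D, E} is replaced by D, which exposes the new shared cherry {C, D}.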

Figure 6: Top: Number of removable taxa n as a function of iterations i. The number of taxa that are removed by our pruning strategy is presented in solid lines, the number of removable taxa is presented in grey. Below: Magnification of the first 10 iterations (a) and a magnification of a typical place on the graph (b), where we can see an entire topology being removed.

In a run of Day's algorithm on t trees, the number of DFS traversals performed is $\frac{1}{2} t(t+1) - 1$. On the DNA data set from Figure 6 this is 500,499. Close inspection of the observed data reveals that 482,566 of the DFS traversals in our pruning strategy are performed on trees of optimal size, in the sense that no topology is shared among all the trees. This means that only 17,933 (approximately 3.58%) of our DFS traversals are performed on trees of suboptimal size. The same pattern is seen on the other tested data sets.

6 Conclusions and Future Works

Our studies show that using the pruning strategy pays off when the trees are realistic in the sense that they are generated by a phylogenetic inference program. It even outperforms randomized approaches, but our studies indicate that so does the now more than 20 year old algorithm by Day.

Our studies also indicate that the possible improvement obtained by moving from removing small shared topologies of size two to removing entire topologies is negligible, if not nonexistent. If, however, we remove the entire set of shared topologies before running Day's algorithm, we might gain a small improvement, but this has not been the focus of this study.

In our pruning strategy, we remove topologies of size two, which correspond to small bipartitions. We have not investigated how much, if anything, could be gained by removing shared bipartitions with more than two taxa on both sides. We would not be able to reduce the number of taxa, but the number of internal nodes could be lowered, which might result in a better algorithm.

References

[1] William Day. Optimal algorithms for comparing trees with labeled leaves. Journal of Classification, 2(1):7-28, December 1985.
[2] Seung-Jin Sul and Tiffani L. Williams. A randomized algorithm for comparing sets of phylogenetic trees. In APBC.


More information

CSci 231 Homework 7. Red Black Trees. CLRS Chapter 13 and 14

CSci 231 Homework 7. Red Black Trees. CLRS Chapter 13 and 14 CSci 31 Homework 7 Red Black Trees CLRS Chapter 13 and 14 Choose 4 problems from the list below. 1. (CLRS 13.1-6) What is the largest possible number of internal nodes in a red-black tree with black-height

More information

Final Exam in Algorithms and Data Structures 1 (1DL210)

Final Exam in Algorithms and Data Structures 1 (1DL210) Final Exam in Algorithms and Data Structures 1 (1DL210) Department of Information Technology Uppsala University February 0th, 2012 Lecturers: Parosh Aziz Abdulla, Jonathan Cederberg and Jari Stenman Location:

More information

We have used both of the last two claims in previous algorithms and therefore their proof is omitted.

We have used both of the last two claims in previous algorithms and therefore their proof is omitted. Homework 3 Question 1 The algorithm will be based on the following logic: If an edge ( ) is deleted from the spanning tree, let and be the trees that were created, rooted at and respectively Adding any

More information

arxiv: v3 [cs.ds] 18 Apr 2011

arxiv: v3 [cs.ds] 18 Apr 2011 A tight bound on the worst-case number of comparisons for Floyd s heap construction algorithm Ioannis K. Paparrizos School of Computer and Communication Sciences Ècole Polytechnique Fèdèrale de Lausanne

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17 01.433/33 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/2/1.1 Introduction In this lecture we ll talk about a useful abstraction, priority queues, which are

More information

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree Chapter 4 Trees 2 Introduction for large input, even access time may be prohibitive we need data structures that exhibit running times closer to O(log N) binary search tree 3 Terminology recursive definition

More information

15.4 Longest common subsequence

15.4 Longest common subsequence 15.4 Longest common subsequence Biological applications often need to compare the DNA of two (or more) different organisms A strand of DNA consists of a string of molecules called bases, where the possible

More information

Introduction to Analysis of Algorithms

Introduction to Analysis of Algorithms Introduction to Analysis of Algorithms Analysis of Algorithms To determine how efficient an algorithm is we compute the amount of time that the algorithm needs to solve a problem. Given two algorithms

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

Trees. Eric McCreath

Trees. Eric McCreath Trees Eric McCreath 2 Overview In this lecture we will explore: general trees, binary trees, binary search trees, and AVL and B-Trees. 3 Trees Trees are recursive data structures. They are useful for:

More information

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures

More information

HEAPS: IMPLEMENTING EFFICIENT PRIORITY QUEUES

HEAPS: IMPLEMENTING EFFICIENT PRIORITY QUEUES HEAPS: IMPLEMENTING EFFICIENT PRIORITY QUEUES 2 5 6 9 7 Presentation for use with the textbook Data Structures and Algorithms in Java, 6 th edition, by M. T. Goodrich, R. Tamassia, and M. H., Wiley, 2014

More information

Constructing a Cycle Basis for a Planar Graph

Constructing a Cycle Basis for a Planar Graph Constructing a Cycle Basis for a Planar Graph David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International

More information

Indexing and Hashing

Indexing and Hashing C H A P T E R 1 Indexing and Hashing Solutions to Practice Exercises 1.1 Reasons for not keeping several search indices include: a. Every index requires additional CPU time and disk I/O overhead during

More information

COMP : Trees. COMP20012 Trees 219

COMP : Trees. COMP20012 Trees 219 COMP20012 3: Trees COMP20012 Trees 219 Trees Seen lots of examples. Parse Trees Decision Trees Search Trees Family Trees Hierarchical Structures Management Directories COMP20012 Trees 220 Trees have natural

More information

Computational Geometry

Computational Geometry Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess

More information

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing

More information

Algorithms for Computing Maximum Agreement Subtrees

Algorithms for Computing Maximum Agreement Subtrees Algorithms for Computing Maximum Agreement Subtrees Nikolaj Skipper Rasmussen 20114373 Thomas Hedegaard Lange 20113788 Master s Thesis, Computer Science June 2016 Advisor: Christian Nørgaard Storm Pedersen

More information

COMP 251 Winter 2017 Online quizzes with answers

COMP 251 Winter 2017 Online quizzes with answers COMP 251 Winter 2017 Online quizzes with answers Open Addressing (2) Which of the following assertions are true about open address tables? A. You cannot store more records than the total number of slots

More information

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Lecture 5 Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Reading: Randomized Search Trees by Aragon & Seidel, Algorithmica 1996, http://sims.berkeley.edu/~aragon/pubs/rst96.pdf;

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 5.1 Introduction You should all know a few ways of sorting in O(n log n)

More information

Linked Structures Songs, Games, Movies Part III. Fall 2013 Carola Wenk

Linked Structures Songs, Games, Movies Part III. Fall 2013 Carola Wenk Linked Structures Songs, Games, Movies Part III Fall 2013 Carola Wenk Biological Structures Nature has evolved vascular and nervous systems in a hierarchical manner so that nutrients and signals can quickly

More information

UNIT III BALANCED SEARCH TREES AND INDEXING

UNIT III BALANCED SEARCH TREES AND INDEXING UNIT III BALANCED SEARCH TREES AND INDEXING OBJECTIVE The implementation of hash tables is frequently called hashing. Hashing is a technique used for performing insertions, deletions and finds in constant

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees Applied Mathematics Letters 24 (2011) 719 723 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Graph triangulations and the compatibility

More information

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely

More information

Evolutionary Trees. Fredrik Ronquist. August 29, 2005

Evolutionary Trees. Fredrik Ronquist. August 29, 2005 Evolutionary Trees Fredrik Ronquist August 29, 2005 1 Evolutionary Trees Tree is an important concept in Graph Theory, Computer Science, Evolutionary Biology, and many other areas. In evolutionary biology,

More information

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g) Introduction to Algorithms March 11, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Sivan Toledo and Alan Edelman Quiz 1 Solutions Problem 1. Quiz 1 Solutions Asymptotic orders

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

COMP 250 Fall recurrences 2 Oct. 13, 2017

COMP 250 Fall recurrences 2 Oct. 13, 2017 COMP 250 Fall 2017 15 - recurrences 2 Oct. 13, 2017 Here we examine the recurrences for mergesort and quicksort. Mergesort Recall the mergesort algorithm: we divide the list of things to be sorted into

More information

CH 8. HEAPS AND PRIORITY QUEUES

CH 8. HEAPS AND PRIORITY QUEUES CH 8. HEAPS AND PRIORITY QUEUES ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH, TAMASSIA AND MOUNT (WILEY 2004) AND SLIDES FROM NANCY

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances recursively 3. Obtain solution to original

More information

Successor/Predecessor Rules in Binary Trees

Successor/Predecessor Rules in Binary Trees Successor/Predecessor Rules in inary Trees Thomas. nastasio July 7, 2003 Introduction inary tree traversals are commonly made in one of three patterns, inorder, preorder, and postorder. These traversals

More information

( ) 1 B. 1. Suppose f x

( ) 1 B. 1. Suppose f x CSE Name Test Spring Last Digits of Student ID Multiple Choice. Write your answer to the LEFT of each problem. points each is a monotonically increasing function. Which of the following approximates the

More information

CS-301 Data Structure. Tariq Hanif

CS-301 Data Structure. Tariq Hanif 1. The tree data structure is a Linear data structure Non-linear data structure Graphical data structure Data structure like queue FINALTERM EXAMINATION Spring 2012 CS301- Data Structure 25-07-2012 2.

More information

Data Structures and Algorithms Week 4

Data Structures and Algorithms Week 4 Data Structures and Algorithms Week. About sorting algorithms. Heapsort Complete binary trees Heap data structure. Quicksort a popular algorithm very fast on average Previous Week Divide and conquer Merge

More information

Λέων-Χαράλαμπος Σταματάρης

Λέων-Χαράλαμπος Σταματάρης Λέων-Χαράλαμπος Σταματάρης INTRODUCTION Two classical problems of information dissemination in computer networks: The broadcasting problem: Distributing a particular message from a distinguished source

More information

CH. 8 PRIORITY QUEUES AND HEAPS

CH. 8 PRIORITY QUEUES AND HEAPS CH. 8 PRIORITY QUEUES AND HEAPS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH, TAMASSIA AND MOUNT (WILEY 2004) AND SLIDES FROM NANCY

More information

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix Spring 2010 Review Topics Big O Notation Heaps Sorting Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix Hashtables Tree Balancing: AVL trees and DSW algorithm Graphs: Basic terminology and

More information

Introduction to the Analysis of Algorithms. Algorithm

Introduction to the Analysis of Algorithms. Algorithm Introduction to the Analysis of Algorithms Based on the notes from David Fernandez-Baca Bryn Mawr College CS206 Intro to Data Structures Algorithm An algorithm is a strategy (well-defined computational

More information

For searching and sorting algorithms, this is particularly dependent on the number of data elements.

For searching and sorting algorithms, this is particularly dependent on the number of data elements. Looking up a phone number, accessing a website and checking the definition of a word in a dictionary all involve searching large amounts of data. Searching algorithms all accomplish the same goal finding

More information

Programming II (CS300)

Programming II (CS300) 1 Programming II (CS300) Chapter 11: Binary Search Trees MOUNA KACEM mouna@cs.wisc.edu Fall 2018 General Overview of Data Structures 2 Introduction to trees 3 Tree: Important non-linear data structure

More information

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Vincent Berry, François Nicolas Équipe Méthodes et Algorithmes pour la

More information

3 Fractional Ramsey Numbers

3 Fractional Ramsey Numbers 27 3 Fractional Ramsey Numbers Since the definition of Ramsey numbers makes use of the clique number of graphs, we may define fractional Ramsey numbers simply by substituting fractional clique number into

More information