Study of a Simple Pruning Strategy with Day's Algorithm
Thomas G. Kristensen

Abstract

We wish to calculate all pairwise Robinson-Foulds distances in a set of trees. Traditional algorithms for doing this seldom take into account that they will be applied to a set of very similar trees. Some randomized algorithms [2] are actually slower the more similar the trees are. In this paper we present an augmentation of Day's algorithm that runs faster the more similar the trees are. Our studies indicate that (1) a careful implementation of Day's algorithm is faster than recent randomized approaches, (2) our pruning strategy improves the running time of Day's algorithm, and (3) augmenting Day's algorithm with more sophisticated pruning strategies that remove larger topologies is most likely futile.

1 Introduction

When researching the phylogenetic relationship between a set of species, different methods will often disagree on the result. The need therefore arises for a measure of similarity between every pair of suggestions in a set of results. If the suggestions are represented as phylogenetic trees, we can use the Robinson-Foulds distance.

Several algorithms exist that, given a set of t phylogenetic trees over the same set of n species, will calculate the Robinson-Foulds distance between every pair of these. Day's algorithm is a deterministic algorithm that computes the distance between two trees in time O(n) [1], which is optimal when studying exactly two trees. The algorithm can be used to calculate all pairwise distances in a set of trees in time $O(t^2 n)$, which is suboptimal as the size of the input is O(tn) and the size of the output is $O(t^2)$. Several randomized algorithms exist, but they are only fast when the trees are very dissimilar, which is not the case in most real-life applications [2]. A part of our study is therefore focused on comparing a randomized approach to the classical algorithm by Day.
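The gap between these bounds can be written out explicitly. The following is only a worked restatement of the counts given above (number of pairs times cost per pair, against the size of input plus output), not an additional result:

```latex
\[
\underbrace{\binom{t}{2}}_{\text{pairs}} \cdot \underbrace{O(n)}_{\text{per pair}}
  = \frac{t(t-1)}{2} \cdot O(n) = O(t^2 n),
\qquad
\underbrace{O(tn)}_{\text{input}} + \underbrace{O(t^2)}_{\text{output}} = O(tn + t^2).
\]
```

As a rough illustration with t = 1,000 trees over n = 500 taxa, $t^2 n = 5 \cdot 10^8$ whereas $tn + t^2 = 1.5 \cdot 10^6$.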
In this paper we investigate an extension of Day's algorithm that removes small topologies that are shared among the trees, using a pruning process that does not alter the asymptotic upper bound of the original algorithm. A goal of our study is to investigate whether the extra time used for detecting and removing these topologies pays off, and whether more complicated extensions of Day's algorithm are advisable.

This paper is organized as follows: we first present the background of this study, including phylogenetic trees and the Robinson-Foulds distance. Next we present Day's algorithm and our pruning strategy. We then describe our experimental setup, including implementation details and choice of data sets. We finish with an analysis and discussion of our results and a short conclusion.

2 Background

2.1 Phylogenetic Trees

A phylogenetic tree is a way of representing the evolutionary relationship between a set of species. Each leaf in a phylogenetic tree represents a species, and each inner node represents a speciation event. The tree can be rooted or unrooted. Given n species (or taxa), different methods exist to infer a phylogenetic tree with n leaves. These methods are based on different background data and philosophies, and will therefore often disagree on the topology of the underlying tree. Sometimes one method might even produce several suggestions based on the same data. It is therefore useful to have a measure of agreement between t trees over the same taxa.

Figure 1: A simple example of a phylogenetic tree.

2.2 Robinson-Foulds Distance

If we remove an edge from a tree, we split the set of leaves into two disjoint sets, called a bipartition. In Figure 1, the dashed edge separates A and B from C, D and E, giving rise to the bipartition AB|CDE. The edges that connect the leaves A, B, C, D and E to the rest of the tree are trivial, as the bipartitions they define are present in every phylogenetic tree over the taxa. We will therefore ignore them and focus on the set of nontrivial bipartitions.

Let $B(T)$ denote the set of nontrivial bipartitions in a tree $T$. In Figure 2, $B(T_1)$ and $B(T_2)$ each contain two bipartitions, exactly one of which is found in both trees.

Figure 2: Two phylogenetic trees, $T_1$ and $T_2$, that share exactly one bipartition.

Given a pair of trees, $T_1$ and $T_2$, we can count the number of bipartitions in $B(T_1)$ not found in $B(T_2)$ as $|B(T_1) \setminus B(T_2)|$. In our example, $B(T_1) \setminus B(T_2)$ contains a single bipartition, as the two trees share one split. The Robinson-Foulds distance $d_{RF}(T_1, T_2)$ is

$$d_{RF}(T_1, T_2) = \frac{1}{2}\left(|B(T_1) \setminus B(T_2)| + |B(T_2) \setminus B(T_1)|\right)$$

Given t trees $T_1, \ldots, T_t$, we wish to calculate the Robinson-Foulds distance between every pair of these. As the distance is symmetric, we only need to compare $T_i$ to $T_{i+1}, \ldots, T_t$.

3 Day's Algorithm

3.1 Original Algorithm

Day's algorithm was first presented in [1]. The main idea is to represent the bipartitions by intervals instead of numbers. First, we root the two trees $T_1$ and $T_2$ in a taxon, e.g. A (see Figure 3). Next, we perform a depth-first traversal of $T_1$, remembering the order in which the leaves are visited. This order defines a map from taxon to number. We apply this map to the leaves of $T_2$. Each split in $T_1$ now has a well-defined interval associated with it, namely the interval from the leftmost to the rightmost leaf in its subtree. Similarly, some of the nontrivial splits in $T_2$ have a well-defined interval associated with them, based on the numbers on the leaves in their subtree, even though these might not be sorted (see Figure 3). The intervals can be collected in a depth-first search by comparing the number of leaves in each subtree to its smallest and largest leaf number: the subtree forms an interval exactly when largest - smallest + 1 equals the number of leaves.

If an interval is shared between the two trees, this corresponds to a shared bipartition. We can examine which intervals are shared by treating them as lists of tuples. The lists can be sorted using radix sort in O(n) and compared for duplicates in O(n). Instead of sorting, Day uses an O(n) table where lookups are performed using a bijective map from the inner nodes of $T_1$ to {2, ..., n}.
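The interval idea can be sketched in a few lines of Python. This is a minimal illustration and not the author's C++ implementation: trees are assumed to be nested tuples of leaf names, already rooted at a common taxon that has been removed, a hash set stands in for Day's O(n) table, and the names `leaf_numbering`, `intervals`, `rf_distance` and the example taxa are hypothetical.

```python
def leaf_numbering(tree):
    """Number the leaves 1, 2, ... in the order a depth-first traversal visits them."""
    numbering = {}

    def visit(node):
        if isinstance(node, str):
            numbering[node] = len(numbering) + 1
        else:
            for child in node:
                visit(child)

    visit(tree)
    return numbering


def intervals(tree, numbering):
    """Return (found, splits): the (lo, hi) intervals formed by subtrees below the
    root, and the number of nontrivial splits (inner nodes below the root)."""
    found = set()
    splits = 0

    def visit(node, is_root):
        nonlocal splits
        if isinstance(node, str):
            k = numbering[node]
            return k, k, 1
        stats = [visit(child, False) for child in node]
        lo = min(s[0] for s in stats)
        hi = max(s[1] for s in stats)
        size = sum(s[2] for s in stats)
        if not is_root:
            splits += 1
            # The leaf numbers are contiguous exactly when the range matches the size.
            if hi - lo + 1 == size:
                found.add((lo, hi))
        return lo, hi, size

    visit(tree, True)
    return found, splits


def rf_distance(t1, t2):
    """Robinson-Foulds distance, using the leaf numbering induced by t1."""
    numbering = leaf_numbering(t1)
    i1, n1 = intervals(t1, numbering)
    i2, n2 = intervals(t2, numbering)
    shared = len(i1 & i2)  # hash-set lookup instead of Day's table (assumption)
    return ((n1 - shared) + (n2 - shared)) / 2


# Illustrative trees over taxa B..E, both rooted at a removed taxon A.
t1 = ("B", ("C", ("D", "E")))
t2 = ("C", ("B", ("D", "E")))
distance = rf_distance(t1, t2)
```

With these example trees, `t1` yields the intervals [2, 4] and [3, 4], `t2` yields only [3, 4], and the resulting distance is 1.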
In our study we use a bijective map into a table of size $O(n^2)$, which speeds up the computation by removing some of the bookkeeping in Day's algorithm.

Figure 3: Day's algorithm illustrated.

We can reuse the leaf map from $T_1$ on the trees $T_2, \ldots, T_t$, calculating the Robinson-Foulds distance between $T_1$ and all of these; repeating this for every tree yields $O(t^2 n)$ in total. Again, notice that we only have to compare the tree $T_i$ with the trees $T_j$ where $j > i$. This does not reduce the asymptotic running time of the algorithm, but of course improves its running time in practice.

3.2 Pruning Strategy

If we can identify a topology that is shared among all the trees, we can replace it with a leaf without altering the result, but with an improvement in execution time. When we reach the tree $T_i$ in Day's algorithm, a topology need only be shared among the remaining $t - i$ trees $T_{i+1}, \ldots, T_t$ to be replaced. The problem is, of course, that we must be able to identify and remove the topologies fast. In our experiments, we accomplish this by only considering shared topologies with exactly two leaves, called cherries; that is, topologies identified as intervals of size two in Day's algorithm, such as the interval [3, 4] in the previous example. Such an interval can easily be replaced by one of its leaves, reducing the size of the trees without altering the distance between them.

Identifying the intervals that are shared among a set of O(t) trees can be done by keeping a table that tracks how many times each of the O(n) cherries has been seen in the depth-first traversals of the trees. Updating the table does not alter the asymptotic running time of Day's algorithm. If we can map from interval to node position in each tree in constant time, we can also remove cherries in constant time. Such a map can be maintained during the depth-first traversal of the tree without any further asymptotic cost, and as we remove at most tn cherries in our algorithm, this step will not alter the asymptotic execution time.

If the trees are very similar, we expect them to be pruned very fast, resulting in very small trees. However, if the trees are very dissimilar, we expect the alterations to slow down the program significantly. Therefore, comparing the two versions of Day's algorithm will be one of our main goals.

4 Experimental Setup

We have implemented the two algorithms in C++. The implementations are available on... They share the same code for parsing the files and printing the output, along with most of the code for traversing the trees.

We have tested our implementations on two sets of realistic binary trees from the article [2]. The trees are generated by the Recursive-Iterative DCM3 (Rec-I-DCM3) algorithm on (1) a set of 500 aligned rbcL DNA sequences and (2) a set of 1,127 aligned large subunit ribosomal RNA sequences. In the rest of this article, these will be referred to as DNA and RNA. Both sets consist of 1,000 trees. Experiments have been performed on different numbers of trees by running the implementations on the first t trees in the DNA data set. We also test for varying tree sizes by removing leaves from this data set before performing our experiments; we believe that the resulting trees still represent realistic trees. All timing tests were performed on a MacBook with a 2.16 GHz processor and 2 GB of RAM, and for each experiment we performed five runs and present the average.

Table 1: Execution time of our algorithms on trees from [2] (columns: Algorithm, DNA, RNA; rows: Hash-RF, Day's algorithm, Pruning).

5 Results

5.1 Rec-I-DCM3 Trees

We want to compare our implementations to randomized approaches, and have therefore run our programs on the data sets DNA and RNA from the article [2].
Each algorithm was run five times and the average is presented in Table 1, along with the running time of the Hash-RF program from [2]. As can be seen from Table 1, our algorithms outperform the randomized approach. This is particularly encouraging as the observations of Hash-RF were made on a 3 GHz processor, as opposed to the 2.16 GHz machine used for Day's algorithm and the pruning strategy. It is less encouraging to see that the pruning strategy seems to be slower on the RNA data set.

To examine what the best obtainable time is, we ran the two programs on 1,000 copies of the first tree in the DNA and RNA data sets. Day's ordinary algorithm used the same time on the two trees as it did on the 1,000 different trees. The pruning strategy used 3.0 seconds on the DNA tree and 6.6 seconds on the RNA tree, which is less than a third of the time used by the original algorithm.

5.2 Running Time Experiments

Figure 4: Running time in seconds, s, as a function of the number of taxa n, for Day's algorithm and the pruning strategy. The number of trees is kept at 1,000.

The results of running our implementations on trees of varying size are illustrated in Figure 4, where it can be seen that the running time is indeed linear in the number of taxa n. As can be seen, the extra work associated with detecting and removing the shared topologies pays off: it renders our algorithm faster than Day's algorithm, even for as few as 100 taxa.

We have also tested our implementations on different numbers of trees. The results are presented in Figure 5, where it can be seen that both algorithms are quadratic in the number of trees t. The pruning strategy is, however, a bit faster when the number of trees exceeds [...].

Figure 5: Running time in seconds, s, as a function of the number of trees t, for Day's algorithm and the pruning strategy. The number of taxa is kept at 500.

5.3 Close Examination of Pruning Strategy

As previously described, our pruning strategy only removes shared topologies of size two. Removing entire topologies in binary trees can be done by repeatedly removing topologies of size two until there are none left. We have examined how much we can prune the trees in terms of how many taxa we can remove. In Figure 6 we have plotted the total number of removed taxa as a function of how many trees we have used as source $T_i$. As can be seen, our strategy identifies and removes shared topologies within very few iterations, and by far the most iterations are performed on trees that share no common topologies.

This process is particularly active in the beginning of the algorithm. In Figure 6(a) we have focused our attention on the first ten iterations, where within eight iterations the remaining trees share no topologies. As a shared topology (a subtree rooted at an inner node) will always contain at least one topology of size two, the number of taxa will be reduced by at least one in each iteration of our algorithm (given that the trees share a common topology). An example of this can be seen in Figure 6(b), where a typical part of Figure 6 is magnified. The algorithm removes one taxon in each iteration, quickly reaching the optimal number of removed taxa.
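How repeated cherry removal eats away an entire shared subtree can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: trees are binary nested tuples of leaf names, a `Counter` plays the role of the table of cherry counts, cherries are identified by their leaf names instead of by intervals, and the function names and example taxa are hypothetical.

```python
from collections import Counter


def cherries(tree):
    """Return the cherries (inner nodes with two leaf children) of a binary tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return cherries(left) | cherries(right)


def shared_cherries(trees):
    """Cherries seen in every tree, found with a table of occurrence counts."""
    seen = Counter()
    for tree in trees:
        seen.update(cherries(tree))
    return {cherry for cherry, count in seen.items() if count == len(trees)}


def collapse(tree, shared):
    """Replace each shared cherry in `tree` by one of its two leaves."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        if frozenset((left, right)) in shared:
            return min(left, right)  # same representative leaf in every tree
        return tree
    return (collapse(left, shared), collapse(right, shared))


def prune(trees):
    """Collapse shared cherries round by round until none remain."""
    rounds = 0
    while True:
        shared = shared_cherries(trees)
        if not shared:
            return trees, rounds
        trees = [collapse(tree, shared) for tree in trees]
        rounds += 1


# Both trees contain the subtree (C, (D, E)) but differ elsewhere.
example = [
    (("A", "B"), ("F", ("C", ("D", "E")))),
    (("A", "F"), ("B", ("C", ("D", "E")))),
]
pruned, rounds = prune(example)
```

In this example the shared three-leaf subtree is removed in two rounds: first the cherry (D, E) is replaced by D, then the newly exposed cherry (C, D) is replaced by C, after which the two trees share no cherries.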
Figure 6: Top: Number of removable taxa n as a function of iterations i. The number of taxa removed by our pruning strategy is shown in solid lines; the number of removable taxa is shown in grey. Below: Magnification of the first 10 iterations (a) and of a typical place on the graph (b), where we can see an entire topology being removed.
In a run of Day's algorithm on t trees, the number of DFS traversals performed is

$$\frac{1}{2} t(t + 1) - 1.$$

On the DNA data set from Figure 6 this is 500,499. Close inspection of the observed data reveals that 482,566 of the DFS traversals in our pruning strategy are performed on trees of optimal size, in the sense that no topology is shared among all the trees. This means that only 17,933 (≈ 3.58%) of our DFS traversals are performed on trees of suboptimal size. The same pattern is seen on the other tested data sets.

6 Conclusions and Future Work

Our studies show that using the pruning strategy pays off when the trees are realistic in the sense that they are generated by a phylogenetic inference program. It even outperforms randomized approaches, but our studies indicate that so does the now more than 20-year-old algorithm by Day. Our studies also indicate that the possible improvement obtained by going from removing small shared topologies of size two to removing entire topologies is negligible, if not nonexistent. If, however, we remove the entire set of shared topologies before running Day's algorithm, we might gain a small improvement, but this has not been the focus of this study.

In our pruning strategy, we remove topologies of size two, which correspond to small bipartitions. We have not investigated how much, if anything, could be gained by removing shared bipartitions with more than two taxa on both sides. We would not be able to reduce the number of taxa, but the number of internal nodes could be lowered, which might result in a better algorithm.

References

[1] William Day. Optimal algorithms for comparing trees with labeled leaves. Journal of Classification, 2(1):7-28, December 1985.

[2] Seung-Jin Sul and Tiffani L. Williams. A randomized algorithm for comparing sets of phylogenetic trees. In APBC, 2007.
More informationIndexing and Hashing
C H A P T E R 1 Indexing and Hashing Solutions to Practice Exercises 1.1 Reasons for not keeping several search indices include: a. Every index requires additional CPU time and disk I/O overhead during
More informationCOMP : Trees. COMP20012 Trees 219
COMP20012 3: Trees COMP20012 Trees 219 Trees Seen lots of examples. Parse Trees Decision Trees Search Trees Family Trees Hierarchical Structures Management Directories COMP20012 Trees 220 Trees have natural
More informationComputational Geometry
Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess
More informationIntroduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far
Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing