Scaling species tree estimation methods to large datasets using NJMerge
|
|
- Spencer Ball
- 5 years ago
- Views:
Transcription
1 Scaling species tree estimation methods to large datasets using NJMerge Erin Molloy and Tandy Warnow {emolloy2, University of Illinois at Urbana Champaign 2018 Phylogenomics Software Symposium August 17, 2018 Molloy & Warnow NJMerge 1/48
2 Large-scale Phylogeny Estimation is computationally intensive A New View of the Tree of Life [Hug et al., 2016] 3,083 genomes 2,596 AA positions from concatenating 16 protein alignments RAxML with 156 bootstrap replicates 3,840 CPU hours on the CIPRES supercomputer Molloy & Warnow NJMerge 2/48
3 NJMerge NJMerge can enable species tree methods (ASTRAL-III, SVDquartets, and RAxML) to analyze large and/or hetergeneous datasets using a single compute node with 16 cores 64 GB of physical memory 48 hours maximum wall-clock time without sacrificing accuracy. Molloy & Warnow NJMerge 3/48
4 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 4/48
5 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 5/48
6 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 6/48
7 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 7/48
8 Divide-and-Conquer Methods Subset trees can be estimated using a highly accurate (but computationally intensive) method and then combined using a supertree method. Molloy & Warnow NJMerge 8/48
9 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 9/48
10 Divide-and-Conquer Methods [Steel, 1992; Jiang et al., 2001; Bansal et al., 2010] Molloy & Warnow NJMerge 10/48
11 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 11/48
12 New Divide-and-Conquer Strategy [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 12/48
13 New Divide-and-Conquer Strategy Subset trees can be estimated using a highly accurate (but computationally intensive) method and then merged together using a fast (but potentially less accurate) method. Molloy & Warnow NJMerge 13/48
14 New Divide-and-Conquer Strategy [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 14/48
15 Merging disjoint trees A D + E H B C F G Merging G H Joining with single edge A D A H F C B C D E F G B E Molloy & Warnow NJMerge 15/48
16 Merging disjoint trees Given a set T of pairwise disjoint trees with combined leaf set S, we seek a compatibility supertree for T that is close to the true tree on S. NJMerge attempts to do this by using a dissimilarity matrix D. Molloy & Warnow NJMerge 16/48
17 NJMerge NJMerge is a polynomial-time extension of Neighbor-Joining [Saitou and Nei, 1987] takes a set of trees (on pairwise disjoint subsets of the leaf set) as input that operate as constraints on the output tree. Input: Output: Dissimilarity matrix D on species set S Set of k unrooted binary trees T = {T 1, T 2,..., T k } on pairwise disjoint subsets of the species set S Unrooted binary tree T on the species set S that agrees with every tree in T Molloy & Warnow NJMerge 17/48
18 NJMerge NOTE: Distance matrix is additive for (((A, B), (C, D)), E, (F, (G, H))); Molloy & Warnow NJMerge 18/48
19 NJMerge NOTE: Distance matrix is additive for (((A, B), (C, D)), E, (F, (G, H))); Molloy & Warnow NJMerge 19/48
20 NJMerge NJ is an agglomerative technique that joins pairs of nodes at each iteration and reduces the number of remaining nodes by one. NJMerge differs from NJ in that it only accepts a siblinghood proposal (x, y) if and only if making x and y siblings does not violate the pairwise compatibility of the set T of constraint trees. Molloy & Warnow NJMerge 20/48
21 NJMerge Determining the compatibility of a set of unrooted phylogenetic trees is NP-complete [Steel, 1992; Warnow, 1994]. Thus, NJMerge uses a polynomial-time heuristic: Update constraint trees based on siblinghood proposal (x, y) Test whether the set of constraint trees that contain leaf (x, y) are compatible in polynomial time by rooting these trees at leaf (x, y) [Aho et al., 1981] Not every constraint tree will contain the leaf (x, y), so this does not a guarantee that the set of k constraint trees is compatible. It is possible for NJMerge to fail. Molloy & Warnow NJMerge 21/48
22 NJMerge Determining the compatibility of a set of unrooted phylogenetic trees is NP-complete [Steel, 1992; Warnow, 1994]. Thus, NJMerge uses a polynomial-time heuristic: Update constraint trees based on siblinghood proposal (x, y) Test whether the set of constraint trees that contain leaf (x, y) are compatible in polynomial time by rooting these trees at leaf (x, y) [Aho et al., 1981] Not every constraint tree will contain the leaf (x, y), so this does not a guarantee that the set of k constraint trees is compatible. It is possible for NJMerge to fail. Molloy & Warnow NJMerge 22/48
23 NJMerge Advantages No supertree estimation Polynomial Time Parallel, e.g., each siblinghood proposal can be evaluated in parallel Disadvantages Can fail to return a tree Molloy & Warnow NJMerge 23/48
24 Species Tree Estimation Molloy & Warnow NJMerge 24/48
25 Using NJMerge in species tree estimation pipeline Molloy & Warnow NJMerge 25/48
26 Using NJMerge in a pipeline for species tree estimation How should the distance matrix be computed? How large should the maximum subset size be? How should subsets be created? Which species tree method should be used on subsets? Molloy & Warnow NJMerge 26/48
27 Simulated Datasets Dataset Size 100 species; 25, 100, 1000 genes 1000 species; 1000 genes Gene (alignment) length varied from 300 to 1500 bp Level of Incomplete Lineage Sorting (ILS) Moderate: 8-10% AD Very high: 68-69% AD where AD is the Average (RF) Distance between the true species tree and the tree gene trees. Sequence Type Exons: lower phylogenetic signal (not shown) Introns: greater phylogenetic signal Molloy & Warnow NJMerge 27/48
28 How does error in the estimated distance matrix impact NJMerge? 100 taxa, Moderate ILS 100 taxa, Very High ILS 0.30 Species Tree Error Number of Genes NJst NJMerge+True NOTE: NJMerge failed to return a tree on 1 dataset with 100 taxa, 25 genes, and very high ILS; average is on datasets for which both methods completed. Molloy & Warnow NJMerge 28/48
29 Evaluation Subset trees are estimated using ASTRAL-III, SVDquartets, and unpartitioned concatenation using RAxML. Metrics Robinson-Foulds (RF) distance between the true and the estimated species trees Running time All methods limited to 48 hours wall-clock time one compute node with 16 cores 64 GB of physical memory Molloy & Warnow NJMerge 29/48
30 Running time for pipeline using NJMerge T = ( T1 Method + + Tk Method ) + T NJMerge where k = number of subsets T Method i = time to compute tree on subset i for given method T NJMerge = time to run NJMerge Molloy & Warnow NJMerge 30/48
31 ASTRAL [Mirarab et al., 2014; Mirarab and Warnow, 2015; Zhang et al., 2018] Gene tree summary method Finds optimal solution to search problem within a constrained search space Computationally intensive when Constrained search space is large, e.g., large datasets with high levels of gene tree heterogeneity due to ILS or gene tree estimation error Molloy & Warnow NJMerge 31/48
32 ASTRAL-III and NJMerge Species Tree Error species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High ASTRAL Level of ILS NJMerge+ASTRAL NOTE: ASTRAL completed on 16/20 datasets with 1000 taxa and very high ILS; NJMerge+ASTRAL completed on 20/20 of these datasets; box plots show all datasets on which at least one method completed. Molloy & Warnow NJMerge 32/48
33 ASTRAL-III and NJMerge 100 species, 1000 introns 1000 species, 1000 introns Running Time (m) Moderate Very High Moderate Very High ASTRAL Level of ILS NJMerge+ASTRAL NOTE: ASTRAL completed on 16/20 datasets with 1000 taxa and very high ILS; NJMerge+ASTRAL completed on 20/20 of these datasets; bar graphs show all datasets on which at least one method completed. Molloy & Warnow NJMerge 33/48
34 SVDquartets [Chifman and Kubatko, 2014] Site-based coalescent method Use the concatenated alignment to estimate quartet trees and then combine the quartet trees Computationally intensive when Number of species is large Molloy & Warnow NJMerge 34/48
35 SVDquartets and NJMerge Species Tree Error species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High SVDquartets Level of ILS NJMerge+SVDquartets NOTE: SVDquartets ran on 0/40 datasets with 1000 taxa due to segmentation faults; NJMerge+SVDquartets completed on 40/40 of these datasets; box plots show all datasets on which at least one method completed. Molloy & Warnow NJMerge 35/48
36 SVDquartets and NJMerge Running Time (m) species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High SVDquartets Level of ILS NJMerge+SVDquartets NOTE: SVDquartets ran on 0/40 datasets with 1000 taxa due to segmentation faults; NJMerge+SVDquartets completed on 40/40 of these datasets; bar graphs show all datasets on which at least one method completed. Molloy & Warnow NJMerge 36/48
37 Unpartitioned concatenation using RAxML [Stamatakis, 2014] ML tree estimation on unpartitioned concatenated alignment Not statistically consistent under MSC model Computationally intensive when Number of species is large Number of unique site patterns in alignment is large, i.e., cannot effectively compress the alignment Molloy & Warnow NJMerge 37/48
38 RAxML and NJMerge Species Tree Error species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High RAxML Level of ILS NJMerge+RAxML NOTE: RAxML completed on 1/40 datasets with 1000 taxa due to Out of Memory errors; NJMerge+RAxML completed on 40/40 of these datasets; box plots show all datasets on which at least one method completed. Molloy & Warnow NJMerge 38/48
39 RAxML and NJMerge 100 species, 1000 introns 1000 species, 1000 introns Running Time (m) Moderate Very High Moderate Very High RAxML Level of ILS NJMerge+RAxML NOTE: RAxML completed on 1/40 datasets with 1000 taxa due to Out of Memory errors; NJMerge+RAxML completed on 40/40 of these datasets; bar graphs show all datasets on which at least one method completed. Molloy & Warnow NJMerge 39/48
40 Summary NJMerge is a generic technique for scaling phylogeny estimation methods to large and/or hetergeneous datasets. In our experiments, NJMerge improved running time without sacrificing accuracy enabled all three methods to run on large and/or heterogeneous when constrained by memory (64GB) and running time (48 hours) had a low failure rate (< 1%) Molloy & Warnow NJMerge 40/48
41 Which species tree method do I use on datasets with 1000 taxa and 1000 genes? Species Tree Error Moderate ILS Very High ILS Method NJMerge+ASTRAL NJMerge+RAxML NJMerge+SVDquartets NJst It depends on the dataset! Molloy & Warnow NJMerge 41/48
42 Which species tree method do I use on datasets with 1000 taxa and 1000 genes? Species Tree Error Moderate ILS Very High ILS Method NJMerge+ASTRAL NJMerge+RAxML NJMerge+SVDquartets NJst It depends on the dataset! Molloy & Warnow NJMerge 42/48
43 Conclusions NJMerge enables species tree methods (ASTRAL-III, SVDquartets, and RAxML) to analyze large and/or hetergeneous datasets using a single compute node with 16 cores 64 GB of physical memory 48 hours maximum wall-clock time without sacrificing accuracy. How to best run NJMerge depends on the dataset! Several algorithmic choices are left to the discretion of the user. Molloy & Warnow NJMerge 43/48
44 Funding This work was supported by the National Science Foundation Graduate Research Fellowship Program (DGE ) Blue Waters Sustained-Petascale Computing Project (OCI and ACI ) Graph-Theoretic Algorithms to Improve Phylogenomic Analyses (CCF ) and the Ira & Debra Cohen Graduate Fellowship in Computer Science. Molloy & Warnow NJMerge 44/48
45 Use NJMerge! Paper to appear in proceedings of RECOMB-CG Available on Github: github.com/ekmolloy/njmerge Molloy & Warnow NJMerge 45/48
46 References Hug et al. (2016). A new view of the tree of life. Nature Microbiology 1: Huson et al. (1999). Disk-covering, a fast-converging method for phylogenetic tree reconstruction. Journal of Computational Biology, 6(3-4): Nelesen et al. (2012). DACTAL: Divide-And-Conquer Trees (almost) without Alignments. Bioinformatics 28(12):i274 i282. Steel. (1992). The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification 9: Jiang et al. (2001). A polynomial time approximation scheme for inferring evolutationay trees from quartet topologies and its application. SIAM Journal on Computing 30(6): Molloy & Warnow NJMerge 46/48
47 References Bansal et al. (2010). Robinson-Foulds Supertrees. Algorithms for Molecular Biology 5(1). Saitou and Nei. (1987). The neighbor-joining method: a new method for reconstruction of phylogenetic trees. Molecular Biology and Evolution 4: Warnow (1994). Tree compatibility and inferring evolutionary history. Journal of Algorithms, 16(3): Agho et al. (1981). Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions. SIAM Journal on Computing, 10(3): Mirarab et al. (2014) ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics, 30(17):i541 i548. Molloy & Warnow NJMerge 47/48
48 References Mirarab and Warnow. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12):i44 i52. Zhang et al. (2017). ASTRAL-III: Increased Scalability and Impacts of Contract- ing Low Support Branches. In Meidanis and Nakhleh, editors, Comparative Genomics. RECOMB- CG Lecture Notes in Computer Science, volume Springer, Cham. Chifman and Kubatko. (2014). Quartet Inference from SNP Data Under the Coalescent Model. Bioinformatics, 30(23): Stamatakis. (2014). RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics, 30(9). Molloy & Warnow NJMerge 48/48
Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters
Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Erin Molloy University of Illinois at Urbana Champaign General Allocation (PI: Tandy Warnow) Exploratory Allocation
More informationDynamic Programming for Phylogenetic Estimation
1 / 45 Dynamic Programming for Phylogenetic Estimation CS598AGB Pranjal Vachaspati University of Illinois at Urbana-Champaign 2 / 45 Coalescent-based Species Tree Estimation Find evolutionary tree for
More informationSupplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference
Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference Sourya Bhattacharyya and Jayanta Mukherjee Department of Computer Science
More informationApplied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees
Applied Mathematics Letters 24 (2011) 719 723 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Graph triangulations and the compatibility
More informationIntroduction to Triangulated Graphs. Tandy Warnow
Introduction to Triangulated Graphs Tandy Warnow Topics for today Triangulated graphs: theorems and algorithms (Chapters 11.3 and 11.9) Examples of triangulated graphs in phylogeny estimation (Chapters
More informationEfficient Quartet Representations of Trees and Applications to Supertree and Summary Methods
1 Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods Ruth Davidson, MaLyn Lawhorn, Joseph Rusinko*, and Noah Weber arxiv:1512.05302v3 [q-bio.pe] 6 Dec 2016 Abstract
More informationDIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1
DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens Katherine St. John City University of New York 1 Thanks to the DIMACS Staff Linda Casals Walter Morris Nicole Clark Katherine St. John
More informationDistance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)
Distance based tree reconstruction Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) All organisms have evolved from a common ancestor. Infer the evolutionary tree (tree topology and edge lengths)
More informationCS 581. Tandy Warnow
CS 581 Tandy Warnow This week Maximum parsimony: solving it on small datasets Maximum Likelihood optimization problem Felsenstein s pruning algorithm Bayesian MCMC methods Research opportunities Maximum
More informationOlivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie.
Arbres formels et Arbre de la Vie Olivier Gascuel Centre National de la Recherche Scientifique LIRMM, Montpellier, France www.lirmm.fr/gascuel 10 permanent researchers 2 technical staff 3 postdocs, 10
More informationA Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem
A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem Gang Wu Jia-Huai You Guohui Lin January 17, 2005 Abstract A lookahead branch-and-bound algorithm is proposed for solving
More informationWhat is a phylogenetic tree? Algorithms for Computational Biology. Phylogenetics Summary. Di erent types of phylogenetic trees
What is a phylogenetic tree? Algorithms for Computational Biology Zsuzsanna Lipták speciation events Masters in Molecular and Medical Biotechnology a.a. 25/6, fall term Phylogenetics Summary wolf cat lion
More informationEvolutionary tree reconstruction (Chapter 10)
Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early
More informationIntroduction to Trees
Introduction to Trees Tandy Warnow December 28, 2016 Introduction to Trees Tandy Warnow Clades of a rooted tree Every node v in a leaf-labelled rooted tree defines a subset of the leafset that is below
More informationComparison of commonly used methods for combining multiple phylogenetic data sets
Comparison of commonly used methods for combining multiple phylogenetic data sets Anne Kupczok, Heiko A. Schmidt and Arndt von Haeseler Center for Integrative Bioinformatics Vienna Max F. Perutz Laboratories
More informationSequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin
Sequence length requirements Tandy Warnow Department of Computer Science The University of Texas at Austin Part 1: Absolute Fast Convergence DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2
More informationQuartet Inference from SNP Data Under the Coalescent Model
Quartet Inference from SNP Data Under the Coalescent Model Julia Chifman and Laura Kubatko By Shashank Yaduvanshi EsDmaDng Species Tree from Gene Sequences Input: Alignments from muldple genes Output:
More informationIntroduction to Computational Phylogenetics
Introduction to Computational Phylogenetics Tandy Warnow The University of Texas at Austin No Institute Given This textbook is a draft, and should not be distributed. Much of what is in this textbook appeared
More informationMolecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony
Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Basic Bioinformatics Workshop, ILRI Addis Ababa, 12 December 2017 Learning Objectives understand
More informationPhylogenetics on CUDA (Parallel) Architectures Bradly Alicea
Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea
More informationSPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS
1 SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS C. THAN and L. NAKHLEH Department of Computer Science Rice University 6100 Main Street, MS 132 Houston, TX 77005, USA Email: {cvthan,nakhleh}@cs.rice.edu
More informationAlgorithms for Bioinformatics
Adapted from slides by Leena Salmena and Veli Mäkinen, which are partly from http: //bix.ucsd.edu/bioalgorithms/slides.php. 582670 Algorithms for Bioinformatics Lecture 6: Distance based clustering and
More informationParallel Implementation of a Quartet-Based Algorithm for Phylogenetic Analysis
Parallel Implementation of a Quartet-Based Algorithm for Phylogenetic Analysis B. B. Zhou 1, D. Chu 1, M. Tarawneh 1, P. Wang 1, C. Wang 1, A. Y. Zomaya 1, and R. P. Brent 2 1 School of Information Technologies
More informationINFERRING OPTIMAL SPECIES TREES UNDER GENE DUPLICATION AND LOSS
INFERRING OPTIMAL SPECIES TREES UNDER GENE DUPLICATION AND LOSS M. S. BAYZID, S. MIRARAB and T. WARNOW Department of Computer Science, The University of Texas at Austin, Austin, Texas 78712, USA E-mail:
More informationA practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees
A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees Andreas Sand 1,2, Gerth Stølting Brodal 2,3, Rolf Fagerberg 4, Christian N. S. Pedersen 1,2 and Thomas Mailund
More informationABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3
The XIII International Conference Applied Stochastic Models and Data Analysis (ASMDA-2009) June 30-July 3, 2009, Vilnius, LITHUANIA ISBN 978-9955-28-463-5 L. Sakalauskas, C. Skiadas and E. K. Zavadskas
More informationDistance-based Phylogenetic Methods Near a Polytomy
Distance-based Phylogenetic Methods Near a Polytomy Ruth Davidson and Seth Sullivant NCSU UIUC May 21, 2014 2 Phylogenetic trees model the common evolutionary history of a group of species Leaves = extant
More informationRecent Research Results. Evolutionary Trees Distance Methods
Recent Research Results Evolutionary Trees Distance Methods Indo-European Languages After Tandy Warnow What is the purpose? Understand evolutionary history (relationship between species). Uderstand how
More informationParallelizing SuperFine
Parallelizing SuperFine Diogo Telmo Neves ESTGF - IPP and Universidade do Minho Portugal dtn@ices.utexas.edu Tandy Warnow Dept. of Computer Science The Univ. of Texas at Austin Austin, TX 78712 tandy@cs.utexas.edu
More informationAlignment of Trees and Directed Acyclic Graphs
Alignment of Trees and Directed Acyclic Graphs Gabriel Valiente Algorithms, Bioinformatics, Complexity and Formal Methods Research Group Technical University of Catalonia Computational Biology and Bioinformatics
More informationThe Performance of Phylogenetic Methods on Trees of Bounded Diameter
The Performance of Phylogenetic Methods on Trees of Bounded Diameter Luay Nakhleh 1, Usman Roshan 1, Katherine St. John 1 2, Jerry Sun 1, and Tandy Warnow 1 3 1 Department of Computer Sciences, University
More informationof the Balanced Minimum Evolution Polytope Ruriko Yoshida
Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida Figure 19.1 Genomes 3 ( Garland Science 2007) Origins of Species Tree (or web) of life eukarya
More informationPhylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst
Phylogenetics Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Phylogenetics phylum = tree phylogenetics: reconstruction of evolutionary
More informationLecture 20: Clustering and Evolution
Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/11/2014 Comp 555 Bioalgorithms (Fall 2014) 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other
More informationSequence clustering. Introduction. Clustering basics. Hierarchical clustering
Sequence clustering Introduction Data clustering is one of the key tools used in various incarnations of data-mining - trying to make sense of large datasets. It is, thus, natural to ask whether clustering
More informationLecture 20: Clustering and Evolution
Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/12/2013 Comp 465 Fall 2013 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other vertex A clique
More informationSEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD
1 SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD I A KANJ School of Computer Science, Telecommunications, and Information Systems, DePaul University, Chicago, IL 60604-2301, USA E-mail: ikanj@csdepauledu
More informationRapid Neighbour-Joining
Rapid Neighbour-Joining Martin Simonsen, Thomas Mailund and Christian N. S. Pedersen Bioinformatics Research Center (BIRC), University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Århus C, Denmark.
More informationAnswer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency?
Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Fathiyeh Faghih and Daniel G. Brown David R. Cheriton School of Computer Science, University of
More informationCodon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet)
Phylogeny Codon models Last lecture: poor man s way of calculating dn/ds (Ka/Ks) Tabulate synonymous/non- synonymous substitutions Normalize by the possibilities Transform to genetic distance K JC or K
More informationCLUSTERING IN BIOINFORMATICS
CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of
More informationSeeing the wood for the trees: Analysing multiple alternative phylogenies
Seeing the wood for the trees: Analysing multiple alternative phylogenies Tom M. W. Nye, Newcastle University tom.nye@ncl.ac.uk Isaac Newton Institute, 17 December 2007 Multiple alternative phylogenies
More informationStudy of a Simple Pruning Strategy with Days Algorithm
Study of a Simple Pruning Strategy with ays Algorithm Thomas G. Kristensen Abstract We wish to calculate all pairwise Robinson Foulds distances in a set of trees. Traditional algorithms for doing this
More informationBasics of Multiple Sequence Alignment
Basics of Multiple Sequence Alignment Tandy Warnow February 10, 2018 Basics of Multiple Sequence Alignment Tandy Warnow Basic issues What is a multiple sequence alignment? Evolutionary processes operating
More information4/4/16 Comp 555 Spring
4/4/16 Comp 555 Spring 2016 1 A clique is a graph where every vertex is connected via an edge to every other vertex A clique graph is a graph where each connected component is a clique The concept of clustering
More informationGenetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)
Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences joe@gs Phylogeny methods, part 1 (Parsimony and such) Methods of reconstructing phylogenies (evolutionary trees) Parsimony
More informationFastJoin, an improved neighbor-joining algorithm
Methodology FastJoin, an improved neighbor-joining algorithm J. Wang, M.-Z. Guo and L.L. Xing School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, P.R. China
More informationCSE 549: Computational Biology
CSE 549: Computational Biology Phylogenomics 1 slides marked with * by Carl Kingsford Tree of Life 2 * H5N1 Influenza Strains Salzberg, Kingsford, et al., 2007 3 * H5N1 Influenza Strains The 2007 outbreak
More information11/17/2009 Comp 590/Comp Fall
Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 Problem Set #5 will be available tonight 11/17/2009 Comp 590/Comp 790-90 Fall 2009 1 Clique Graphs A clique is a graph with every vertex connected
More informationReconstruction of Large Phylogenetic Trees: A Parallel Approach
Reconstruction of Large Phylogenetic Trees: A Parallel Approach Zhihua Du and Feng Lin BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798 Usman W. Roshan
More informationSupplementary Online Material PASTA: ultra-large multiple sequence alignment
Supplementary Online Material PASTA: ultra-large multiple sequence alignment Siavash Mirarab, Nam Nguyen, and Tandy Warnow University of Texas at Austin - Department of Computer Science {smirarab,bayzid,tandy}@cs.utexas.edu
More informationReconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008
Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki Presented by Connor Magill November 20, 2008 Introduction Problem: Relationships between species cannot always be
More informationML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015
ML phylogenetic inference and GARLI Derrick Zwickl University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 Outline Heuristics and tree searches ML phylogeny inference and
More informationReconstructing Reticulate Evolution in Species Theory and Practice
Reconstructing Reticulate Evolution in Species Theory and Practice Luay Nakhleh Department of Computer Science Rice University Houston, Texas 77005 nakhleh@cs.rice.edu Tandy Warnow Department of Computer
More informationThroughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.
Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets
More informationLecture: Bioinformatics
Lecture: Bioinformatics ENS Sacley, 2018 Some slides graciously provided by Daniel Huson & Celine Scornavacca Phylogenetic Trees - Motivation 2 / 31 2 / 31 Phylogenetic Trees - Motivation Motivation -
More informationRapid Neighbour-Joining
Rapid Neighbour-Joining Martin Simonsen, Thomas Mailund, and Christian N.S. Pedersen Bioinformatics Research Center (BIRC), University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Århus C, Denmark
More informationEVOLUTIONARY DISTANCES INFERRING PHYLOGENIES
EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 28 th November 2007 OUTLINE 1 INFERRING
More informationLeast Common Ancestor Based Method for Efficiently Constructing Rooted Supertrees
Least ommon ncestor ased Method for fficiently onstructing Rooted Supertrees M.. Hai Zahid, nkush Mittal, R.. Joshi epartment of lectronics and omputer ngineering, IIT-Roorkee Roorkee, Uttaranchal, INI
More informationOn the Optimality of the Neighbor Joining Algorithm
On the Optimality of the Neighbor Joining Algorithm Ruriko Yoshida Dept. of Statistics University of Kentucky Joint work with K. Eickmeyer, P. Huggins, and L. Pachter www.ms.uky.edu/ ruriko Louisville
More informationPASTA: Ultra-Large Multiple Sequence Alignment
PASTA: Ultra-Large Multiple Sequence Alignment Siavash Mirarab, Nam Nguyen, and Tandy Warnow University of Texas at Austin - Department of Computer Science 2317 Speedway, Stop D9500 Austin, TX 78712 {smirarab,namphuon,tandy}@cs.utexas.edu
More informationChordal Graphs and Evolutionary Trees. Tandy Warnow
Chordal Graphs and Evolutionary Trees Tandy Warnow Possible Indo-European tree (Ringe, Warnow and Taylor 2000) Anatolian Vedic Iranian Greek Italic Celtic Tocharian Armenian Germanic Baltic Slavic Albanian
More informationPRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction
PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction Cristian Coarfa Yuri Dotsenko John Mellor-Crummey Luay Nakhleh Usman Roshan Abstract Phylogenetic trees play
More informationA New Approach For Tree Alignment Based on Local Re-Optimization
A New Approach For Tree Alignment Based on Local Re-Optimization Feng Yue and Jijun Tang Department of Computer Science and Engineering University of South Carolina Columbia, SC 29063, USA yuef, jtang
More informationMultiPhyl: a high-throughput phylogenomics webserver using distributed computing
Nucleic Acids Research, 2007, Vol. 35, Web Server issue W33 W37 doi:10.1093/nar/gkm359 MultiPhyl: a high-throughput phylogenomics webserver using distributed computing Thomas M. Keane 1,2, *, Thomas J.
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationINFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH
INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH Abstract. One approach for inferring a species tree from a given multi-locus data
More informationClustering of Proteins
Melroy Saldanha saldanha@stanford.edu CS 273 Project Report Clustering of Proteins Introduction Numerous genome-sequencing projects have led to a huge growth in the size of protein databases. Manual annotation
More informationMEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 forbiggerdatasets
MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 forbiggerdatasets Sudhir Kumar, 1,2,3 Glen Stecher 1 and Koichiro Tamura*,4,5 1 Institute for Genomics and Evolutionary Medicine, Temple University
More informationThe worst case complexity of Maximum Parsimony
he worst case complexity of Maximum Parsimony mir armel Noa Musa-Lempel Dekel sur Michal Ziv-Ukelson Ben-urion University June 2, 20 / 2 What s a phylogeny Phylogenies: raph-like structures whose topology
More informationParsimony-Based Approaches to Inferring Phylogenetic Trees
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 www.biostat.wisc.edu/bmi576.html Mark Craven craven@biostat.wisc.edu Fall 0 Phylogenetic tree approaches! three general types! distance:
More informationHeterotachy models in BayesPhylogenies
Heterotachy models in is a general software package for inferring phylogenetic trees using Bayesian Markov Chain Monte Carlo (MCMC) methods. The program allows a range of models of gene sequence evolution,
More informationHybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes
Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes Wayne Pfeiffer (SDSC/UCSD) & Alexandros Stamatakis (TUM) February 25, 2010 What was done? Why is it important? Who cares? Hybrid MPI/OpenMP
More informationCLC Phylogeny Module User manual
CLC Phylogeny Module User manual User manual for Phylogeny Module 1.0 Windows, Mac OS X and Linux September 13, 2013 This software is for research purposes only. CLC bio Silkeborgvej 2 Prismet DK-8000
More informationAlgorithms for MDC-Based Multi-locus Phylogeny Inference
Algorithms for MDC-Based Multi-locus Phylogeny Inference Yun Yu 1, Tandy Warnow 2, and Luay Nakhleh 1 1 Dept. of Computer Science, Rice University, 61 Main Street, Houston, TX 775, USA {yy9,nakhleh}@cs.rice.edu
More informationFast and accurate branch lengths estimation for phylogenomic trees
Binet et al. BMC Bioinformatics (2016) 17:23 DOI 10.1186/s12859-015-0821-8 RESEARCH ARTICLE Open Access Fast and accurate branch lengths estimation for phylogenomic trees Manuel Binet 1,2,3, Olivier Gascuel
More informationComputing the All-Pairs Quartet Distance on a set of Evolutionary Trees
Journal of Bioinformatics and Computational Biology c Imperial College Press Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees M. Stissing, T. Mailund, C. N. S. Pedersen and G. S.
More informationA RANDOMIZED ALGORITHM FOR COMPARING SETS OF PHYLOGENETIC TREES
A RANDOMIZED ALGORITHM FOR COMPARING SETS OF PHYLOGENETIC TREES SEUNG-JIN SUL AND TIFFANI L. WILLIAMS Department of Computer Science Texas A&M University College Station, TX 77843-3112 USA E-mail: {sulsj,tlw}@cs.tamu.edu
More informationA New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees
A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees Kedar Dhamdhere, Srinath Sridhar, Guy E. Blelloch, Eran Halperin R. Ravi and Russell Schwartz March 17, 2005 CMU-CS-05-119
More informationAUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES
CHAPTER 1 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES David Dao 1, Tomáš Flouri 2, Alexandros Stamatakis 1,2 1 KarlsruheInstituteofTechnology,InstituteforTheoreticalInformatics,Postfach 6980,
More informationFast Structural Search in Phylogenetic Databases
ORIGINAL RESEARCH Fast Structural Search in Phylogenetic Databases Jason T. L. Wang 1, Huiyuan Shan 2, Dennis Shasha 3 and William H. Piel 4 1 Department of Computer Science, New Jersey Institute of Technology,
More informationAn Experimental Analysis of Robinson-Foulds Distance Matrix Algorithms
An Experimental Analysis of Robinson-Foulds Distance Matrix Algorithms Seung-Jin Sul and Tiffani L. Williams Department of Computer Science Texas A&M University College Station, TX 77843-3 {sulsj,tlw}@cs.tamu.edu
More informationLecture 5: Multiple sequence alignment
Lecture 5: Multiple sequence alignment Introduction to Computational Biology Teresa Przytycka, PhD (with some additions by Martin Vingron) Why do we need multiple sequence alignment Pairwise sequence alignment
More information1 High-Performance Phylogeny Reconstruction Under Maximum Parsimony. David A. Bader, Bernard M.E. Moret, Tiffani L. Williams and Mi Yan
Contents 1 High-Performance Phylogeny Reconstruction Under Maximum Parsimony 1 David A. Bader, Bernard M.E. Moret, Tiffani L. Williams and Mi Yan 1.1 Introduction 1 1.2 Maximum Parsimony 7 1.3 Exact MP:
More informationA Randomized Algorithm for Comparing Sets of Phylogenetic Trees
A Randomized Algorithm for Comparing Sets of Phylogenetic Trees Seung-Jin Sul and Tiffani L. Williams Department of Computer Science Texas A&M University E-mail: {sulsj,tlw}@cs.tamu.edu Technical Report
More informationA Fast Algorithm for Optimal Alignment between Similar Ordered Trees
Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221
More informationReconsidering the Performance of Cooperative Rec-I-DCM3
Reconsidering the Performance of Cooperative Rec-I-DCM3 Tiffani L. Williams Department of Computer Science Texas A&M University College Station, TX 77843 Email: tlw@cs.tamu.edu Marc L. Smith Computer Science
More informationhuman chimp mouse rat
Michael rudno These notes are based on earlier notes by Tomas abak Phylogenetic Trees Phylogenetic Trees demonstrate the amoun of evolution, and order of divergence for several genomes. Phylogenetic trees
More informationTracy Heath Workshop on Molecular Evolution, Woods Hole USA
INTRODUCTION TO BAYESIAN PHYLOGENETIC SOFTWARE Tracy Heath Integrative Biology, University of California, Berkeley Ecology & Evolutionary Biology, University of Kansas 2013 Workshop on Molecular Evolution,
More informationA Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony
A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony Jean-Michel Richer 1 and Adrien Goëffon 2 and Jin-Kao Hao 1 1 University of Angers, LERIA, 2 Bd Lavoisier, 49045 Anger Cedex 01,
More informationPhylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau
Phylogenetic Trees Lecture 12 Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau. Maximum Parsimony. Last week we presented Fitch algorithm for (unweighted) Maximum Parsimony:
More informationThe CIPRES Science Gateway: Enabling High-Impact Science for Phylogenetics Researchers with Limited Resources
The CIPRES Science Gateway: Enabling High-Impact Science for Phylogenetics Researchers with Limited Resources Mark Miller, Wayne Pfeiffer, and Terri Schwartz San Diego Supercomputer Center Phylogenetics
More informationLab 07: Maximum Likelihood Model Selection and RAxML Using CIPRES
Integrative Biology 200, Spring 2014 Principles of Phylogenetics: Systematics University of California, Berkeley Updated by Traci L. Grzymala Lab 07: Maximum Likelihood Model Selection and RAxML Using
More informationGeneration of distancebased phylogenetic trees
primer for practical phylogenetic data gathering. Uconn EEB3899-007. Spring 2015 Session 12 Generation of distancebased phylogenetic trees Rafael Medina (rafael.medina.bry@gmail.com) Yang Liu (yang.liu@uconn.edu)
More informationFixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction
Fixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction Guy E. Blelloch, Kedar Dhamdhere, Eran Halperin, R. Ravi, Russell Schwartz and Srinath Sridhar Abstract. We consider
More informationAlgorithms for constructing more accurate and inclusive phylogenetic trees
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2013 Algorithms for constructing more accurate and inclusive phylogenetic trees Ruchi Chaudhary Iowa State University
More informationParsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course)
Tree Searching We ve discussed how we rank trees Parsimony Least squares Minimum evolution alanced minimum evolution Maximum likelihood (later in the course) So we have ways of deciding what a good tree
More informationEvolution of Tandemly Repeated Sequences
University of Canterbury Department of Mathematics and Statistics Evolution of Tandemly Repeated Sequences A thesis submitted in partial fulfilment of the requirements of the Degree for Master of Science
More informationA high-throughput bioinformatics distributed computing platform
A high-throughput bioinformatics distributed computing platform Thomas M. Keane 1, Andrew J. Page 2, James O. McInerney 1, and Thomas J. Naughton 2 1 Bioinformatics and Pharmacogenomics Laboratory, National
More informationImprovement of Distance-Based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets
Improvement of Distance-Based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets Vincent Ranwez and Olivier Gascuel Département Informatique Fondamentale et Applications, LIRMM,
More information