Scaling species tree estimation methods to large datasets using NJMerge

Size: px
Start display at page:

Download "Scaling species tree estimation methods to large datasets using NJMerge"

Transcription

1 Scaling species tree estimation methods to large datasets using NJMerge Erin Molloy and Tandy Warnow {emolloy2, University of Illinois at Urbana Champaign 2018 Phylogenomics Software Symposium August 17, 2018 Molloy & Warnow NJMerge 1/48

2 Large-scale Phylogeny Estimation is computationally intensive A New View of the Tree of Life [Hug et al., 2016] 3,083 genomes 2,596 AA positions from concatenating 16 protein alignments RAxML with 156 bootstrap replicates 3,840 CPU hours on the CIPRES supercomputer Molloy & Warnow NJMerge 2/48

3 NJMerge NJMerge can enable species tree methods (ASTRAL-III, SVDquartets, and RAxML) to analyze large and/or hetergeneous datasets using a single compute node with 16 cores 64 GB of physical memory 48 hours maximum wall-clock time without sacrificing accuracy. Molloy & Warnow NJMerge 3/48

4 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 4/48

5 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 5/48

6 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 6/48

7 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 7/48

8 Divide-and-Conquer Methods Subset trees can be estimated using a highly accurate (but computationally intensive) method and then combined using a supertree method. Molloy & Warnow NJMerge 8/48

9 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 9/48

10 Divide-and-Conquer Methods [Steel, 1992; Jiang et al., 2001; Bansal et al., 2010] Molloy & Warnow NJMerge 10/48

11 Divide-and-Conquer Methods [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 11/48

12 New Divide-and-Conquer Strategy [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 12/48

13 New Divide-and-Conquer Strategy Subset trees can be estimated using a highly accurate (but computationally intensive) method and then merged together using a fast (but potentially less accurate) method. Molloy & Warnow NJMerge 13/48

14 New Divide-and-Conquer Strategy [e.g., Huson et al., 1999; Nelesen et al., 2012] Molloy & Warnow NJMerge 14/48

15 Merging disjoint trees A D + E H B C F G Merging G H Joining with single edge A D A H F C B C D E F G B E Molloy & Warnow NJMerge 15/48

16 Merging disjoint trees Given a set T of pairwise disjoint trees with combined leaf set S, we seek a compatibility supertree for T that is close to the true tree on S. NJMerge attempts to do this by using a dissimilarity matrix D. Molloy & Warnow NJMerge 16/48

17 NJMerge NJMerge is a polynomial-time extension of Neighbor-Joining [Saitou and Nei, 1987] takes a set of trees (on pairwise disjoint subsets of the leaf set) as input that operate as constraints on the output tree. Input: Output: Dissimilarity matrix D on species set S Set of k unrooted binary trees T = {T 1, T 2,..., T k } on pairwise disjoint subsets of the species set S Unrooted binary tree T on the species set S that agrees with every tree in T Molloy & Warnow NJMerge 17/48

18 NJMerge NOTE: Distance matrix is additive for (((A, B), (C, D)), E, (F, (G, H))); Molloy & Warnow NJMerge 18/48

19 NJMerge NOTE: Distance matrix is additive for (((A, B), (C, D)), E, (F, (G, H))); Molloy & Warnow NJMerge 19/48

20 NJMerge NJ is an agglomerative technique that joins pairs of nodes at each iteration and reduces the number of remaining nodes by one. NJMerge differs from NJ in that it only accepts a siblinghood proposal (x, y) if and only if making x and y siblings does not violate the pairwise compatibility of the set T of constraint trees. Molloy & Warnow NJMerge 20/48

21 NJMerge Determining the compatibility of a set of unrooted phylogenetic trees is NP-complete [Steel, 1992; Warnow, 1994]. Thus, NJMerge uses a polynomial-time heuristic: Update constraint trees based on siblinghood proposal (x, y) Test whether the set of constraint trees that contain leaf (x, y) are compatible in polynomial time by rooting these trees at leaf (x, y) [Aho et al., 1981] Not every constraint tree will contain the leaf (x, y), so this does not a guarantee that the set of k constraint trees is compatible. It is possible for NJMerge to fail. Molloy & Warnow NJMerge 21/48

22 NJMerge Determining the compatibility of a set of unrooted phylogenetic trees is NP-complete [Steel, 1992; Warnow, 1994]. Thus, NJMerge uses a polynomial-time heuristic: Update constraint trees based on siblinghood proposal (x, y) Test whether the set of constraint trees that contain leaf (x, y) are compatible in polynomial time by rooting these trees at leaf (x, y) [Aho et al., 1981] Not every constraint tree will contain the leaf (x, y), so this does not a guarantee that the set of k constraint trees is compatible. It is possible for NJMerge to fail. Molloy & Warnow NJMerge 22/48

23 NJMerge Advantages No supertree estimation Polynomial Time Parallel, e.g., each siblinghood proposal can be evaluated in parallel Disadvantages Can fail to return a tree Molloy & Warnow NJMerge 23/48

24 Species Tree Estimation Molloy & Warnow NJMerge 24/48

25 Using NJMerge in species tree estimation pipeline Molloy & Warnow NJMerge 25/48

26 Using NJMerge in a pipeline for species tree estimation How should the distance matrix be computed? How large should the maximum subset size be? How should subsets be created? Which species tree method should be used on subsets? Molloy & Warnow NJMerge 26/48

27 Simulated Datasets Dataset Size 100 species; 25, 100, 1000 genes 1000 species; 1000 genes Gene (alignment) length varied from 300 to 1500 bp Level of Incomplete Lineage Sorting (ILS) Moderate: 8-10% AD Very high: 68-69% AD where AD is the Average (RF) Distance between the true species tree and the tree gene trees. Sequence Type Exons: lower phylogenetic signal (not shown) Introns: greater phylogenetic signal Molloy & Warnow NJMerge 27/48

28 How does error in the estimated distance matrix impact NJMerge? 100 taxa, Moderate ILS 100 taxa, Very High ILS 0.30 Species Tree Error Number of Genes NJst NJMerge+True NOTE: NJMerge failed to return a tree on 1 dataset with 100 taxa, 25 genes, and very high ILS; average is on datasets for which both methods completed. Molloy & Warnow NJMerge 28/48

29 Evaluation Subset trees are estimated using ASTRAL-III, SVDquartets, and unpartitioned concatenation using RAxML. Metrics Robinson-Foulds (RF) distance between the true and the estimated species trees Running time All methods limited to 48 hours wall-clock time one compute node with 16 cores 64 GB of physical memory Molloy & Warnow NJMerge 29/48

30 Running time for pipeline using NJMerge T = ( T1 Method + + Tk Method ) + T NJMerge where k = number of subsets T Method i = time to compute tree on subset i for given method T NJMerge = time to run NJMerge Molloy & Warnow NJMerge 30/48

31 ASTRAL [Mirarab et al., 2014; Mirarab and Warnow, 2015; Zhang et al., 2018] Gene tree summary method Finds optimal solution to search problem within a constrained search space Computationally intensive when Constrained search space is large, e.g., large datasets with high levels of gene tree heterogeneity due to ILS or gene tree estimation error Molloy & Warnow NJMerge 31/48

32 ASTRAL-III and NJMerge Species Tree Error species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High ASTRAL Level of ILS NJMerge+ASTRAL NOTE: ASTRAL completed on 16/20 datasets with 1000 taxa and very high ILS; NJMerge+ASTRAL completed on 20/20 of these datasets; box plots show all datasets on which at least one method completed. Molloy & Warnow NJMerge 32/48

33 ASTRAL-III and NJMerge 100 species, 1000 introns 1000 species, 1000 introns Running Time (m) Moderate Very High Moderate Very High ASTRAL Level of ILS NJMerge+ASTRAL NOTE: ASTRAL completed on 16/20 datasets with 1000 taxa and very high ILS; NJMerge+ASTRAL completed on 20/20 of these datasets; bar graphs show all datasets on which at least one method completed. Molloy & Warnow NJMerge 33/48

34 SVDquartets [Chifman and Kubatko, 2014] Site-based coalescent method Use the concatenated alignment to estimate quartet trees and then combine the quartet trees Computationally intensive when Number of species is large Molloy & Warnow NJMerge 34/48

35 SVDquartets and NJMerge Species Tree Error species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High SVDquartets Level of ILS NJMerge+SVDquartets NOTE: SVDquartets ran on 0/40 datasets with 1000 taxa due to segmentation faults; NJMerge+SVDquartets completed on 40/40 of these datasets; box plots show all datasets on which at least one method completed. Molloy & Warnow NJMerge 35/48

36 SVDquartets and NJMerge Running Time (m) species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High SVDquartets Level of ILS NJMerge+SVDquartets NOTE: SVDquartets ran on 0/40 datasets with 1000 taxa due to segmentation faults; NJMerge+SVDquartets completed on 40/40 of these datasets; bar graphs show all datasets on which at least one method completed. Molloy & Warnow NJMerge 36/48

37 Unpartitioned concatenation using RAxML [Stamatakis, 2014] ML tree estimation on unpartitioned concatenated alignment Not statistically consistent under MSC model Computationally intensive when Number of species is large Number of unique site patterns in alignment is large, i.e., cannot effectively compress the alignment Molloy & Warnow NJMerge 37/48

38 RAxML and NJMerge Species Tree Error species, 1000 introns 1000 species, 1000 introns Moderate Very High Moderate Very High RAxML Level of ILS NJMerge+RAxML NOTE: RAxML completed on 1/40 datasets with 1000 taxa due to Out of Memory errors; NJMerge+RAxML completed on 40/40 of these datasets; box plots show all datasets on which at least one method completed. Molloy & Warnow NJMerge 38/48

39 RAxML and NJMerge 100 species, 1000 introns 1000 species, 1000 introns Running Time (m) Moderate Very High Moderate Very High RAxML Level of ILS NJMerge+RAxML NOTE: RAxML completed on 1/40 datasets with 1000 taxa due to Out of Memory errors; NJMerge+RAxML completed on 40/40 of these datasets; bar graphs show all datasets on which at least one method completed. Molloy & Warnow NJMerge 39/48

40 Summary NJMerge is a generic technique for scaling phylogeny estimation methods to large and/or hetergeneous datasets. In our experiments, NJMerge improved running time without sacrificing accuracy enabled all three methods to run on large and/or heterogeneous when constrained by memory (64GB) and running time (48 hours) had a low failure rate (< 1%) Molloy & Warnow NJMerge 40/48

41 Which species tree method do I use on datasets with 1000 taxa and 1000 genes? Species Tree Error Moderate ILS Very High ILS Method NJMerge+ASTRAL NJMerge+RAxML NJMerge+SVDquartets NJst It depends on the dataset! Molloy & Warnow NJMerge 41/48

42 Which species tree method do I use on datasets with 1000 taxa and 1000 genes? Species Tree Error Moderate ILS Very High ILS Method NJMerge+ASTRAL NJMerge+RAxML NJMerge+SVDquartets NJst It depends on the dataset! Molloy & Warnow NJMerge 42/48

43 Conclusions NJMerge enables species tree methods (ASTRAL-III, SVDquartets, and RAxML) to analyze large and/or hetergeneous datasets using a single compute node with 16 cores 64 GB of physical memory 48 hours maximum wall-clock time without sacrificing accuracy. How to best run NJMerge depends on the dataset! Several algorithmic choices are left to the discretion of the user. Molloy & Warnow NJMerge 43/48

44 Funding This work was supported by the National Science Foundation Graduate Research Fellowship Program (DGE ) Blue Waters Sustained-Petascale Computing Project (OCI and ACI ) Graph-Theoretic Algorithms to Improve Phylogenomic Analyses (CCF ) and the Ira & Debra Cohen Graduate Fellowship in Computer Science. Molloy & Warnow NJMerge 44/48

45 Use NJMerge! Paper to appear in proceedings of RECOMB-CG Available on Github: github.com/ekmolloy/njmerge Molloy & Warnow NJMerge 45/48

46 References Hug et al. (2016). A new view of the tree of life. Nature Microbiology 1: Huson et al. (1999). Disk-covering, a fast-converging method for phylogenetic tree reconstruction. Journal of Computational Biology, 6(3-4): Nelesen et al. (2012). DACTAL: Divide-And-Conquer Trees (almost) without Alignments. Bioinformatics 28(12):i274 i282. Steel. (1992). The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification 9: Jiang et al. (2001). A polynomial time approximation scheme for inferring evolutationay trees from quartet topologies and its application. SIAM Journal on Computing 30(6): Molloy & Warnow NJMerge 46/48

47 References Bansal et al. (2010). Robinson-Foulds Supertrees. Algorithms for Molecular Biology 5(1). Saitou and Nei. (1987). The neighbor-joining method: a new method for reconstruction of phylogenetic trees. Molecular Biology and Evolution 4: Warnow (1994). Tree compatibility and inferring evolutionary history. Journal of Algorithms, 16(3): Agho et al. (1981). Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions. SIAM Journal on Computing, 10(3): Mirarab et al. (2014) ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics, 30(17):i541 i548. Molloy & Warnow NJMerge 47/48

48 References Mirarab and Warnow. (2015). ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31(12):i44 i52. Zhang et al. (2017). ASTRAL-III: Increased Scalability and Impacts of Contract- ing Low Support Branches. In Meidanis and Nakhleh, editors, Comparative Genomics. RECOMB- CG Lecture Notes in Computer Science, volume Springer, Cham. Chifman and Kubatko. (2014). Quartet Inference from SNP Data Under the Coalescent Model. Bioinformatics, 30(23): Stamatakis. (2014). RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics, 30(9). Molloy & Warnow NJMerge 48/48

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Erin Molloy University of Illinois at Urbana Champaign General Allocation (PI: Tandy Warnow) Exploratory Allocation

More information

Dynamic Programming for Phylogenetic Estimation

Dynamic Programming for Phylogenetic Estimation 1 / 45 Dynamic Programming for Phylogenetic Estimation CS598AGB Pranjal Vachaspati University of Illinois at Urbana-Champaign 2 / 45 Coalescent-based Species Tree Estimation Find evolutionary tree for

More information

Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference

Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference Sourya Bhattacharyya and Jayanta Mukherjee Department of Computer Science

More information

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees Applied Mathematics Letters 24 (2011) 719 723 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Graph triangulations and the compatibility

More information

Introduction to Triangulated Graphs. Tandy Warnow

Introduction to Triangulated Graphs. Tandy Warnow Introduction to Triangulated Graphs Tandy Warnow Topics for today Triangulated graphs: theorems and algorithms (Chapters 11.3 and 11.9) Examples of triangulated graphs in phylogeny estimation (Chapters

More information

Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods

Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods 1 Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods Ruth Davidson, MaLyn Lawhorn, Joseph Rusinko*, and Noah Weber arxiv:1512.05302v3 [q-bio.pe] 6 Dec 2016 Abstract

More information

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1 DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens Katherine St. John City University of New York 1 Thanks to the DIMACS Staff Linda Casals Walter Morris Nicole Clark Katherine St. John

More information

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) Distance based tree reconstruction Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) All organisms have evolved from a common ancestor. Infer the evolutionary tree (tree topology and edge lengths)

More information

CS 581. Tandy Warnow

CS 581. Tandy Warnow CS 581 Tandy Warnow This week Maximum parsimony: solving it on small datasets Maximum Likelihood optimization problem Felsenstein s pruning algorithm Bayesian MCMC methods Research opportunities Maximum

More information

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie.

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie. Arbres formels et Arbre de la Vie Olivier Gascuel Centre National de la Recherche Scientifique LIRMM, Montpellier, France www.lirmm.fr/gascuel 10 permanent researchers 2 technical staff 3 postdocs, 10

More information

A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem

A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem Gang Wu Jia-Huai You Guohui Lin January 17, 2005 Abstract A lookahead branch-and-bound algorithm is proposed for solving

More information

What is a phylogenetic tree? Algorithms for Computational Biology. Phylogenetics Summary. Di erent types of phylogenetic trees

What is a phylogenetic tree? Algorithms for Computational Biology. Phylogenetics Summary. Di erent types of phylogenetic trees What is a phylogenetic tree? Algorithms for Computational Biology Zsuzsanna Lipták speciation events Masters in Molecular and Medical Biotechnology a.a. 25/6, fall term Phylogenetics Summary wolf cat lion

More information

Evolutionary tree reconstruction (Chapter 10)

Evolutionary tree reconstruction (Chapter 10) Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early

More information

Introduction to Trees

Introduction to Trees Introduction to Trees Tandy Warnow December 28, 2016 Introduction to Trees Tandy Warnow Clades of a rooted tree Every node v in a leaf-labelled rooted tree defines a subset of the leafset that is below

More information

Comparison of commonly used methods for combining multiple phylogenetic data sets

Comparison of commonly used methods for combining multiple phylogenetic data sets Comparison of commonly used methods for combining multiple phylogenetic data sets Anne Kupczok, Heiko A. Schmidt and Arndt von Haeseler Center for Integrative Bioinformatics Vienna Max F. Perutz Laboratories

More information

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin Sequence length requirements Tandy Warnow Department of Computer Science The University of Texas at Austin Part 1: Absolute Fast Convergence DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2

More information

Quartet Inference from SNP Data Under the Coalescent Model

Quartet Inference from SNP Data Under the Coalescent Model Quartet Inference from SNP Data Under the Coalescent Model Julia Chifman and Laura Kubatko By Shashank Yaduvanshi EsDmaDng Species Tree from Gene Sequences Input: Alignments from muldple genes Output:

More information

Introduction to Computational Phylogenetics

Introduction to Computational Phylogenetics Introduction to Computational Phylogenetics Tandy Warnow The University of Texas at Austin No Institute Given This textbook is a draft, and should not be distributed. Much of what is in this textbook appeared

More information

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Basic Bioinformatics Workshop, ILRI Addis Ababa, 12 December 2017 Learning Objectives understand

More information

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

More information

SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS

SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS 1 SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS C. THAN and L. NAKHLEH Department of Computer Science Rice University 6100 Main Street, MS 132 Houston, TX 77005, USA Email: {cvthan,nakhleh}@cs.rice.edu

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Leena Salmena and Veli Mäkinen, which are partly from http: //bix.ucsd.edu/bioalgorithms/slides.php. 582670 Algorithms for Bioinformatics Lecture 6: Distance based clustering and

More information

Parallel Implementation of a Quartet-Based Algorithm for Phylogenetic Analysis

Parallel Implementation of a Quartet-Based Algorithm for Phylogenetic Analysis Parallel Implementation of a Quartet-Based Algorithm for Phylogenetic Analysis B. B. Zhou 1, D. Chu 1, M. Tarawneh 1, P. Wang 1, C. Wang 1, A. Y. Zomaya 1, and R. P. Brent 2 1 School of Information Technologies

More information

INFERRING OPTIMAL SPECIES TREES UNDER GENE DUPLICATION AND LOSS

INFERRING OPTIMAL SPECIES TREES UNDER GENE DUPLICATION AND LOSS INFERRING OPTIMAL SPECIES TREES UNDER GENE DUPLICATION AND LOSS M. S. BAYZID, S. MIRARAB and T. WARNOW Department of Computer Science, The University of Texas at Austin, Austin, Texas 78712, USA E-mail:

More information

A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees

A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees Andreas Sand 1,2, Gerth Stølting Brodal 2,3, Rolf Fagerberg 4, Christian N. S. Pedersen 1,2 and Thomas Mailund

More information

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3 The XIII International Conference Applied Stochastic Models and Data Analysis (ASMDA-2009) June 30-July 3, 2009, Vilnius, LITHUANIA ISBN 978-9955-28-463-5 L. Sakalauskas, C. Skiadas and E. K. Zavadskas

More information

Distance-based Phylogenetic Methods Near a Polytomy

Distance-based Phylogenetic Methods Near a Polytomy Distance-based Phylogenetic Methods Near a Polytomy Ruth Davidson and Seth Sullivant NCSU UIUC May 21, 2014 2 Phylogenetic trees model the common evolutionary history of a group of species Leaves = extant

More information

Recent Research Results. Evolutionary Trees Distance Methods

Recent Research Results. Evolutionary Trees Distance Methods Recent Research Results Evolutionary Trees Distance Methods Indo-European Languages After Tandy Warnow What is the purpose? Understand evolutionary history (relationship between species). Uderstand how

More information

Parallelizing SuperFine

Parallelizing SuperFine Parallelizing SuperFine Diogo Telmo Neves ESTGF - IPP and Universidade do Minho Portugal dtn@ices.utexas.edu Tandy Warnow Dept. of Computer Science The Univ. of Texas at Austin Austin, TX 78712 tandy@cs.utexas.edu

More information

Alignment of Trees and Directed Acyclic Graphs

Alignment of Trees and Directed Acyclic Graphs Alignment of Trees and Directed Acyclic Graphs Gabriel Valiente Algorithms, Bioinformatics, Complexity and Formal Methods Research Group Technical University of Catalonia Computational Biology and Bioinformatics

More information

The Performance of Phylogenetic Methods on Trees of Bounded Diameter

The Performance of Phylogenetic Methods on Trees of Bounded Diameter The Performance of Phylogenetic Methods on Trees of Bounded Diameter Luay Nakhleh 1, Usman Roshan 1, Katherine St. John 1 2, Jerry Sun 1, and Tandy Warnow 1 3 1 Department of Computer Sciences, University

More information

of the Balanced Minimum Evolution Polytope Ruriko Yoshida

of the Balanced Minimum Evolution Polytope Ruriko Yoshida Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida Figure 19.1 Genomes 3 ( Garland Science 2007) Origins of Species Tree (or web) of life eukarya

More information

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst Phylogenetics Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Phylogenetics phylum = tree phylogenetics: reconstruction of evolutionary

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/11/2014 Comp 555 Bioalgorithms (Fall 2014) 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other

More information

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering Sequence clustering Introduction Data clustering is one of the key tools used in various incarnations of data-mining - trying to make sense of large datasets. It is, thus, natural to ask whether clustering

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/12/2013 Comp 465 Fall 2013 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other vertex A clique

More information

SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD

SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD 1 SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD I A KANJ School of Computer Science, Telecommunications, and Information Systems, DePaul University, Chicago, IL 60604-2301, USA E-mail: ikanj@csdepauledu

More information

Rapid Neighbour-Joining

Rapid Neighbour-Joining Rapid Neighbour-Joining Martin Simonsen, Thomas Mailund and Christian N. S. Pedersen Bioinformatics Research Center (BIRC), University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Århus C, Denmark.

More information

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency?

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Fathiyeh Faghih and Daniel G. Brown David R. Cheriton School of Computer Science, University of

More information

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet)

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet) Phylogeny Codon models Last lecture: poor man s way of calculating dn/ds (Ka/Ks) Tabulate synonymous/non- synonymous substitutions Normalize by the possibilities Transform to genetic distance K JC or K

More information

CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

More information

Seeing the wood for the trees: Analysing multiple alternative phylogenies

Seeing the wood for the trees: Analysing multiple alternative phylogenies Seeing the wood for the trees: Analysing multiple alternative phylogenies Tom M. W. Nye, Newcastle University tom.nye@ncl.ac.uk Isaac Newton Institute, 17 December 2007 Multiple alternative phylogenies

More information

Study of a Simple Pruning Strategy with Days Algorithm

Study of a Simple Pruning Strategy with Days Algorithm Study of a Simple Pruning Strategy with ays Algorithm Thomas G. Kristensen Abstract We wish to calculate all pairwise Robinson Foulds distances in a set of trees. Traditional algorithms for doing this

More information

Basics of Multiple Sequence Alignment

Basics of Multiple Sequence Alignment Basics of Multiple Sequence Alignment Tandy Warnow February 10, 2018 Basics of Multiple Sequence Alignment Tandy Warnow Basic issues What is a multiple sequence alignment? Evolutionary processes operating

More information

4/4/16 Comp 555 Spring

4/4/16 Comp 555 Spring 4/4/16 Comp 555 Spring 2016 1 A clique is a graph where every vertex is connected via an edge to every other vertex A clique graph is a graph where each connected component is a clique The concept of clustering

More information

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such) Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences joe@gs Phylogeny methods, part 1 (Parsimony and such) Methods of reconstructing phylogenies (evolutionary trees) Parsimony

More information

FastJoin, an improved neighbor-joining algorithm

FastJoin, an improved neighbor-joining algorithm Methodology FastJoin, an improved neighbor-joining algorithm J. Wang, M.-Z. Guo and L.L. Xing School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, P.R. China

More information

CSE 549: Computational Biology

CSE 549: Computational Biology CSE 549: Computational Biology Phylogenomics 1 slides marked with * by Carl Kingsford Tree of Life 2 * H5N1 Influenza Strains Salzberg, Kingsford, et al., 2007 3 * H5N1 Influenza Strains The 2007 outbreak

More information

11/17/2009 Comp 590/Comp Fall

11/17/2009 Comp 590/Comp Fall Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 Problem Set #5 will be available tonight 11/17/2009 Comp 590/Comp 790-90 Fall 2009 1 Clique Graphs A clique is a graph with every vertex connected

More information

Reconstruction of Large Phylogenetic Trees: A Parallel Approach

Reconstruction of Large Phylogenetic Trees: A Parallel Approach Reconstruction of Large Phylogenetic Trees: A Parallel Approach Zhihua Du and Feng Lin BioInformatics Research Centre, Nanyang Technological University, Nanyang Avenue, Singapore 639798 Usman W. Roshan

More information

Supplementary Online Material PASTA: ultra-large multiple sequence alignment

Supplementary Online Material PASTA: ultra-large multiple sequence alignment Supplementary Online Material PASTA: ultra-large multiple sequence alignment Siavash Mirarab, Nam Nguyen, and Tandy Warnow University of Texas at Austin - Department of Computer Science {smirarab,bayzid,tandy}@cs.utexas.edu

More information

Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008

Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008 Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki Presented by Connor Magill November 20, 2008 Introduction Problem: Relationships between species cannot always be

More information

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 ML phylogenetic inference and GARLI Derrick Zwickl University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 Outline Heuristics and tree searches ML phylogeny inference and

More information

Reconstructing Reticulate Evolution in Species Theory and Practice

Reconstructing Reticulate Evolution in Species Theory and Practice Reconstructing Reticulate Evolution in Species Theory and Practice Luay Nakhleh Department of Computer Science Rice University Houston, Texas 77005 nakhleh@cs.rice.edu Tandy Warnow Department of Computer

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information

Lecture: Bioinformatics

Lecture: Bioinformatics Lecture: Bioinformatics ENS Sacley, 2018 Some slides graciously provided by Daniel Huson & Celine Scornavacca Phylogenetic Trees - Motivation 2 / 31 2 / 31 Phylogenetic Trees - Motivation Motivation -

More information

Rapid Neighbour-Joining

Rapid Neighbour-Joining Rapid Neighbour-Joining Martin Simonsen, Thomas Mailund, and Christian N.S. Pedersen Bioinformatics Research Center (BIRC), University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Århus C, Denmark

More information

EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES

EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 28 th November 2007 OUTLINE 1 INFERRING

More information

Least Common Ancestor Based Method for Efficiently Constructing Rooted Supertrees

Least Common Ancestor Based Method for Efficiently Constructing Rooted Supertrees Least ommon ncestor ased Method for fficiently onstructing Rooted Supertrees M.. Hai Zahid, nkush Mittal, R.. Joshi epartment of lectronics and omputer ngineering, IIT-Roorkee Roorkee, Uttaranchal, INI

More information

On the Optimality of the Neighbor Joining Algorithm

On the Optimality of the Neighbor Joining Algorithm On the Optimality of the Neighbor Joining Algorithm Ruriko Yoshida Dept. of Statistics University of Kentucky Joint work with K. Eickmeyer, P. Huggins, and L. Pachter www.ms.uky.edu/ ruriko Louisville

More information

PASTA: Ultra-Large Multiple Sequence Alignment

PASTA: Ultra-Large Multiple Sequence Alignment PASTA: Ultra-Large Multiple Sequence Alignment Siavash Mirarab, Nam Nguyen, and Tandy Warnow University of Texas at Austin - Department of Computer Science 2317 Speedway, Stop D9500 Austin, TX 78712 {smirarab,namphuon,tandy}@cs.utexas.edu

More information

Chordal Graphs and Evolutionary Trees. Tandy Warnow

Chordal Graphs and Evolutionary Trees. Tandy Warnow Chordal Graphs and Evolutionary Trees Tandy Warnow Possible Indo-European tree (Ringe, Warnow and Taylor 2000) Anatolian Vedic Iranian Greek Italic Celtic Tocharian Armenian Germanic Baltic Slavic Albanian

More information

PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction

PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction PRec-I-DCM3: A Parallel Framework for Fast and Accurate Large Scale Phylogeny Reconstruction Cristian Coarfa Yuri Dotsenko John Mellor-Crummey Luay Nakhleh Usman Roshan Abstract Phylogenetic trees play

More information

A New Approach For Tree Alignment Based on Local Re-Optimization

A New Approach For Tree Alignment Based on Local Re-Optimization A New Approach For Tree Alignment Based on Local Re-Optimization Feng Yue and Jijun Tang Department of Computer Science and Engineering University of South Carolina Columbia, SC 29063, USA yuef, jtang

More information

MultiPhyl: a high-throughput phylogenomics webserver using distributed computing

MultiPhyl: a high-throughput phylogenomics webserver using distributed computing Nucleic Acids Research, 2007, Vol. 35, Web Server issue W33 W37 doi:10.1093/nar/gkm359 MultiPhyl: a high-throughput phylogenomics webserver using distributed computing Thomas M. Keane 1,2, *, Thomas J.

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH

INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH Abstract. One approach for inferring a species tree from a given multi-locus data

More information

Clustering of Proteins

Clustering of Proteins Melroy Saldanha saldanha@stanford.edu CS 273 Project Report Clustering of Proteins Introduction Numerous genome-sequencing projects have led to a huge growth in the size of protein databases. Manual annotation

More information

MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 forbiggerdatasets

MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 forbiggerdatasets MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 forbiggerdatasets Sudhir Kumar, 1,2,3 Glen Stecher 1 and Koichiro Tamura*,4,5 1 Institute for Genomics and Evolutionary Medicine, Temple University

More information

The worst case complexity of Maximum Parsimony

The worst case complexity of Maximum Parsimony he worst case complexity of Maximum Parsimony mir armel Noa Musa-Lempel Dekel sur Michal Ziv-Ukelson Ben-urion University June 2, 20 / 2 What s a phylogeny Phylogenies: raph-like structures whose topology

More information

Parsimony-Based Approaches to Inferring Phylogenetic Trees

Parsimony-Based Approaches to Inferring Phylogenetic Trees Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 www.biostat.wisc.edu/bmi576.html Mark Craven craven@biostat.wisc.edu Fall 0 Phylogenetic tree approaches! three general types! distance:

More information

Heterotachy models in BayesPhylogenies

Heterotachy models in BayesPhylogenies Heterotachy models in is a general software package for inferring phylogenetic trees using Bayesian Markov Chain Monte Carlo (MCMC) methods. The program allows a range of models of gene sequence evolution,

More information

Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes

Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes Wayne Pfeiffer (SDSC/UCSD) & Alexandros Stamatakis (TUM) February 25, 2010 What was done? Why is it important? Who cares? Hybrid MPI/OpenMP

More information

CLC Phylogeny Module User manual

CLC Phylogeny Module User manual CLC Phylogeny Module User manual User manual for Phylogeny Module 1.0 Windows, Mac OS X and Linux September 13, 2013 This software is for research purposes only. CLC bio Silkeborgvej 2 Prismet DK-8000

More information

Algorithms for MDC-Based Multi-locus Phylogeny Inference

Algorithms for MDC-Based Multi-locus Phylogeny Inference Algorithms for MDC-Based Multi-locus Phylogeny Inference Yun Yu 1, Tandy Warnow 2, and Luay Nakhleh 1 1 Dept. of Computer Science, Rice University, 61 Main Street, Houston, TX 775, USA {yy9,nakhleh}@cs.rice.edu

More information

Fast and accurate branch lengths estimation for phylogenomic trees

Fast and accurate branch lengths estimation for phylogenomic trees Binet et al. BMC Bioinformatics (2016) 17:23 DOI 10.1186/s12859-015-0821-8 RESEARCH ARTICLE Open Access Fast and accurate branch lengths estimation for phylogenomic trees Manuel Binet 1,2,3, Olivier Gascuel

More information

Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees

Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees Journal of Bioinformatics and Computational Biology c Imperial College Press Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees M. Stissing, T. Mailund, C. N. S. Pedersen and G. S.

More information

A RANDOMIZED ALGORITHM FOR COMPARING SETS OF PHYLOGENETIC TREES

A RANDOMIZED ALGORITHM FOR COMPARING SETS OF PHYLOGENETIC TREES A RANDOMIZED ALGORITHM FOR COMPARING SETS OF PHYLOGENETIC TREES SEUNG-JIN SUL AND TIFFANI L. WILLIAMS Department of Computer Science Texas A&M University College Station, TX 77843-3112 USA E-mail: {sulsj,tlw}@cs.tamu.edu

More information

A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees

A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees Kedar Dhamdhere, Srinath Sridhar, Guy E. Blelloch, Eran Halperin R. Ravi and Russell Schwartz March 17, 2005 CMU-CS-05-119

More information

AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES

AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES CHAPTER 1 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES David Dao 1, Tomáš Flouri 2, Alexandros Stamatakis 1,2 1 KarlsruheInstituteofTechnology,InstituteforTheoreticalInformatics,Postfach 6980,

More information

Fast Structural Search in Phylogenetic Databases

Fast Structural Search in Phylogenetic Databases ORIGINAL RESEARCH Fast Structural Search in Phylogenetic Databases Jason T. L. Wang 1, Huiyuan Shan 2, Dennis Shasha 3 and William H. Piel 4 1 Department of Computer Science, New Jersey Institute of Technology,

More information

An Experimental Analysis of Robinson-Foulds Distance Matrix Algorithms

An Experimental Analysis of Robinson-Foulds Distance Matrix Algorithms An Experimental Analysis of Robinson-Foulds Distance Matrix Algorithms Seung-Jin Sul and Tiffani L. Williams Department of Computer Science Texas A&M University College Station, TX 77843-3 {sulsj,tlw}@cs.tamu.edu

More information

Lecture 5: Multiple sequence alignment

Lecture 5: Multiple sequence alignment Lecture 5: Multiple sequence alignment Introduction to Computational Biology Teresa Przytycka, PhD (with some additions by Martin Vingron) Why do we need multiple sequence alignment Pairwise sequence alignment

More information

1 High-Performance Phylogeny Reconstruction Under Maximum Parsimony. David A. Bader, Bernard M.E. Moret, Tiffani L. Williams and Mi Yan

1 High-Performance Phylogeny Reconstruction Under Maximum Parsimony. David A. Bader, Bernard M.E. Moret, Tiffani L. Williams and Mi Yan Contents 1 High-Performance Phylogeny Reconstruction Under Maximum Parsimony 1 David A. Bader, Bernard M.E. Moret, Tiffani L. Williams and Mi Yan 1.1 Introduction 1 1.2 Maximum Parsimony 7 1.3 Exact MP:

More information

A Randomized Algorithm for Comparing Sets of Phylogenetic Trees

A Randomized Algorithm for Comparing Sets of Phylogenetic Trees A Randomized Algorithm for Comparing Sets of Phylogenetic Trees Seung-Jin Sul and Tiffani L. Williams Department of Computer Science Texas A&M University E-mail: {sulsj,tlw}@cs.tamu.edu Technical Report

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Reconsidering the Performance of Cooperative Rec-I-DCM3

Reconsidering the Performance of Cooperative Rec-I-DCM3 Reconsidering the Performance of Cooperative Rec-I-DCM3 Tiffani L. Williams Department of Computer Science Texas A&M University College Station, TX 77843 Email: tlw@cs.tamu.edu Marc L. Smith Computer Science

More information

human chimp mouse rat

human chimp mouse rat Michael rudno These notes are based on earlier notes by Tomas abak Phylogenetic Trees Phylogenetic Trees demonstrate the amoun of evolution, and order of divergence for several genomes. Phylogenetic trees

More information

Tracy Heath Workshop on Molecular Evolution, Woods Hole USA

Tracy Heath Workshop on Molecular Evolution, Woods Hole USA INTRODUCTION TO BAYESIAN PHYLOGENETIC SOFTWARE Tracy Heath Integrative Biology, University of California, Berkeley Ecology & Evolutionary Biology, University of Kansas 2013 Workshop on Molecular Evolution,

More information

A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony

A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony Jean-Michel Richer 1 and Adrien Goëffon 2 and Jin-Kao Hao 1 1 University of Angers, LERIA, 2 Bd Lavoisier, 49045 Anger Cedex 01,

More information

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau Phylogenetic Trees Lecture 12 Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau. Maximum Parsimony. Last week we presented Fitch algorithm for (unweighted) Maximum Parsimony:

More information

The CIPRES Science Gateway: Enabling High-Impact Science for Phylogenetics Researchers with Limited Resources

The CIPRES Science Gateway: Enabling High-Impact Science for Phylogenetics Researchers with Limited Resources The CIPRES Science Gateway: Enabling High-Impact Science for Phylogenetics Researchers with Limited Resources Mark Miller, Wayne Pfeiffer, and Terri Schwartz San Diego Supercomputer Center Phylogenetics

More information

Lab 07: Maximum Likelihood Model Selection and RAxML Using CIPRES

Lab 07: Maximum Likelihood Model Selection and RAxML Using CIPRES Integrative Biology 200, Spring 2014 Principles of Phylogenetics: Systematics University of California, Berkeley Updated by Traci L. Grzymala Lab 07: Maximum Likelihood Model Selection and RAxML Using

More information

Generation of distancebased phylogenetic trees

Generation of distancebased phylogenetic trees primer for practical phylogenetic data gathering. Uconn EEB3899-007. Spring 2015 Session 12 Generation of distancebased phylogenetic trees Rafael Medina (rafael.medina.bry@gmail.com) Yang Liu (yang.liu@uconn.edu)

More information

Fixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction

Fixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction Fixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction Guy E. Blelloch, Kedar Dhamdhere, Eran Halperin, R. Ravi, Russell Schwartz and Srinath Sridhar Abstract. We consider

More information

Algorithms for constructing more accurate and inclusive phylogenetic trees

Algorithms for constructing more accurate and inclusive phylogenetic trees Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2013 Algorithms for constructing more accurate and inclusive phylogenetic trees Ruchi Chaudhary Iowa State University

More information

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course)

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course) Tree Searching We ve discussed how we rank trees Parsimony Least squares Minimum evolution alanced minimum evolution Maximum likelihood (later in the course) So we have ways of deciding what a good tree

More information

Evolution of Tandemly Repeated Sequences

Evolution of Tandemly Repeated Sequences University of Canterbury Department of Mathematics and Statistics Evolution of Tandemly Repeated Sequences A thesis submitted in partial fulfilment of the requirements of the Degree for Master of Science

More information

A high-throughput bioinformatics distributed computing platform

A high-throughput bioinformatics distributed computing platform A high-throughput bioinformatics distributed computing platform Thomas M. Keane 1, Andrew J. Page 2, James O. McInerney 1, and Thomas J. Naughton 2 1 Bioinformatics and Pharmacogenomics Laboratory, National

More information

Improvement of Distance-Based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets

Improvement of Distance-Based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets Improvement of Distance-Based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets Vincent Ranwez and Olivier Gascuel Département Informatique Fondamentale et Applications, LIRMM,

More information