Quartet Inference from SNP Data Under the Coalescent Model

1 Quartet Inference from SNP Data Under the Coalescent Model. Julia Chifman and Laura Kubatko. By Shashank Yaduvanshi.

2 Estimating Species Tree from Gene Sequences. Input: alignments from multiple genes. Output: a unified species tree. Challenges: every gene has its own phylogeny; gene trees might differ from the species tree due to ILS, horizontal gene transfer, etc.

3 Phylogeny Estimation Methods under the Coalescent Model. The coalescent model is used to model ILS in gene trees. Method families: summary-based methods, quartet-based methods, concatenation methods, co-estimation methods.

4 Summary-Based Methods. First, estimate an independent gene tree for each gene using a method such as RAxML. Second, combine the gene trees into a species tree using a method such as ASTRAL. Computationally efficient for large datasets. Estimation error in the gene trees lowers the overall accuracy.

5 Quartet-Based Methods. Estimate the most likely true quartet tree for each set of four taxa using the multi-gene sequences. Combine all (or a subset) of these quartet trees using a supertree method to get the species tree. Works on the entire data together while still remaining computationally efficient.

6 Concatenation Methods. Concatenate all gene sequence alignments to get one long sequence alignment for each taxon. Estimate the species tree from these long alignments directly with methods such as ML. Ignores differences among the gene trees of different genes.

7 Co-estimation Methods. Co-estimate sequence alignments and species tree with methods such as Bayesian inference. Generally higher accuracy than other methods. Computationally inefficient for large datasets.

8 Estimating Quartet Trees. Most methods seen so far are distance-based or ML-based. This paper introduces a new measure, the SVD score, that is based on the frequency of quartet patterns among all gene alignments. SVD scores can be used to estimate the most likely quartet tree for any quartet of taxa.

9 Important Concepts. p_ijkl = P(X1 = i, X2 = j, X3 = k, X4 = l) is the probability of observing states i, j, k, l at a site for taxa 1 to 4. A SPLIT of a taxon set L is a bipartition of L into two non-overlapping subsets L1 and L2, denoted L1|L2. VALID SPLIT L1|L2 for tree T: there is some edge in T whose removal results in the same bipartition L1|L2. If no such edge exists, the split is INVALID. For taxon quartets, we consider splits into two groups of two. There are three such possible splits for each quartet.
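As a small illustration of the last point, the three two-versus-two splits of a quartet can be enumerated directly. This is a minimal sketch; the function name `quartet_splits` and the tuple representation of a split are assumptions made for the example, not notation from the paper.

```python
def quartet_splits(taxa):
    """Return the three two-versus-two splits of a set of four taxa.

    Each split L1|L2 is represented as a pair of 2-tuples (L1, L2).
    """
    if len(set(taxa)) != 4:
        raise ValueError("expected exactly four distinct taxa")
    a, b, c, d = taxa
    return [
        ((a, b), (c, d)),  # ab|cd
        ((a, c), (b, d)),  # ac|bd
        ((a, d), (b, c)),  # ad|bc
    ]


splits = quartet_splits(["1", "2", "3", "4"])
for left, right in splits:
    print("".join(left) + "|" + "".join(right))
```

For an unrooted binary quartet tree, exactly one of these three splits is valid, which is why identifying the valid split identifies the quartet topology.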

10 Flattening
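The figure on this slide is not reproduced in the transcript. As a hedged reconstruction following the usual definition, the flattening for the split 12|34 arranges the 256 site pattern probabilities p_ijkl into a 16 x 16 matrix whose rows are indexed by the states (i, j) of taxa 1 and 2 and whose columns are indexed by the states (k, l) of taxa 3 and 4. The state ordering A, C, G, T is an assumption made for the display.

```latex
\mathrm{Flat}_{12|34}(P) =
\begin{pmatrix}
p_{AAAA} & p_{AAAC} & \cdots & p_{AATT} \\
p_{ACAA} & p_{ACAC} & \cdots & p_{ACTT} \\
\vdots   & \vdots   & \ddots & \vdots   \\
p_{TTAA} & p_{TTAC} & \cdots & p_{TTTT}
\end{pmatrix},
\qquad
\bigl[\mathrm{Flat}_{12|34}(P)\bigr]_{(i,j),(k,l)} = p_{ijkl}.
```

For the split 13|24 the rows are indexed by (i, k) and the columns by (j, l); the entries are the same probabilities p_ijkl, only rearranged.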

11 Important Concepts. The RANK of a matrix A is the size of the largest collection of linearly independent columns (or rows) of A. SVD: the singular value decomposition of a matrix A is the factorization of A into the product of three matrices, A = U D V^T, where the columns of U and V are orthonormal and the matrix D is diagonal with non-negative real entries. Rank(A) equals the number of non-zero diagonal elements (singular values) in D.
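The relationship between rank and singular values can be checked numerically. The sketch below uses NumPy on a small made-up matrix; the tolerance `1e-10` for deciding that a singular value is "zero" is an assumption chosen for the example.

```python
import numpy as np

# Second row is twice the first, so only two rows are linearly independent.
A = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [1.0, 0.0, 1.0],
])

# A = U @ diag(s) @ Vt, with singular values s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A)

# Rank = number of singular values that are non-zero (up to a tolerance).
rank = int(np.sum(s > 1e-10))

print("singular values:", s)
print("rank:", rank)
```

In floating-point arithmetic a singular value is rarely exactly zero, so any practical rank computation depends on a tolerance of this kind.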

12 Theorem [Chifman and Kubatko, 2014]. Let C denote the class of coalescent models under the four-state GTR model on a four-taxon binary species tree. For a valid split L1|L2, rank(Flat_{L1|L2}(P)) <= 10 for all distributions P arising from C. For a non-valid split L1|L2, rank(Flat_{L1|L2}(P)) > 10 for generic distributions P arising from C.

13 Approximation to Flattening

14 Finding the Best Split. Calculate Flat_{L1|L2}(P) from the observed site pattern counts for all three possible splits. Calculate the rank of each of these three matrices. The true split will have rank <= 10. Getting these counts and calculating the rank is not computationally intensive. Can be run in parallel for different quartets.
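The counting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `flattening`, the representation of a site pattern as a four-character string, the A, C, G, T ordering, and the choice to skip columns with gaps or ambiguity codes are all assumptions.

```python
from collections import Counter

import numpy as np

STATES = "ACGT"
IDX = {s: i for i, s in enumerate(STATES)}


def flattening(columns, split):
    """Estimate the 16 x 16 flattening matrix from observed site patterns.

    columns: iterable of 4-character strings, one character per taxon.
    split:   ((r1, r2), (c1, c2)), the taxon positions (0..3) that index
             the rows and the columns of the matrix.
    """
    (r1, r2), (c1, c2) = split
    flat = np.zeros((16, 16))
    for pattern, count in Counter(columns).items():
        if any(ch not in IDX for ch in pattern):
            continue  # assumption: ignore gaps and ambiguous characters
        row = 4 * IDX[pattern[r1]] + IDX[pattern[r2]]
        col = 4 * IDX[pattern[c1]] + IDX[pattern[c2]]
        flat[row, col] += count
    total = flat.sum()
    return flat / total if total else flat


# Toy alignment columns for taxa 1..4 (made-up data).
cols = ["AACC", "AACC", "ACGT", "GGTT"]
F = flattening(cols, ((0, 1), (2, 3)))  # split 12|34
print(np.linalg.matrix_rank(F))
```

With real data the estimated matrix is noisy, so its numerical rank is typically full for all three splits; this is the motivation for a continuous score rather than a hard rank test.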

15 SVD Scores. An SVD score of 0 implies rank(Flat_{L1|L2}(P)) <= 10, hence L1|L2 is a valid split. An SVD score > 0 implies rank(Flat_{L1|L2}(P)) > 10, hence L1|L2 is an invalid split. Choose the split with the lowest SVD score.
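The slide does not give the formula for the score. The sketch below assumes the score is the Frobenius distance from a 16 x 16 matrix to the nearest matrix of rank 10, that is, the square root of the sum of the squared 11th to 16th singular values. The function name `svd_score` and the parameter `k` are hypothetical names chosen for the example.

```python
import numpy as np


def svd_score(flat, k=10):
    """Distance from `flat` to the nearest matrix of rank <= k.

    Computed as sqrt(sigma_{k+1}^2 + ... + sigma_n^2), where the sigma_i
    are the singular values of `flat` in decreasing order.
    """
    s = np.linalg.svd(flat, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2)))


rng = np.random.default_rng(0)

# A matrix of rank at most 10 scores (numerically) zero ...
low_rank = rng.random((16, 10)) @ rng.random((10, 16))
print(svd_score(low_rank))

# ... while a full-rank matrix scores strictly above zero.
print(svd_score(np.eye(16)))
```

Under this definition the score is exactly 0 only when the rank is at most 10, which matches the two implications on the slide; with finite data none of the three scores is exactly 0, so the split with the smallest score is chosen.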

16 Suitable Data. SVD scores are applicable to data where each site evolves independently, coming from a different locus. However, the authors claim that the method also works well when each locus produces multiple sites, on both simulated and real-world data. Bootstrapping for a dataset consisting of M aligned sites: resample columns with replacement M times; calculate the SVD scores of the three splits for this data matrix; repeat this procedure B times. Each bootstrap matrix votes for a particular split. The total number of votes for each split is its bootstrap support.
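The bootstrap procedure can be sketched independently of how a split is scored. In the example below, `bootstrap_support` implements the resample-and-vote loop; `toy_best_split` is only a stand-in scorer invented for the demonstration (it is not the SVD score), and a real analysis would pass a function that returns the split with the lowest SVD score.

```python
import random
from collections import Counter

SPLITS = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}


def toy_best_split(columns):
    """Stand-in scorer (NOT the SVD score): pick the split supported by the
    most columns in which each side is constant and the two sides differ."""
    def support(split):
        (a, b), (c, d) = split
        return sum(
            1 for p in columns
            if p[a] == p[b] and p[c] == p[d] and p[a] != p[c]
        )
    return max(SPLITS, key=lambda name: support(SPLITS[name]))


def bootstrap_support(columns, best_split, B=100, seed=0):
    """Resample M columns with replacement B times; each replicate votes."""
    rng = random.Random(seed)
    M = len(columns)
    votes = Counter()
    for _ in range(B):
        resampled = [columns[rng.randrange(M)] for _ in range(M)]
        votes[best_split(resampled)] += 1
    return votes


columns = ["AACC"] * 8 + ["ACAC"] * 2  # made-up site patterns
votes = bootstrap_support(columns, toy_best_split, B=50)
print(dict(votes))
```

Dividing each vote count by B gives the bootstrap support as a proportion.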

17 Experiments. Simulation study; rattlesnake multi-locus data; soybean SNP data.

18 Simulation Study. [Figure: four-taxon model species tree on taxa 1, 2, 3 and 4, with every branch of length x.]

19 Simulation Study. Generate a sample of g gene trees from the model species tree ((1:x,2:x):x,(3:x,4:x):x), where x is the length of each branch, under the coalescent model using the program COAL (Degnan and Salter). Generate sequence data of length n on each gene tree under a specified substitution model. Construct the flattening matrix for each of the three possible splits, and compute SVD(L1|L2) for each. Repeat 1000 times and record SVD(L1|L2)_k, k = 1, 2, ..., 1000, for each split. For each of the 1000 datasets, generate B bootstrapped datasets and record SVD(L1|L2)_{k,b} for each split.

20 Simulation Study. x (branch length) = 0.5, 1, 2. g = 5000, n = 1: simulate SNP data, one site per gene. g = 10, n = 500: simulate multiple sites per gene. Substitution models: the Jukes-Cantor model (JC69), and the GTR model with a proportion of invariant sites and with gamma-distributed mutation rates across sites (GTR + I + Γ). n = 1, g = 1000, 5000, 10000: check runtime for quartets.
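The simulation grid described above can be written down compactly. The sketch below only builds the model species tree strings and the list of parameter settings; it does not simulate gene trees or sequences. The function name `model_species_tree`, the dictionary layout, and the trailing Newick semicolon are assumptions made for the example.

```python
def model_species_tree(x):
    """Newick string for the symmetric four-taxon tree with all branches x."""
    return f"((1:{x},2:{x}):{x},(3:{x},4:{x}):{x});"


BRANCH_LENGTHS = (0.5, 1, 2)
# (g, n): number of genes and number of sites per gene.
DATA_SHAPES = ((5000, 1), (10, 500))

settings = [
    {"x": x, "g": g, "n": n, "tree": model_species_tree(x)}
    for x in BRANCH_LENGTHS
    for g, n in DATA_SHAPES
]

for s in settings:
    print(s)
```

Each of the six settings would then be run under both substitution models listed on the slide.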

21 Results

22 Results

23 Results

24 Results. In all cases, there is good separation between the SVD scores of the valid split and those of the other two splits. The SVD score can be a good measure for finding the correct quartet tree for each quartet. Longer branch lengths result in better separation of SVD scores for quartets. As expected, unlinked SNP data shows better separation than data with multiple sites per gene. Runtime is less than linear in the total number of site patterns. However, this runtime is only for quartets; runtime for general n-taxon datasets is discussed later.

25 Results. The experiments cover only one specific topology; other quartet topologies with different branch lengths need to be tested, since certain topologies are known to be difficult to estimate. Runtime is measured only for quartets; running the method in combination with quartet aggregation methods to estimate a species tree for n taxa is discussed later. Other suitable values of g and n should be analyzed.

26 Rattlesnake Data

27 Rattlesnake Data. Using SVD scores and QMC on a dataset previously analyzed by Kubatko et al.: 52 sequences with 8466 aligned nucleotide positions each in the complete data matrix. Method: randomly sample quartets from the 52 sequences; use SVD scores to infer the true quartet relationship for each quartet; apply QMC to get the species tree from the quartet trees.

28 Results. Produces findings on the rattlesnake data similar to those of the original analysis in Kubatko et al. (2011). The original analysis took ~10 days using BEAST, while using SVD scores took ~1 day without parallelizing. The quartets were sampled out of the 52 C 4 total quartets. Why random sampling? Using quartets that are more reliable may be better. Analyze runtime for using all quartets or other sampling strategies.
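To give a sense of scale, the number of possible quartets on 52 sequences and a simple uniform sampling scheme can be sketched as follows. The function name `sample_quartets` and the sample size of 1000 are hypothetical; the transcript does not preserve the number of quartets actually sampled.

```python
import math
import random


def sample_quartets(n_taxa, k, seed=0):
    """Draw k distinct quartets uniformly at random from n_taxa taxa.

    Taxa are represented by their indices 0..n_taxa-1; each quartet is a
    sorted 4-tuple of indices.
    """
    total = math.comb(n_taxa, 4)
    if k > total:
        raise ValueError(f"only {total} quartets exist")
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < k:
        chosen.add(tuple(sorted(rng.sample(range(n_taxa), 4))))
    return sorted(chosen)


total_quartets = math.comb(52, 4)
quartets = sample_quartets(52, 1000)  # 1000 is an illustrative sample size
print(total_quartets, len(quartets))
```

Because each quartet is scored independently, the sampled list can be divided among workers, which is what makes the per-quartet step straightforward to parallelize.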

29 Soybean Data

30 Soybean SNP Data. Previously published SNP dataset originally analyzed by Lam et al. (2010). Compared with computation using SNAPP, which is suitable for SNP data. SNAPP infers the species tree using the coalescent model and is designed for biallelic data consisting of unlinked SNPs. It bypasses gene trees and computes species trees using ML.

31 Results. Produced results in agreement with the original findings. SNAPP failed to converge even after 28 days. The SVD Quartets method with 100 bootstrap samples and quartets sampled per replicate required 600 hrs. Need to compare with other ML methods that are better than SNAPP.

32 Conclusion. SVD Quartets is an efficient algorithm that estimates quartet trees for a set of four taxa. It can be combined with a supertree method to get a species tree from multiple gene alignments without calculating gene trees explicitly. The experiments so far lack breadth and depth; there is scope for more intensive experiments and for comparison with other methods solving the same problem.

33 Questions?

Scaling species tree estimation methods to large datasets using NJMerge

Scaling species tree estimation methods to large datasets using NJMerge Scaling species tree estimation methods to large datasets using NJMerge Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana Champaign 2018 Phylogenomics Software

More information

Dynamic Programming for Phylogenetic Estimation

Dynamic Programming for Phylogenetic Estimation 1 / 45 Dynamic Programming for Phylogenetic Estimation CS598AGB Pranjal Vachaspati University of Illinois at Urbana-Champaign 2 / 45 Coalescent-based Species Tree Estimation Find evolutionary tree for

More information

CS 581. Tandy Warnow

CS 581. Tandy Warnow CS 581 Tandy Warnow This week Maximum parsimony: solving it on small datasets Maximum Likelihood optimization problem Felsenstein s pruning algorithm Bayesian MCMC methods Research opportunities Maximum

More information

Lab 07: Maximum Likelihood Model Selection and RAxML Using CIPRES

Lab 07: Maximum Likelihood Model Selection and RAxML Using CIPRES Integrative Biology 200, Spring 2014 Principles of Phylogenetics: Systematics University of California, Berkeley Updated by Traci L. Grzymala Lab 07: Maximum Likelihood Model Selection and RAxML Using

More information

Graphite IntroducDon and Overview. Goals, Architecture, and Performance

Graphite IntroducDon and Overview. Goals, Architecture, and Performance Graphite IntroducDon and Overview Goals, Architecture, and Performance 4 The Future of MulDcore #Cores 128 1000 cores? CompuDng has moved aggressively to muldcore 64 32 MIT Raw Intel SSC Up to 72 cores

More information

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

More information

Comparison of commonly used methods for combining multiple phylogenetic data sets

Comparison of commonly used methods for combining multiple phylogenetic data sets Comparison of commonly used methods for combining multiple phylogenetic data sets Anne Kupczok, Heiko A. Schmidt and Arndt von Haeseler Center for Integrative Bioinformatics Vienna Max F. Perutz Laboratories

More information

Evolutionary tree reconstruction (Chapter 10)

Evolutionary tree reconstruction (Chapter 10) Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early

More information

Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference

Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference Supplementary Material, corresponding to the manuscript Accumulated Coalescence Rank and Excess Gene count for Species Tree Inference Sourya Bhattacharyya and Jayanta Mukherjee Department of Computer Science

More information

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 ML phylogenetic inference and GARLI Derrick Zwickl University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 Outline Heuristics and tree searches ML phylogeny inference and

More information

Workshop Practical on concatenation and model testing

Workshop Practical on concatenation and model testing Workshop Practical on concatenation and model testing Jacob L. Steenwyk & Antonis Rokas Programs that you will use: Bash, Python, Perl, Phyutility, PartitionFinder, awk To infer a putative species phylogeny

More information

Heterotachy models in BayesPhylogenies

Heterotachy models in BayesPhylogenies Heterotachy models in is a general software package for inferring phylogenetic trees using Bayesian Markov Chain Monte Carlo (MCMC) methods. The program allows a range of models of gene sequence evolution,

More information

Introduction to Trees

Introduction to Trees Introduction to Trees Tandy Warnow December 28, 2016 Introduction to Trees Tandy Warnow Clades of a rooted tree Every node v in a leaf-labelled rooted tree defines a subset of the leafset that is below

More information

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie.

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie. Arbres formels et Arbre de la Vie Olivier Gascuel Centre National de la Recherche Scientifique LIRMM, Montpellier, France www.lirmm.fr/gascuel 10 permanent researchers 2 technical staff 3 postdocs, 10

More information

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin Sequence length requirements Tandy Warnow Department of Computer Science The University of Texas at Austin Part 1: Absolute Fast Convergence DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2

More information

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Erin Molloy University of Illinois at Urbana Champaign General Allocation (PI: Tandy Warnow) Exploratory Allocation

More information

Introduction to Triangulated Graphs. Tandy Warnow

Introduction to Triangulated Graphs. Tandy Warnow Introduction to Triangulated Graphs Tandy Warnow Topics for today Triangulated graphs: theorems and algorithms (Chapters 11.3 and 11.9) Examples of triangulated graphs in phylogeny estimation (Chapters

More information

3D Face Modeling. Lacey Best- Rowden, Joseph Roth Feb. 18, MSU

3D Face Modeling. Lacey Best- Rowden, Joseph Roth Feb. 18, MSU 3D Face Modeling Lacey Best- Rowden, Joseph Roth Feb. 18, MSU Outline ApplicaDon / Benefits 3D ReconstrucDon Techniques o Range Scanners o Single Image! 3DMM o MulDple Image! Shape from stereo! Photometric

More information

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3 The XIII International Conference Applied Stochastic Models and Data Analysis (ASMDA-2009) June 30-July 3, 2009, Vilnius, LITHUANIA ISBN 978-9955-28-463-5 L. Sakalauskas, C. Skiadas and E. K. Zavadskas

More information

Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods

Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods 1 Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods Ruth Davidson, MaLyn Lawhorn, Joseph Rusinko*, and Noah Weber arxiv:1512.05302v3 [q-bio.pe] 6 Dec 2016 Abstract

More information

INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH

INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH Abstract. One approach for inferring a species tree from a given multi-locus data

More information

Distance Methods. "PRINCIPLES OF PHYLOGENETICS" Spring 2006

Distance Methods. PRINCIPLES OF PHYLOGENETICS Spring 2006 Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2006 Distance Methods Due at the end of class: - Distance matrices and trees for two different distance

More information

Distributed Systems. Peer- to- Peer. Rik Sarkar James Cheney. University of Edinburgh Spring 2014

Distributed Systems. Peer- to- Peer. Rik Sarkar James Cheney. University of Edinburgh Spring 2014 Distributed Systems Peer- to- Peer Rik Sarkar James Cheney University of Edinburgh Spring 2014 Peer to Peer The common percepdon A system for distribudng (sharing?) files Using the computers of common

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/11/2014 Comp 555 Bioalgorithms (Fall 2014) 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other

More information

Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes

Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes Hybrid Parallelization of the MrBayes & RAxML Phylogenetics Codes Wayne Pfeiffer (SDSC/UCSD) & Alexandros Stamatakis (TUM) February 25, 2010 What was done? Why is it important? Who cares? Hybrid MPI/OpenMP

More information

Supplementary Online Material PASTA: ultra-large multiple sequence alignment

Supplementary Online Material PASTA: ultra-large multiple sequence alignment Supplementary Online Material PASTA: ultra-large multiple sequence alignment Siavash Mirarab, Nam Nguyen, and Tandy Warnow University of Texas at Austin - Department of Computer Science {smirarab,bayzid,tandy}@cs.utexas.edu

More information

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet)

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet) Phylogeny Codon models Last lecture: poor man s way of calculating dn/ds (Ka/Ks) Tabulate synonymous/non- synonymous substitutions Normalize by the possibilities Transform to genetic distance K JC or K

More information

CSE 549: Computational Biology

CSE 549: Computational Biology CSE 549: Computational Biology Phylogenomics 1 slides marked with * by Carl Kingsford Tree of Life 2 * H5N1 Influenza Strains Salzberg, Kingsford, et al., 2007 3 * H5N1 Influenza Strains The 2007 outbreak

More information

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) Distance based tree reconstruction Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) All organisms have evolved from a common ancestor. Infer the evolutionary tree (tree topology and edge lengths)

More information

Fast and accurate branch lengths estimation for phylogenomic trees

Fast and accurate branch lengths estimation for phylogenomic trees Binet et al. BMC Bioinformatics (2016) 17:23 DOI 10.1186/s12859-015-0821-8 RESEARCH ARTICLE Open Access Fast and accurate branch lengths estimation for phylogenomic trees Manuel Binet 1,2,3, Olivier Gascuel

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/12/2013 Comp 465 Fall 2013 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other vertex A clique

More information

TreeTime User Manual

TreeTime User Manual TreeTime User Manual Lin Himmelmann www.linhi.de Dirk Metzler www.zi.biologie.uni-muenchen.de/evol/statgen.html March 2009 TreeTime is controlled via an input file in Nexus file format (view Maddison 1997).

More information

11/17/2009 Comp 590/Comp Fall

11/17/2009 Comp 590/Comp Fall Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 Problem Set #5 will be available tonight 11/17/2009 Comp 590/Comp 790-90 Fall 2009 1 Clique Graphs A clique is a graph with every vertex connected

More information

4/4/16 Comp 555 Spring

4/4/16 Comp 555 Spring 4/4/16 Comp 555 Spring 2016 1 A clique is a graph where every vertex is connected via an edge to every other vertex A clique graph is a graph where each connected component is a clique The concept of clustering

More information

CLC Phylogeny Module User manual

CLC Phylogeny Module User manual CLC Phylogeny Module User manual User manual for Phylogeny Module 1.0 Windows, Mac OS X and Linux September 13, 2013 This software is for research purposes only. CLC bio Silkeborgvej 2 Prismet DK-8000

More information

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau Phylogenetic Trees Lecture 12 Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau. Maximum Parsimony. Last week we presented Fitch algorithm for (unweighted) Maximum Parsimony:

More information

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier Wang Ding, Songnian Yu, Shanqing Yu, Wei Wei, and Qianfeng Wang School of Computer Engineering and Science, Shanghai University, 200072

More information

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1 DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens Katherine St. John City University of New York 1 Thanks to the DIMACS Staff Linda Casals Walter Morris Nicole Clark Katherine St. John

More information

LBANN: Livermore Big Ar.ficial Neural Network HPC Toolkit

LBANN: Livermore Big Ar.ficial Neural Network HPC Toolkit LBANN: Livermore Big Ar.ficial Neural Network HPC Toolkit MLHPC 2015 Nov. 15, 2015 Brian Van Essen, Hyojin Kim, Roger Pearce, Kofi Boakye, Barry Chen Center for Applied ScienDfic CompuDng (CASC) + ComputaDonal

More information

InserDonSort. InserDonSort. SelecDonSort. MergeSort. Divide & Conquer? 9/27/12

InserDonSort. InserDonSort. SelecDonSort. MergeSort. Divide & Conquer? 9/27/12 CS/ENGRD 2110 Object- Oriented Programming and Data Structures Fall 2012 Doug James Lecture 11: SorDng //sort a[], an array of int for (int i = 1; i < a.length; i++) { int temp = a[i]; int k; for (k =

More information

Protein phylogenetics

Protein phylogenetics Protein phylogenetics Robert Hirt PAUP4.0* can be used for an impressive range of analytical methods involving DNA alignments. This, unfortunately is not the case for estimating protein phylogenies. Only

More information

HybridCheck User Manual

HybridCheck User Manual HybridCheck User Manual Ben J. Ward February 2015 HybridCheck is a software package to visualise the recombination signal in assembled next generation sequence data, and it can be used to detect recombination,

More information

A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees

A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees A New Algorithm for the Reconstruction of Near-Perfect Binary Phylogenetic Trees Kedar Dhamdhere, Srinath Sridhar, Guy E. Blelloch, Eran Halperin R. Ravi and Russell Schwartz March 17, 2005 CMU-CS-05-119

More information

Supervised vs unsupervised clustering

Supervised vs unsupervised clustering Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful

More information

55:148 Digital Image Processing Chapter 11 3D Vision, Geometry

55:148 Digital Image Processing Chapter 11 3D Vision, Geometry 55:148 Digital Image Processing Chapter 11 3D Vision, Geometry Topics: Basics of projective geometry Points and hyperplanes in projective space Homography Estimating homography from point correspondence

More information

The History Bound and ILP

The History Bound and ILP The History Bound and ILP Julia Matsieva and Dan Gusfield UC Davis March 15, 2017 Bad News for Tree Huggers More Bad News Far more convincingly even than the (also highly convincing) fossil evidence, the

More information

Tutorial using BEAST v2.4.7 MASCOT Tutorial Nicola F. Müller

Tutorial using BEAST v2.4.7 MASCOT Tutorial Nicola F. Müller Tutorial using BEAST v2.4.7 MASCOT Tutorial Nicola F. Müller Parameter and State inference using the approximate structured coalescent 1 Background Phylogeographic methods can help reveal the movement

More information

Distributed Systems. Peer- to- Peer. Rik Sarkar. University of Edinburgh Fall 2014

Distributed Systems. Peer- to- Peer. Rik Sarkar. University of Edinburgh Fall 2014 Distributed Systems Peer- to- Peer Rik Sarkar University of Edinburgh Fall 2014 Peer to Peer The common percepdon A system for distribudng (sharing?) files Using the computers of common users (instead

More information

Identifiability of Large Phylogenetic Mixture Models

Identifiability of Large Phylogenetic Mixture Models Identifiability of Large Phylogenetic Mixture Models John Rhodes and Seth Sullivant University of Alaska Fairbanks and NCSU April 18, 2012 Seth Sullivant (NCSU) Phylogenetic Mixtures April 18, 2012 1 /

More information

Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles Supporting Information to Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles Ali Shojaie,#, Alexandra Jauhiainen 2,#, Michael Kallitsis 3,#, George

More information

Bryan Carstens Matthew Demarest Maxim Kim Tara Pelletier Jordan Satler. spedestem tutorial

Bryan Carstens Matthew Demarest Maxim Kim Tara Pelletier Jordan Satler. spedestem tutorial Bryan Carstens Matthew Demarest Maxim Kim Tara Pelletier Jordan Satler spedestem tutorial Acknowledgements Development of spedestem was funded via a grant from the National Science Foundation (DEB-0918212).

More information

Lab 8: Molecular Evolution

Lab 8: Molecular Evolution Integrative Biology 200B University of California, Berkeley, Spring 2011 "Ecology and Evolution" by NM Hallinan, updated by Nick Matzke Lab 8: Molecular Evolution There are many different features of genes

More information

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Basic Bioinformatics Workshop, ILRI Addis Ababa, 12 December 2017 Learning Objectives understand

More information

Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008

Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008 Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki Presented by Connor Magill November 20, 2008 Introduction Problem: Relationships between species cannot always be

More information

Reminder: Lecture 20: The Eight-Point Algorithm. Essential/Fundamental Matrix. E/F Matrix Summary. Computing F. Computing F from Point Matches

Reminder: Lecture 20: The Eight-Point Algorithm. Essential/Fundamental Matrix. E/F Matrix Summary. Computing F. Computing F from Point Matches Reminder: Lecture 20: The Eight-Point Algorithm F = -0.00310695-0.0025646 2.96584-0.028094-0.00771621 56.3813 13.1905-29.2007-9999.79 Readings T&V 7.3 and 7.4 Essential/Fundamental Matrix E/F Matrix Summary

More information

MLSTest Tutorial Contents

MLSTest Tutorial Contents MLSTest Tutorial Contents About MLSTest... 2 Installing MLSTest... 2 Loading Data... 3 Main window... 4 DATA Menu... 5 View, modify and export your alignments... 6 Alignment>viewer... 6 Alignment> export...

More information

Subspace Clustering with Global Dimension Minimization And Application to Motion Segmentation

Subspace Clustering with Global Dimension Minimization And Application to Motion Segmentation Subspace Clustering with Global Dimension Minimization And Application to Motion Segmentation Bryan Poling University of Minnesota Joint work with Gilad Lerman University of Minnesota The Problem of Subspace

More information

Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. By S. Bergmann, J. Ihmels, N. Barkai

Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. By S. Bergmann, J. Ihmels, N. Barkai Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data By S. Bergmann, J. Ihmels, N. Barkai Reasoning Both clustering and Singular Value Decomposition(SVD) are useful tools

More information

Weighted Powers Ranking Method

Weighted Powers Ranking Method Weighted Powers Ranking Method Introduction The Weighted Powers Ranking Method is a method for ranking sports teams utilizing both number of teams, and strength of the schedule (i.e. how good are the teams

More information

ROTS: Reproducibility Optimized Test Statistic

ROTS: Reproducibility Optimized Test Statistic ROTS: Reproducibility Optimized Test Statistic Fatemeh Seyednasrollah, Tomi Suomi, Laura L. Elo fatsey (at) utu.fi March 3, 2016 Contents 1 Introduction 2 2 Algorithm overview 3 3 Input data 3 4 Preprocessing

More information

Outline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency

Outline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency 1 2 Parallel Algorithms for Linear Algebra Richard P. Brent Computer Sciences Laboratory Australian National University Outline Basic concepts Parallel architectures Practical design issues Programming

More information

Generation of distancebased phylogenetic trees

Generation of distancebased phylogenetic trees primer for practical phylogenetic data gathering. Uconn EEB3899-007. Spring 2015 Session 12 Generation of distancebased phylogenetic trees Rafael Medina (rafael.medina.bry@gmail.com) Yang Liu (yang.liu@uconn.edu)

More information
