human chimp mouse rat

Similar documents
Evolutionary tree reconstruction (Chapter 10)

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony

Generation of distancebased phylogenetic trees

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet)

Parsimony-Based Approaches to Inferring Phylogenetic Trees

EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES

Algorithms for Bioinformatics

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution

Consider the character matrix shown in table 1. We assume that the investigator has conducted primary homology analysis such that:

Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmatic Mean (UPGMA).

What is a phylogenetic tree? Algorithms for Computational Biology. Phylogenetics Summary. Di erent types of phylogenetic trees

4/4/16 Comp 555 Spring

Introduction to Trees

Terminology. A phylogeny is the evolutionary history of an organism

11/17/2009 Comp 590/Comp Fall

Phylogenetic Trees - Parsimony Tutorial #11

Recent Research Results. Evolutionary Trees Distance Methods

The worst case complexity of Maximum Parsimony

Isometric gene tree reconciliation revisited

Alignment ABC. Most slides are modified from Serafim s lectures

Parsimony methods. Chapter 1

Distance Methods. "PRINCIPLES OF PHYLOGENETICS" Spring 2006

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course)

Assignment 3 Phylogenetic Tree Reconstruction and Motif Finding

BMI/CS 576 Fall 2015 Midterm Exam

Least Common Ancestor Based Method for Efficiently Constructing Rooted Supertrees

Basic Tree Building With PAUP

CSE 549: Computational Biology

12/5/17. trees. CS 220: Discrete Structures and their Applications. Trees Chapter 11 in zybooks. rooted trees. rooted trees

CISC 636 Computational Biology & Bioinformatics (Fall 2016) Phylogenetic Trees (I)

Scaling species tree estimation methods to large datasets using NJMerge

Lesson 13 Molecular Evolution

Seeing the wood for the trees: Analysing multiple alternative phylogenies

Sistemática Teórica. Hernán Dopazo. Biomedical Genomics and Evolution Lab. Lesson 03 Statistical Model Selection

Introduction to Computational Phylogenetics

Rapid Neighbour-Joining

Multiple sequence alignment. November 20, 2018

Lecture: Bioinformatics

46 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 16, 2011

CLUSTERING IN BIOINFORMATICS

CLC Phylogeny Module User manual

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1

Trees. Example: Binary expression trees. Example: Evolution trees. Readings: HtDP, sections 14, 15, 16.

Tutorial. Phylogenetic Trees and Metadata. Sample to Insight. November 21, 2017

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie.

Undirected Graphs. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { 1-2, 1-3, 2-3, 2-4, 2-5, 3-5, 3-7, 3-8, 4-5, 5-6 } n = 8 m = 11

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)

Minimum spanning trees

Algorithms: Lecture 10. Chalmers University of Technology

March 20/2003 Jayakanth Srinivasan,

Evolution of Tandemly Repeated Sequences

Trees. Readings: HtDP, sections 14, 15, 16.

Trees. Binary arithmetic expressions. Visualizing binary arithmetic expressions. ((2 6) + (5 2))/(5 3) can be defined in terms of two smaller

On the Optimality of the Neighbor Joining Algorithm

Prior Distributions on Phylogenetic Trees

Quiz section 10. June 1, 2018

Unsupervised Learning. Unsupervised Learning. What is Clustering? Unsupervised Learning I Clustering 9/7/2017. Clustering

UNIVERSITY OF CALIFORNIA. Santa Barbara. Testing Trait Evolution Models on Phylogenetic Trees. A Thesis submitted in partial satisfaction of the

Gestão e Tratamento da Informação

One report (in pdf format) addressing each of following questions.

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau

Computer vision: models, learning and inference. Chapter 10 Graphical Models

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015

We ve all seen trees. No, not that sprightly spruce in your garden, but your family tree. Rachael

Copyright 2000, Kevin Wayne 1

Clustering of Proteins

of the Balanced Minimum Evolution Polytope Ruriko Yoshida

Recoloring k-degenerate graphs

CS 581. Tandy Warnow

Throughout this course, we use the terms vertex and node interchangeably.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Package treedater. R topics documented: May 4, Type Package

Simulation of Molecular Evolution with Bioinformatics Analysis

CISC 889 Bioinformatics (Spring 2003) Multiple Sequence Alignment

A Comparison of Document Clustering Techniques

Graph and Digraph Glossary

(Refer Slide Time: 02.06)

Lab 8 Phylogenetics I: creating and analysing a data matrix

Multiple Sequence Alignment Gene Finding, Conserved Elements

Improvement of Distance-Based Phylogenetic Methods by a Local Maximum Likelihood Approach Using Triplets

CSE 421 Greedy Alg: Union Find/Dijkstra s Alg

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

Bioinformatics Programming. EE, NCKU Tien-Hao Chang (Darby Chang)

Stats fest Multivariate analysis. Multivariate analyses. Aims. Multivariate analyses. Objects. Variables

Ecological cluster analysis with Past

Fast Hierarchical Clustering via Dynamic Closest Pairs

Stephen Scott.

3.1 Basic Definitions and Applications. Chapter 3. Graphs. Undirected Graphs. Some Graph Applications

Figure 4.1: The evolution of a rooted tree.

Chapter 3. Graphs CLRS Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters

The Basics of Graphical Models

Reconstructing Broadcasts on Trees

DISTANCE BASED METHODS IN PHYLOGENTIC TREE CONSTRUCTION

Transcription:

Michael rudno These notes are based on earlier notes by Tomas abak Phylogenetic Trees Phylogenetic Trees demonstrate the amoun of evolution, and order of divergence for several genomes. Phylogenetic trees come in two types: rooted and unrooted. In a rooted tree we know the position of the last common ancestor of all organisms. In an unrooted tree we do not. or example, for many years there was an argument about the proper order of divergence for humans, dogs and mice: are humans close to dogs or to mice? The answer to this question would have no effect on the unrooted tree, but would change the location of the root. Within an unrooted tree the root is unspecified and can be anywhere. or example, the following tree emonstrates that,,, and shared a common ancestor, but we don t know when (unrooted tree). To find out where along the horizontal line common ancestor exist, we need an outgroup (an organism that diverged prior to existence of common ancestor). In other words, the outgroup defines the distance to common ancestry between the two highest nodes on a hierarchical tree (the two ancestors adjoining the root of the tree). or example if,,, and were human, chimp, mouse, and rat as shown below efines position of common ancestor human chimp mouse rat fish fish would define the distance from the mouse-rat common ancestor and human-chimp common ancestor to the common ancestor of both lineages. Unrooted trees have three edges incident on every node. Rooted trees have two children and one parent, except for at the root, where the parent is absent.

Michael rudno uilding Trees There are two types of approaches to building trees: (i) istance based methods, which work from pairwise distances between the sequences (e.g. UPGM, Neighbour Joining), (ii) character-based methods, which work directly from multiply aligned sequences (e.g. parsimony and likelihood approaches). While character-based methods are more accurate, they are also more complicated, and beyond the scope of the class. However it is notable that most phylogeny inference packages (e.g. PML) use these. istance-based approaches are also important because they represent a type of algorithm that w will see many times throughout this class clustering. In particular the UPGM approach we will discuss next is a classical example of hierarchical clustering. UPGM straightforward method to draw a tree is a greedy approach called the Unweighted Pair Group Method using rithmetic verages (UPGM). tree is drawn by recursively joining two closest nodes, grouping them into one cluster, and treating them as single node to which new distances are calculated. This is done until only two nodes remain. uppose you are given four sequences separated by the following distances: 4 5 8 7 9 7 To construct the tree we begin with a star tree, where the initial branch point is arbitrary.

Michael rudno and are the closest so let s group them under a new node called and calculate the distances to the new node: The distance from to can be calculated as: d(, ) + d(, )! d(, ) d(, ) = 2 You divide by 2 because you covered the path from to twice. The distance matrix then becomes: 4.5 6 7 and are now closest, repeating the procedure by grouping and into G produces the following tree: G N.. This is still an unrooted tree as we do not know when and diverged. Neighbor Joining UPGM produces the correct tree when the mutation rates along all branches are equal, however, this is not always the case. onsider the following phylogenetic tree (known to be correct):

Michael rudno (http://www.icp.ucl.ac.be/~opperd/private/neighbor.html) In this case diverged more quickly than. Thus, UPGM would join and producing an incorrect tree topology. Neighbour Joining (NJ) was developed to take non-uniform divergence rates into account. The idea is to produce a modified distance matrix that takes into account the net divergence rate of the entire tree, so that nodes close together but far apart from all other nodes are not necessarily joined directly. aitou and Nei (1987) derived a model for calculating the modified matrix. If d is the original distance matrix, the modified matrix = d! + where, ( Ri R j 1 R =.! i d im N # 2 m" N ) N is the set of leaves in the tree. The number of leaves is N. The tree topology is constructed as described above (UPGM) using the modified distance matrix. tteson (1999) proved this model to be correct under the assumption that the we are dealing with an additive tree (i.e. for any two leaves the distance between them is the sum of edges along a path that connects them). uppose you have the following distance matrix d (corresponding to the above tree):,

Michael rudno R i, the net divergence matrix for each sequence is calculated as the sum of each node to all other nodes. R() = (5+4+7+6+8)/4 = 30/4, R() = 42/4, R() = 32/4, R() = 38/4, R() = 34/4, R() = 44/4. is: It is now apparent that and, and and are closest and these are initially joined as neighbours. ample calculation for : = d = d = 5! ( =! 13! ( R + R i! ( R 30 4 j ) + R 42 + ) 4 ) We can now use the greedy algorithm to construct the tree. We would start by joining and (->), since this is one of the minima of (we could have also joined and, the other minimum), and recompute the distances to (nodes joining ; see figure below).

Michael rudno ubbing in for and, we can calculate the distances from the remaining nodes to, d() = d() + d() - d() / 2 = 3 d() = d() + d() - d() / 2 = 6 d() = d() + d() - d() / 2 = 5 d() = d() + d() - d() / 2 = 7 d becomes: 3 6 7 5 6 5 7 8 9 8 Where is: -19.5-18 -18.5-17.5-18 -20.5-19.5-20 -19-20 We would next join and -> T.

Michael rudno T We would continue to join nodes, recalculate d and until only two nodes remain.