Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)

1 Distance based tree reconstruction Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)

2 All organisms have evolved from a common ancestor. Infer the evolutionary tree (tree topology and edge lengths) from molecular data. A gene tree inferred from a homologous gene from different species might be different from the species tree.

3 Distance based reconstruction
Goal: Reconstruct an evolutionary tree from a distance matrix.
Input: n x n distance matrix d.
Output: Weighted tree T with n leaves fitting d.
If d is additive, this problem has an efficient solution (which we will come back to); otherwise we should aim at finding a tree which is as good as possible.

4 Distance based reconstruction: the ideal case
The tree reflects the distance matrix, i.e. the distances induced by the tree are equal to the distances in the matrix. If this is possible, then the distance matrix is called additive.
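Additivity can be tested directly on the matrix. The slide does not say how; a standard characterisation (the four-point condition) is that for every four leaves the two largest of the three pairwise sums must be equal. The sketch below assumes a symmetric matrix with a zero diagonal given as a list of lists; the function name `is_additive` and the tolerance `tol` are illustrative choices.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point condition: for all i, j, k, l the two largest of
    d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) must be equal."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        # sums[2] is the largest, sums[1] the second largest.
        if sums[2] - sums[1] > tol:
            return False
    return True


# Distances induced by a tree on five leaves (additive).
tree_like = [[0, 5, 9, 9, 8],
             [5, 0, 10, 10, 9],
             [9, 10, 0, 8, 7],
             [9, 10, 8, 0, 3],
             [8, 9, 7, 3, 0]]

# A metric on four points that no tree can induce.
not_tree_like = [[0, 3, 4, 5],
                 [3, 0, 5, 4],
                 [4, 5, 0, 3],
                 [5, 4, 3, 0]]
```

The check enumerates all quadruples, so it takes O(n^4) time; it is meant as a definition in code, not as an efficient test.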

5 A reconstruction method: least squares methods
Given a tree topology T, select edge lengths such that
$Q(T) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (D_{ij} - d_{ij})^2$
is minimized, where d_ij is given by the matrix, D_ij is the distance induced by the tree and the selected edge lengths, and w_ij is the confidence we have in d_ij, e.g. w_ij = 1. Optimal edge lengths can be found by standard least squares methods in time O(n^3). The larger version of the problem, where the tree topology is also unknown, is NP-complete...
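Why a fixed topology reduces to an ordinary least squares problem can be seen by writing the tree distances as a linear function of the edge lengths. The notation b, A and W below is introduced here for illustration and does not appear on the slide.

```latex
% Assumed notation: b is the vector of edge lengths of T, d stacks the
% entries d_{ij}, W = diag(w_{ij}), and A is the 0/1 matrix with
% A_{(ij),e} = 1 iff edge e lies on the path between leaves i and j in T.
\[
  D = A b, \qquad
  Q(T) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (D_{ij} - d_{ij})^2
       = (A b - d)^{\top} W (A b - d)
\]
\[
  \nabla_b Q = 2 A^{\top} W (A b - d) = 0
  \quad\Longrightarrow\quad
  A^{\top} W A \, b = A^{\top} W d
\]
```

The resulting normal equations are a linear system with one unknown per edge of T, which is what standard least squares solvers handle.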

6 Counting rooted binary trees
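The content of the counting slides is not preserved here. As a hedged illustration of the standard result, the number of rooted binary tree topologies on n labelled leaves is (2n-3)!! = 1 * 3 * 5 * ... * (2n-3), because the k-th leaf can be attached on any of the 2k-4 edges of the current tree or above its root. The function name below is an illustrative choice.

```python
def count_rooted_binary_trees(n: int) -> int:
    """Number of rooted binary tree topologies on n labelled leaves."""
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    # Attaching leaf k to a rooted binary tree with k-1 leaves:
    # 2(k-1)-2 edges plus one position above the root = 2k-3 choices.
    for k in range(2, n + 1):
        count *= 2 * k - 3
    return count


for n in (3, 4, 10):
    print(n, count_rooted_binary_trees(n))
```

The count grows faster than exponentially, which is why exhaustive search over topologies is only feasible for very small n.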

9 Hierarchical clustering and UPGMA

10 Hierarchical clustering Input: A set of n species (1,2,...,n) and a distance matrix d giving the pairwise distances. Output: A rooted binary tree T with edge lengths and leaves labelled 1, 2,..., n such that similar species (according to the distance matrix) are grouped in the same subtree, and the tree distances correspond to the distance matrix.

11 Example

12 Example Group the closest two data points.

13 Example In each iteration, we join the two closest clusters.

16 Hierarchical clustering
Input: A set of n species (1, 2, ..., n) and a distance matrix d giving the pairwise distances.
Output: A rooted binary tree T with edge lengths and leaves labelled 1, 2, ..., n such that similar species (according to the distance matrix) are grouped in the same subtree, and the tree distances correspond to the distance matrix.
Algorithm:
Step 1: Let the n species 1, 2, ..., n be n clusters C1, ..., Cn of size 1. Let each cluster correspond to a leaf in a tree T. The distance D(Ci,Cj) between clusters Ci and Cj is d(i,j).
Step 2: Pick the pair of clusters Ci and Cj where D(Ci,Cj) is minimized and form a new cluster Ck by merging Ci and Cj, i.e. Ck = Ci U Cj. Make a new internal node Ck in tree T with children corresponding to Ci and Cj. Place internal node Ck at height D(Ci,Cj) / 2.
Step 3: Update the distance matrix D such that the distances between the new cluster Ck and all remaining clusters are computed.
Step 4: Repeat Steps 2 and 3 until only one cluster remains.

17 Hierarchical clustering How to define and compute the distances between clusters?

18 Distance measures between clusters
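The figures for these slides are not preserved. Three commonly used cluster distances are the minimum (single linkage), the maximum (complete linkage) and the average (average linkage, the one UPGMA uses) over all pairs of members. The sketch below assumes clusters are lists of row indices into a symmetric matrix d; the function names are illustrative.

```python
from itertools import product


def single_linkage(d, A, B):
    """Smallest distance between a member of A and a member of B."""
    return min(d[a][b] for a, b in product(A, B))


def complete_linkage(d, A, B):
    """Largest distance between a member of A and a member of B."""
    return max(d[a][b] for a, b in product(A, B))


def average_linkage(d, A, B):
    """Mean distance over all pairs (a, b) with a in A and b in B."""
    return sum(d[a][b] for a, b in product(A, B)) / (len(A) * len(B))


d = [[0, 2, 6, 10],
     [2, 0, 5, 9],
     [6, 5, 0, 4],
     [10, 9, 4, 0]]
A, B = [0, 1], [2, 3]
print(single_linkage(d, A, B), complete_linkage(d, A, B), average_linkage(d, A, B))
```

Computing a cluster distance from scratch like this costs O(|A| |B|); the update rule in Step 3 of UPGMA avoids that by reusing the distances of the two merged clusters.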

22 Hierarchical clustering (UPGMA)
Input: A set of n species (1, 2, ..., n) and a distance matrix d giving the pairwise distances.
Output: A rooted binary tree T with edge lengths and leaves labelled 1, 2, ..., n such that similar species (according to the distance matrix) are grouped in the same subtree, and the tree distances correspond to the distance matrix.
Algorithm:
Step 1: Let the n species 1, 2, ..., n be n clusters C1, ..., Cn of size 1. Let each cluster correspond to a leaf in a tree T. The distance D(Ci,Cj) between clusters Ci and Cj is d(i,j).
Step 2: Pick the pair of clusters Ci and Cj where D(Ci,Cj) is minimized and form a new cluster Ck by merging Ci and Cj, i.e. Ck = Ci U Cj. Make a new internal node Ck in tree T with children corresponding to Ci and Cj. Place internal node Ck at height D(Ci,Cj) / 2.
Step 3: Update the distance matrix D such that the distances between the new cluster Ck and all remaining clusters Cl are computed as the size-weighted average
$D(C_k, C_l) = \frac{|C_i| \, D(C_i, C_l) + |C_j| \, D(C_j, C_l)}{|C_i| + |C_j|}$
Step 4: Repeat Steps 2 and 3 until only one cluster remains.
Running time? Space consumption?

23 Hierarchical clustering (UPGMA) Running time O(n^3). Space consumption O(n^2).

24 An implementation of UPGMA in Python
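The code shown on the original slide is not preserved. The following is a minimal sketch of the basic O(n^3) algorithm, not the author's implementation. It assumes a symmetric matrix as a list of lists, uses the size-weighted average as the update rule in Step 3, and represents an internal node as a tuple (left, right, height). All names are illustrative.

```python
def upgma(d):
    """Basic UPGMA. Leaves are the indices 0..n-1; an internal node is
    the tuple (left_subtree, right_subtree, height)."""
    n = len(d)
    # Step 1: every species is its own cluster: id -> (subtree, size).
    clusters = {i: (i, 1) for i in range(n)}
    # Distances between live clusters, keyed by (smaller id, larger id).
    D = {(i, j): float(d[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Step 2: pick the closest pair of clusters (linear scan over D).
        (i, j), dij = min(D.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del D[(i, j)]
        # Step 3: size-weighted average distance to every other cluster.
        for l in clusters:
            dil = D.pop((min(i, l), max(i, l)))
            djl = D.pop((min(j, l), max(j, l)))
            D[(l, next_id)] = (size_i * dil + size_j * djl) / (size_i + size_j)
        # The new internal node is placed at height D(Ci, Cj) / 2.
        clusters[next_id] = ((tree_i, tree_j, dij / 2), size_i + size_j)
        next_id += 1
    root, _ = next(iter(clusters.values()))
    return root


d = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
print(upgma(d))
```

The length of the edge from a node to a child is the difference between their heights, with leaves at height 0. The scan for the minimum in Step 2 is the part that dominates the running time.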

25 Properties of a UPGMA tree

26 Trees reflecting a molecular clock A clocklike tree is a rooted tree in which the total edge length from the root to any leaf is equal, i.e. there is a molecular clock that ticks at a constant pace (the mutation rate is identical for all species), and all the observed species are at an equal number of ticks from the root... A clocklike tree induces an ultrametric distance matrix...

27 Ultrametric matrices and trees
Formally: A matrix d is ultrametric iff it is metric and any three species x, y, z satisfy d(x,y) ≤ max{d(x,z), d(y,z)}, i.e. there is a tie for the maximum of d(x,y), d(x,z), and d(y,z).
Theorem (Gusfield): A symmetric matrix D has an ultrametric tree iff D is an ultrametric matrix.
Theorem (Gusfield): If D is ultrametric, then the unique ultrametric tree T for D can be constructed in time O(n^2).
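The three-point condition translates directly into a check over all triples. This is a minimal sketch that assumes the matrix is already symmetric with a zero diagonal, so it only tests the "tie for the maximum" part of the definition; the function name and tolerance are illustrative.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """True if for every triple x, y, z the two largest of
    d(x,y), d(x,z), d(y,z) are equal (up to tol)."""
    n = len(d)
    for x, y, z in combinations(range(n), 3):
        smallest, middle, largest = sorted((d[x][y], d[x][z], d[y][z]))
        if largest - middle > tol:
            return False
    return True


clocklike = [[0, 2, 6, 6],
             [2, 0, 6, 6],
             [6, 6, 0, 4],
             [6, 6, 4, 0]]

not_clocklike = [[0, 2, 5],
                 [2, 0, 4],
                 [5, 4, 0]]
```

The check takes O(n^3) time, so it is slower than the O(n^2) tree construction mentioned in the theorem; it is useful mainly for validating small inputs.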

28 Two examples
Ultrametric doesn't imply molecular clock. Molecular clock implies ultrametric.
[Figure: example trees and distance matrices over leaves a, b, c]

29 Improving the running time of UPGMA
The UPGMA algorithm (Steps 1 to 4 as stated on the previous slides) has running time O(n^3) and space consumption O(n^2).
One iteration of Steps 2 and 3 takes O(n^2) time. This can be improved to O(n), which yields a total running time of O(n^2).

32 Storing matrix D in a quad tree
Building the quad tree for an n x n matrix takes time O(n^2). Once built, we can pick the minimum in constant time O(1). Not surprising.

34 Updating the quad-tree
Question: How fast can we rebuild the quad-tree to reflect the modified distance matrix we built in Step 3? In Step 3, we only change 2 rows/columns in D.

37 An implementation of a Quad Tree in Python
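The code from the original slides is not preserved. The sketch below is one possible realisation, not the author's: it assumes the quad tree is stored as a pyramid of grids in which each cell holds the minimum of a 2 x 2 block of the level below, and each entry is a tuple (value, row, column) so the root also says where the minimum is. The class and method names are hypothetical.

```python
import math


class QuadTree:
    """Pyramid of 2x2 block minima over a square distance matrix."""

    def __init__(self, matrix):
        n = len(matrix)
        size = 1
        while size < n:  # pad the side length to a power of two
            size *= 2
        # Level 0: the matrix itself; diagonal and padding hold infinity.
        base = [[(matrix[i][j], i, j) if i < n and j < n and i != j
                 else (math.inf, i, j)
                 for j in range(size)] for i in range(size)]
        self.levels = [base]
        while len(self.levels[-1]) > 1:  # build upwards, O(n^2) in total
            prev = self.levels[-1]
            half = len(prev) // 2
            self.levels.append([[self._min4(prev, r, c) for c in range(half)]
                                for r in range(half)])

    @staticmethod
    def _min4(grid, r, c):
        """Minimum of the 2x2 block of grid below cell (r, c)."""
        return min(grid[2 * r][2 * c], grid[2 * r][2 * c + 1],
                   grid[2 * r + 1][2 * c], grid[2 * r + 1][2 * c + 1])

    def minimum(self):
        """(value, i, j) of the smallest off-diagonal entry, in O(1)."""
        return self.levels[-1][0][0]

    def set_row_and_column(self, k, values):
        """Overwrite row k and column k with values[l], then repair the
        affected cells on every level; O(n) work in total."""
        base = self.levels[0]
        for l, v in enumerate(values):
            if l != k:
                base[k][l] = (v, k, l)
                base[l][k] = (v, l, k)
        r = k
        for level in range(1, len(self.levels)):
            r //= 2
            prev, cur = self.levels[level - 1], self.levels[level]
            for c in range(len(cur)):
                cur[r][c] = self._min4(prev, r, c)
                cur[c][r] = self._min4(prev, c, r)


qt = QuadTree([[0, 5, 9, 9],
               [5, 0, 10, 10],
               [9, 10, 0, 8],
               [9, 10, 8, 0]])
first_min = qt.minimum()
qt.set_row_and_column(1, [3, 0, 7, 7])
second_min = qt.minimum()
```

An update touches one row and one column on each level, and the levels halve in size, so the repair work sums to O(n). In UPGMA, a merged cluster can reuse the row of one of its parents, and the other parent's row can be retired by overwriting it with infinity.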

39 Improving the running time of UPGMA
With D stored in a quad-tree, the steps of the algorithm cost:
Step 1 (let the n species be n clusters of size 1): O(n).
Step 2 (pick the pair of clusters Ci and Cj where D(Ci,Cj) is minimized and merge them into Ck): O(1) using quad-tree, O(n^2) without quad-tree.
Step 3 (update the distance matrix D): O(n) for computing new distances and rebuilding the quad-tree.
Step 4: Repeat Steps 2 and 3 until only one cluster remains.
Running time O(n^2) using a quad-tree to store D. Space consumption O(n^2).

41 Does it matter in practice?
[Plots: running times T_basic(n) and T_quad(n), their ratio T_basic(n) / T_quad(n), and the normalized times T_quad(n) / n^2 and T_basic(n) / n^3]

42 Neighbor Joining

43 Neighbor Joining
Input: A set of n species (1, 2, ..., n) and a distance matrix d giving the pairwise distances.
Output: An unrooted binary tree T with edge lengths and leaves labelled 1, 2, ..., n such that similar species (according to the distance matrix) are grouped in the same subtree, and the tree distances correspond to the distance matrix.
Popular distance based phylogenetic inference method with good accuracy. Introduced by Saitou and Nei. Reformulated by Studier and Keppler. Criticized by many but still widely used. Often used as a benchmark for other methods.

44 Neighbor Joining
Input: A set of n species (1, 2, ..., n) and a distance matrix d giving the pairwise distances.
Output: An unrooted binary tree T with edge lengths and leaves labelled 1, 2, ..., n such that similar species (according to the distance matrix) are grouped in the same subtree, and the tree distances correspond to the distance matrix.
Let us assume that the distance matrix is additive, i.e. it corresponds to tree distances from some tree with unknown topology and edge lengths; then this is the tree we want to infer. If we can infer from the distance matrix which leaves (and subtrees) are neighbors in the unknown (true) tree, we can reconstruct the tree iteratively...

47 Neighbor Joining
Not easy to infer neighboring leaves! Here D(i,j) = D(k,l) = D(j,l) = 13 and D(j,k) = 12, so the leaves with minimum distance in the distance matrix are not necessarily neighbors in the tree, i.e. UPGMA will not reconstruct the tree correctly.

50 Neighbor Joining Input: A set of n species (1,2,...,n) and a distance matrix d giving the pairwise distances. Output: An unrooted binary tree T with edge lengths and leaves labelled 1, 2,..., n such that similar species (according to the distance matrix) are grouped in the same subtree, and the tree distances correspond to the distance matrix. Algorithm:
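The algorithm steps on these slides are not preserved. The following is a minimal sketch of the commonly used formulation of neighbor joining, which may differ in detail from the original slides. In each round it computes the row sums r_i, joins the pair minimising Q(i,j) = (m-2) D(i,j) - r_i - r_j over the m remaining nodes, and replaces the pair by a new internal node. The edge-list output format and all names are illustrative.

```python
def neighbor_joining(d):
    """Return an unrooted tree as a list of edges (node, node, length).
    Leaves are 0..n-1; internal nodes are numbered n, n+1, ..."""
    n = len(d)
    D = {i: {j: float(d[i][j]) for j in range(n) if j != i} for i in range(n)}
    edges = []
    next_id = n
    while len(D) > 2:
        m = len(D)
        r = {i: sum(D[i].values()) for i in D}  # row sums
        # Select the pair (i, j) that minimises Q(i, j).
        i, j = min(((a, b) for a in D for b in D if a < b),
                   key=lambda p: (m - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dij = D[i][j]
        # Lengths of the edges from i and j to the new internal node u.
        li = dij / 2 + (r[i] - r[j]) / (2 * (m - 2))
        lj = dij - li
        u = next_id
        next_id += 1
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from u to every remaining node k.
        new_row = {k: (D[i][k] + D[j][k] - dij) / 2
                   for k in D if k not in (i, j)}
        del D[i], D[j]
        for k in new_row:
            del D[k][i], D[k][j]
            D[k][u] = new_row[k]
        D[u] = new_row
    a, b = D  # two nodes remain; connect them with the final edge
    edges.append((a, b, D[a][b]))
    return edges


d = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
edges = neighbor_joining(d)
```

Ties in the selection step are broken here by whichever pair is seen first, so a different ordering of the taxa can produce a different tree when several pairs share the minimum.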

55 Running time and space consumption Can we improve the running time of Step 1 by using a quad-tree? Why? Why not?

56 Properties of Neighbor Joining

57 How 'deterministic' is NJ? An experiment
To see how much this matters in practice, I made a small experiment. I took 37 distance matrices based on Pfam sequences, with between 100 and 200 taxa. [...] For each matrix I constructed ten equivalent distance matrices by randomly permuting the order of taxa. Then I used these ten matrices to construct ten neighbour-joining trees and finally computed the Robinson-Foulds distance between all pairs of these trees. [...] As you can see, you can get quite different trees out of equivalent distance matrices, and on realistic data too. I didn't check if the differences between the trees are correlated with how much support the splits have in the data. Looking at bootstrap values for the trees might tell us something about that. Anyway, that is not the take-home message I want to give here. That is that it doesn't necessarily make sense to talk about the neighbour-joining tree for a distance matrix. Neighbour-joining is less deterministic than you might think!

58 How 'deterministic' is NJ?

59 RapidNJ
RapidNJ is an implementation of NJ which uses heuristics to avoid searching large parts of the D matrix. No guarantees are given on how much of the matrix is avoided.
RapidNJ was compared with three fast Neighbour-Joining implementations:
QuickTree: A fast implementation of the canonical Neighbour-Joining method.
QuickJoin: Uses an advanced data structure to reduce the search space.
Clearcut: Relaxed Neighbour-Joining.
R8s was used to simulate phylogenetic trees from which distance matrices were constructed. Alignments from Pfam were used to infer distance matrices using QuickTree.
Also work on how to use external memory to allow construction of large trees (> leaves).

60 Results on Simulated Data

61 Results on small Pfam Data

62 Results on large Pfam Data

63 Running times on Pfam data


More information

CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

More information

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering

More information

Lesson 13 Molecular Evolution

Lesson 13 Molecular Evolution Sequence Analysis Spring 2000 Dr. Richard Friedman (212)305-6901 (76901) friedman@cuccfa.ccc.columbia.edu 130BB Lesson 13 Molecular Evolution In this class we learn how to draw molecular evolutionary trees

More information

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin Sequence length requirements Tandy Warnow Department of Computer Science The University of Texas at Austin Part 1: Absolute Fast Convergence DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2

More information

A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees

A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees Andreas Sand 1,2, Gerth Stølting Brodal 2,3, Rolf Fagerberg 4, Christian N. S. Pedersen 1,2 and Thomas Mailund

More information

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course)

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course) Tree Searching We ve discussed how we rank trees Parsimony Least squares Minimum evolution alanced minimum evolution Maximum likelihood (later in the course) So we have ways of deciding what a good tree

More information

Lecture: Bioinformatics

Lecture: Bioinformatics Lecture: Bioinformatics ENS Sacley, 2018 Some slides graciously provided by Daniel Huson & Celine Scornavacca Phylogenetic Trees - Motivation 2 / 31 2 / 31 Phylogenetic Trees - Motivation Motivation -

More information

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency?

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Fathiyeh Faghih and Daniel G. Brown David R. Cheriton School of Computer Science, University of

More information

Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmatic Mean (UPGMA).

Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmatic Mean (UPGMA). Construction of a distance tree using clustering with the Unweighted Pair Group Method with Arithmatic Mean (UPGMA). The UPGMA is the simplest method of tree construction. It was originally developed for

More information

Figure 4.1: The evolution of a rooted tree.

Figure 4.1: The evolution of a rooted tree. 106 CHAPTER 4. INDUCTION, RECURSION AND RECURRENCES 4.6 Rooted Trees 4.6.1 The idea of a rooted tree We talked about how a tree diagram helps us visualize merge sort or other divide and conquer algorithms.

More information

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3 The XIII International Conference Applied Stochastic Models and Data Analysis (ASMDA-2009) June 30-July 3, 2009, Vilnius, LITHUANIA ISBN 978-9955-28-463-5 L. Sakalauskas, C. Skiadas and E. K. Zavadskas

More information

CLC Phylogeny Module User manual

CLC Phylogeny Module User manual CLC Phylogeny Module User manual User manual for Phylogeny Module 1.0 Windows, Mac OS X and Linux September 13, 2013 This software is for research purposes only. CLC bio Silkeborgvej 2 Prismet DK-8000

More information

CS 581. Tandy Warnow

CS 581. Tandy Warnow CS 581 Tandy Warnow This week Maximum parsimony: solving it on small datasets Maximum Likelihood optimization problem Felsenstein s pruning algorithm Bayesian MCMC methods Research opportunities Maximum

More information

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height = M-ary Search Tree B-Trees Section 4.7 in Weiss Maximum branching factor of M Complete tree has height = # disk accesses for find: Runtime of find: 2 Solution: B-Trees specialized M-ary search trees Each

More information

Exercises: Disjoint Sets/ Union-find [updated Feb. 3]

Exercises: Disjoint Sets/ Union-find [updated Feb. 3] Exercises: Disjoint Sets/ Union-find [updated Feb. 3] Questions 1) Suppose you have an implementation of union that is by-size and an implementation of find that does not use path compression. Give the

More information

Clustering Techniques

Clustering Techniques Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,

More information

Introduction to Triangulated Graphs. Tandy Warnow

Introduction to Triangulated Graphs. Tandy Warnow Introduction to Triangulated Graphs Tandy Warnow Topics for today Triangulated graphs: theorems and algorithms (Chapters 11.3 and 11.9) Examples of triangulated graphs in phylogeny estimation (Chapters

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS

SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS 1 SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS C. THAN and L. NAKHLEH Department of Computer Science Rice University 6100 Main Street, MS 132 Houston, TX 77005, USA Email: {cvthan,nakhleh}@cs.rice.edu

More information

Seeing the wood for the trees: Analysing multiple alternative phylogenies

Seeing the wood for the trees: Analysing multiple alternative phylogenies Seeing the wood for the trees: Analysing multiple alternative phylogenies Tom M. W. Nye, Newcastle University tom.nye@ncl.ac.uk Isaac Newton Institute, 17 December 2007 Multiple alternative phylogenies

More information

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss) M-ary Search Tree B-Trees (4.7 in Weiss) Maximum branching factor of M Tree with N values has height = # disk accesses for find: Runtime of find: 1/21/2011 1 1/21/2011 2 Solution: B-Trees specialized M-ary

More information

Lecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering

Lecture 5 Finding meaningful clusters in data. 5.1 Kleinberg s axiomatic framework for clustering CSE 291: Unsupervised learning Spring 2008 Lecture 5 Finding meaningful clusters in data So far we ve been in the vector quantization mindset, where we want to approximate a data set by a small number

More information

Comparing Implementations of Optimal Binary Search Trees

Comparing Implementations of Optimal Binary Search Trees Introduction Comparing Implementations of Optimal Binary Search Trees Corianna Jacoby and Alex King Tufts University May 2017 In this paper we sought to put together a practical comparison of the optimality

More information

Generalized Neighbor-Joining: More Reliable Phylogenetic Tree Reconstruction

Generalized Neighbor-Joining: More Reliable Phylogenetic Tree Reconstruction Generalized Neighbor-Joining: More Reliable Phylogenetic Tree Reconstruction William R. Pearson, Gabriel Robins,* and Tongtong Zhang* *Department of Computer Science and Department of Biochemistry, University

More information

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Lecture 5 Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Reading: Randomized Search Trees by Aragon & Seidel, Algorithmica 1996, http://sims.berkeley.edu/~aragon/pubs/rst96.pdf;

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016) Phylogenetic Trees (I)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) Phylogenetic Trees (I) CISC 636 Computational iology & ioinformatics (Fall 2016) Phylogenetic Trees (I) Maximum Parsimony CISC636, F16, Lec13, Liao 1 Evolution Mutation, selection, Only the Fittest Survive. Speciation. t one

More information

Clustering analysis of gene expression data

Clustering analysis of gene expression data Clustering analysis of gene expression data Chapter 11 in Jonathan Pevsner, Bioinformatics and Functional Genomics, 3 rd edition (Chapter 9 in 2 nd edition) Human T cell expression data The matrix contains

More information

Computational Biology Lecture 6: Affine gap penalty function, multiple sequence alignment Saad Mneimneh

Computational Biology Lecture 6: Affine gap penalty function, multiple sequence alignment Saad Mneimneh Computational Biology Lecture 6: Affine gap penalty function, multiple sequence alignment Saad Mneimneh We saw earlier how we can use a concave gap penalty function γ, i.e. one that satisfies γ(x+1) γ(x)

More information

A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony

A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony A Memetic Algorithm for Phylogenetic Reconstruction with Maximum Parsimony Jean-Michel Richer 1 and Adrien Goëffon 2 and Jin-Kao Hao 1 1 University of Angers, LERIA, 2 Bd Lavoisier, 49045 Anger Cedex 01,

More information

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such) Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences joe@gs Phylogeny methods, part 1 (Parsimony and such) Methods of reconstructing phylogenies (evolutionary trees) Parsimony

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

More information

Prior Distributions on Phylogenetic Trees

Prior Distributions on Phylogenetic Trees Prior Distributions on Phylogenetic Trees Magnus Johansson Masteruppsats i matematisk statistik Master Thesis in Mathematical Statistics Masteruppsats 2011:4 Matematisk statistik Juni 2011 www.math.su.se

More information

Clustering Algorithms. Margareta Ackerman

Clustering Algorithms. Margareta Ackerman Clustering Algorithms Margareta Ackerman A sea of algorithms As we discussed last class, there are MANY clustering algorithms, and new ones are proposed all the time. They are very different from each

More information

Lesson 3. Prof. Enza Messina

Lesson 3. Prof. Enza Messina Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical

More information

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other

More information

Tree Models of Similarity and Association. Clustering and Classification Lecture 5

Tree Models of Similarity and Association. Clustering and Classification Lecture 5 Tree Models of Similarity and Association Clustering and Lecture 5 Today s Class Tree models. Hierarchical clustering methods. Fun with ultrametrics. 2 Preliminaries Today s lecture is based on the monograph

More information

Background: disk access vs. main memory access (1/2)

Background: disk access vs. main memory access (1/2) 4.4 B-trees Disk access vs. main memory access: background B-tree concept Node structure Structural properties Insertion operation Deletion operation Running time 66 Background: disk access vs. main memory

More information

Introduction to Computational Phylogenetics

Introduction to Computational Phylogenetics Introduction to Computational Phylogenetics Tandy Warnow The University of Texas at Austin No Institute Given This textbook is a draft, and should not be distributed. Much of what is in this textbook appeared

More information

46 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 16, 2011

46 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 16, 2011 46 Grundlagen der Bioinformatik, SoSe 11, D. Huson, May 16, 011 Phylogeny Further reading and sources for parts of this hapter: R. Durbin, S. Eddy,. Krogh & G. Mitchison, Biological sequence analysis,

More information

PHYLOGENETIC RECONSTRUCTION methods attempt to find the evolutionary history of a given set of extant

PHYLOGENETIC RECONSTRUCTION methods attempt to find the evolutionary history of a given set of extant JOURNAL OF COMPUTATIONAL BIOLOGY Volume 14, Number 1, 2007 c Mary Ann Liebert, Inc. Pp. 1 15 DOI: 10.1089/cmb.2006.0115 Neighbor Joining Algorithms for Inferring Phylogenies via LCA Distances ILAN GRONAU

More information

Evolution of Tandemly Repeated Sequences

Evolution of Tandemly Repeated Sequences University of Canterbury Department of Mathematics and Statistics Evolution of Tandemly Repeated Sequences A thesis submitted in partial fulfilment of the requirements of the Degree for Master of Science

More information

Clustering, cont. Genome 373 Genomic Informatics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Clustering, cont. Genome 373 Genomic Informatics Elhanan Borenstein. Some slides adapted from Jacques van Helden Clustering, cont Genome 373 Genomic Informatics Elhanan Borenstein Some slides adapted from Jacques van Helden Improving the search heuristic: Multiple starting points Simulated annealing Genetic algorithms

More information

Hierarchical Clustering Lecture 9

Hierarchical Clustering Lecture 9 Hierarchical Clustering Lecture 9 Marina Santini Acknowledgements Slides borrowed and adapted from: Data Mining by I. H. Witten, E. Frank and M. A. Hall 1 Lecture 9: Required Reading Witten et al. (2011:

More information

Relaxed Neighbor Joining: A Fast Distance-Based Phylogenetic Tree Construction Method

Relaxed Neighbor Joining: A Fast Distance-Based Phylogenetic Tree Construction Method J Mol Evol (2006) 62:785 792 DOI: 10.1007/s00239-005-0176-2 Relaxed Neighbor Joining: A Fast Distance-Based Phylogenetic Tree Construction Method Jason Evans, Luke Sheneman, James Foster Department of

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

Machine learning - HT Clustering

Machine learning - HT Clustering Machine learning - HT 2016 10. Clustering Varun Kanade University of Oxford March 4, 2016 Announcements Practical Next Week - No submission Final Exam: Pick up on Monday Material covered next week is not

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

Markovian Models of Genetic Inheritance

Markovian Models of Genetic Inheritance Markovian Models of Genetic Inheritance Elchanan Mossel, U.C. Berkeley mossel@stat.berkeley.edu, http://www.cs.berkeley.edu/~mossel/ 6/18/12 1 General plan Define a number of Markovian Inheritance Models

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

The History Bound and ILP

The History Bound and ILP The History Bound and ILP Julia Matsieva and Dan Gusfield UC Davis March 15, 2017 Bad News for Tree Huggers More Bad News Far more convincingly even than the (also highly convincing) fossil evidence, the

More information

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Vincent Berry, François Nicolas Équipe Méthodes et Algorithmes pour la

More information

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam CS 441 Discrete Mathematics for CS Lecture 26 Graphs Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Final exam Saturday, April 26, 2014 at 10:00-11:50am The same classroom as lectures The exam

More information

Indexing in Search Engines based on Pipelining Architecture using Single Link HAC

Indexing in Search Engines based on Pipelining Architecture using Single Link HAC Indexing in Search Engines based on Pipelining Architecture using Single Link HAC Anuradha Tyagi S. V. Subharti University Haridwar Bypass Road NH-58, Meerut, India ABSTRACT Search on the web is a daily

More information

Linked Structures Songs, Games, Movies Part III. Fall 2013 Carola Wenk

Linked Structures Songs, Games, Movies Part III. Fall 2013 Carola Wenk Linked Structures Songs, Games, Movies Part III Fall 2013 Carola Wenk Biological Structures Nature has evolved vascular and nervous systems in a hierarchical manner so that nutrients and signals can quickly

More information

Stats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms

Stats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms Stats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms Padhraic Smyth Department of Computer Science Bren School of Information and Computer Sciences University of California,

More information

Midterm Examination CS540-2: Introduction to Artificial Intelligence

Midterm Examination CS540-2: Introduction to Artificial Intelligence Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search

More information

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures. Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,

More information