Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau
1 Phylogenetic Trees Lecture 12 Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau.
2 Maximum Parsimony
Last week we presented Fitch's algorithm for (unweighted) Maximum Parsimony:
Input: A rooted binary tree with characters at the leaves.
Output: An assignment of characters to internal vertices which minimizes the number of mutations.
Some mutations may be more probable than others. Hence, a natural generalization of the Maximum Parsimony problem is Weighted Parsimony, defined next.
3 Weighted Parsimony (Sankoff's algorithm)
Weighted Parsimony score:
Input: A tree with characters at the leaves, and a weight function on the mutations: $c(a,b)$ is the weight of the mutation $a \to b$.
Output: An assignment of characters to internal vertices which minimizes the total weight of the mutations.
The weighted parsimony score reduces to the parsimony score when $c(a,a)=0$ and $c(a,b)=1$ for all $b \neq a$.
4 Weighted Parsimony on a Given Tree
Each position is independent and is computed by itself. Use dynamic programming: if $i$ is a node with children $j$ and $k$, then
$S(i,a) = \min_{b}\bigl(S(j,b)+c(a,b)\bigr) + \min_{b'}\bigl(S(k,b')+c(a,b')\bigr)$,
where $S(j,b)$ is the optimal score of the subtree rooted at $j$ when $j$ has the character $b$.
[Figure: node $i$ with score $S(i,a)$ and its children $j$, $k$ with scores $S(j,b)$, $S(k,b')$.]
5 Evaluating Parsimony Scores
Dynamic programming on a given tree.
Initialization: For each leaf $i$ set $S(i,a) = 0$ if $i$ is labeled by $a$, otherwise $S(i,a) = \infty$.
Iteration: For each node $i$ with children $j$ and $k$: $S(i,a) = \min_{x}\bigl(S(j,x)+c(a,x)\bigr) + \min_{y}\bigl(S(k,y)+c(a,y)\bigr)$.
Termination: The cost of the tree is $\min_{x} S(r,x)$, where $r$ is the root.
Comment: To reconstruct an optimal assignment, we need to keep, in each node $i$ and for each character $a$, two characters $x$, $y$ that minimize the cost when $i$ has character $a$.
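The initialization, iteration and termination steps above can be sketched in Python as follows. This is a minimal illustration for a single character, not the lecture's own code: the function name `sankoff`, the dictionary encoding of the tree, and the tiny four-leaf example are all assumptions made for the sketch.

```python
import math

def sankoff(tree, root, leaf_label, alphabet, cost):
    """Weighted parsimony score of one character on a fixed rooted binary tree.

    tree:       dict mapping each internal node to its (left, right) children
    leaf_label: dict mapping each leaf to its character
    cost(a, b): weight of the mutation a -> b
    Returns (minimum total weight, table S with S[node][a]).
    """
    S = {}

    def visit(node):
        if node in leaf_label:
            # Initialization: 0 for the observed character, infinity otherwise.
            S[node] = {a: 0 if a == leaf_label[node] else math.inf
                       for a in alphabet}
            return
        j, k = tree[node]
        visit(j)
        visit(k)
        # Iteration: combine the best choices for the two children.
        S[node] = {
            a: min(S[j][x] + cost(a, x) for x in alphabet)
               + min(S[k][y] + cost(a, y) for y in alphabet)
            for a in alphabet
        }

    visit(root)
    # Termination: best score over all characters at the root.
    return min(S[root].values()), S

# Hypothetical example: unit costs, so the result is the plain parsimony score.
unit = lambda a, b: 0 if a == b else 1
tree = {"r": ("u", "v"), "u": ("1", "2"), "v": ("3", "4")}
labels = {"1": "A", "2": "C", "3": "A", "4": "G"}
score, table = sankoff(tree, "r", labels, "ACGT", unit)
```

With unit costs the sketch reduces to unweighted parsimony; recovering an optimal assignment would additionally require storing the minimizing characters, as the comment on the slide notes.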
6 Cost of Evaluating Parsimony for Binary Trees
For a tree with $n$ nodes and a single character with $k$ values, the complexity is $O(nk^2)$. When there are $m$ such characters, it is $O(nmk^2)$.
7 1st problem with the Maximum Parsimony Approach: Inconsistency
Maximum Parsimony/Perfect Phylogeny are reasonable assumptions for the evolution of significant characters (i.e., characters whose states are unlikely to be created twice during the evolution process). They are less reasonable when the characters are DNA (or protein) residues, as depicted next.
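8 A Possible DNA Sequence Evolution
[Figure: a tree of DNA sequences evolving over time, with time levels -3 mil yrs, -2 mil yrs, -1 mil yrs and today. Sequences shown above the -1 mil yrs level: AAGGCCT, AAGACTT, TGGACTT. At -1 mil yrs: AGGGCAT, TAGCCCT, AGCACTT. Today: AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT. Source: Tandy Warnow.]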
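9 Reversal and Convergence May Happen
[Figure: the same sequence-evolution tree as on the previous slide (sequences AAGGCCT, AAGACTT, TGGACTT; AGGGCAT, TAGCCCT, AGCACTT; today AGGGCAT, TAGCCCA, TAGACTT, AGCACAA, AGCGCTT), used to point out reversals and convergences. Source: Tandy Warnow.]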
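10 The Reconstruction Task
Given the present-day sequences

U  AGGGCAT
V  TAGCCCA
W  TAGACTT
X  TGCACAA
Y  TGCGCTT

reconstruct the tree. [Figure: a tree whose leaves appear in the order U, X, Y, V, W. Source: Tandy Warnow.]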
11 2nd problem with Maximum Parsimony (and other Character-Based Algorithms): Efficiency
There are no known efficient algorithms for solving the big problem for maximum parsimony/perfect phylogeny (both are known to be NP-hard). Mainly for this reason, the most used approaches for solving the big problem are distance-based methods.
12 Distance-Based Methods for Constructing Phylogenies
This approach attempts to overcome the two weaknesses of maximum parsimony:
1. The distances are derived from a well-defined statistical model of evolution (which allows reversals/convergences).
2. It provides efficient algorithms for the big problem.
Basic idea: The differences between species (usually represented by sequences of characters) are transformed to numerical distances, and a tree realizing these distances is constructed.
13 Distance-Based Reconstruction
- Compute distances between all taxon pairs.
- Find an edge-weighted tree that best describes the distances.
[Figure: a distance matrix $D$ and a corresponding edge-weighted tree.]
14 Data $\to$ Distances $\to$ Trees
1. Modeling question: given the data (e.g., DNA sequences of the taxa), how do we define distances between taxa?
2. Algorithmic question: decide if the distances define a tree (ultrametric or additive, to be defined later), and if so, construct that tree.
3. In reality, the computed distances are noisy. So we need the algorithm to return a tree which approximates the distances of the input data.
In the following we shall study items 2 and 1, and briefly discuss item 3.
15 Ultrametric and Tree Metric
A distance metric on a set $M$ of $L$ objects is a function $d: M \times M \to \mathbb{R}^{+}$ (represented by a symmetric matrix) satisfying:
- $d(i,i)=0$, and for $i \neq j$, $d(i,j)>0$.
- $d(i,j)=d(j,i)$.
- For all $i,j,k$ it holds that $d(i,k) \le d(i,j)+d(j,k)$.
A metric is ultrametric if it corresponds to distances between leaves of a tree which admits a molecular clock. It is a tree metric, or additive, if it corresponds to distances between nodes in a weighted tree.
16 1st model: Molecular Clock, Ultrametric Trees
The molecular clock assumes a constant rate of evolution. Namely, the distance from a speciation event to the formation of the current species is proportional to time, and hence is identical for all paths (a wrong assumption in reality). A directed tree satisfying this property is called ultrametric.
17 Ultrametric Trees
Definition: An ultrametric tree is a rooted weighted tree all of whose leaves are at the same depth.
Basic property: Define the height of the leaves to be 0. Then edge weights can be represented by the heights of internal vertices.
[Figure: an ultrametric tree on leaves A, E, D, B, C (height 0), shown with edge weights and with internal-vertex heights.]
18 Least Common Ancestor and Distances in Ultrametric Trees
Let $\mathrm{LCA}(i,j)$ denote the least common ancestor of leaves $i$ and $j$. Let $\mathrm{height}(\mathrm{LCA}(i,j))$ be its distance from the leaves, and $\mathrm{dist}(i,j)$ be the distance from $i$ to $j$.
Observation: For any pair of leaves $i, j$ in an ultrametric tree: $\mathrm{height}(\mathrm{LCA}(i,j)) = \tfrac{1}{2}\,\mathrm{dist}(i,j)$.
[Figure: the example ultrametric tree on leaves A, E, D, B, C and its matrix of LCA heights.]
19 Ultrametric Matrices
Definition: A distance matrix* $U$ of dimension $L \times L$ is ultrametric iff for every three indices $i, j, k$: $U(i,j) \le \max\{U(i,k), U(j,k)\}$.
Theorem: The following conditions are equivalent for an $L \times L$ distance matrix $U$:
1. $U$ is an ultrametric matrix.
2. There is an ultrametric tree with $L$ leaves such that for each pair of leaves $i,j$: $U(i,j) = \mathrm{height}(\mathrm{LCA}(i,j)) = \tfrac{1}{2}\,\mathrm{dist}(i,j)$.
* Recall: a distance matrix is a symmetric matrix with positive non-diagonal entries and 0 diagonal entries, which satisfies the triangle inequality.
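As a small illustration of the definition, the three-point condition can be checked by brute force over all triples. The sketch below assumes the matrix is given as a symmetric list of lists; the name `is_ultrametric` and the tolerance parameter are choices made here, not part of the lecture. It uses the equivalent formulation that among the three values in any triple, the two largest must be equal.

```python
from itertools import combinations

def is_ultrametric(U, tol=1e-9):
    """Check U(i,j) <= max(U(i,k), U(j,k)) for all triples i, j, k.

    U is assumed to be a symmetric matrix (list of lists) with zero diagonal.
    Applying the inequality to every ordering of a triple is equivalent to
    requiring that the two largest of the three values are equal.
    """
    n = len(U)
    for i, j, k in combinations(range(n), 3):
        low, mid, high = sorted((U[i][j], U[i][k], U[j][k]))
        if high - mid > tol:
            return False
    return True

# Hypothetical examples.
good = [[0, 2, 4],
        [2, 0, 4],
        [4, 4, 0]]
bad = [[0, 2, 5],
       [2, 0, 4],
       [5, 4, 0]]
```

The check takes $O(L^3)$ time, which is fine for illustration; the constructive proof on the following slides gives a more direct test.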
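20 Ultrametric tree $\Rightarrow$ Ultrametric matrix
Suppose there is an ultrametric tree s.t. $U(i,j)=\tfrac{1}{2}\,\mathrm{dist}(i,j)$. Then $U$ is an ultrametric matrix: by properties of least common ancestors in trees, any three leaves can be labeled $i, j, k$ so that $U(k,i) = U(j,i) \ge U(k,j)$.
[Figure: three leaves $k$, $j$, $i$, where $\mathrm{LCA}(k,j)$ lies below the common ancestor of all three.]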
21 Ultrametric matrix $\Rightarrow$ Ultrametric tree
We start with two observations.
Definition: Let $U$ be an $L \times L$ matrix, and let $S \subseteq \{1,\dots,L\}$. $U[S]$ is the submatrix of $U$ consisting of the rows and columns with indices from $S$.
Observation 1: $U$ is ultrametric iff for every $S \subseteq \{1,\dots,L\}$, $U[S]$ is ultrametric.
Observation 2: If $U$ is ultrametric and $\max_{i,j} U(i,j)=M$, then $M$ appears in every row of $U$. (If $U(j,k)=M$, then for any row $i$ we have $M = U(j,k) \le \max\{U(i,j), U(i,k)\}$, so one of $U(i,j)$, $U(i,k)$ must be $M$.)
22 Ultrametric matrix $\Rightarrow$ Ultrametric tree: Proof by induction
$U$ is an ultrametric matrix $\Rightarrow$ $U$ has an ultrametric tree: by induction on $L$, the size of $U$.
Basis:
- $L=1$: $T$ is a single leaf $i$.
- $L=2$: $T$ is a tree with two leaves $i$ and $j$.
[Figure: the matrices and trees for $L=1$ and $L=2$.]
23 Induction step
Induction step: $L>2$. Use the 1st row to split the set $\{1,\dots,L\}$ into two subsets: $S_1 = \{i : U(1,i) = M\}$ and $S_2 = \{1,\dots,L\} \setminus S_1$ (note: $0 < |S_i| < L$).
In the slide's example: $S_1=\{2,4\}$, $S_2=\{1,3,5\}$.
24 Induction step
By Observation 1, $U[S_1]$ and $U[S_2]$ are ultrametric. By induction, there is a tree $T_1$ for $S_1$ with a root labeled $M_1 \le M$, and a tree $T_2$ for $S_2$ with a root labeled $M_2 < M$ ($M_2$ is the 2nd largest element in row 1; if $M_2=0$ then $T_2$ is a leaf). Join $T_1$ and $T_2$ to $T$ with a root labeled $M$.
[Figure: the construction when $M_1 = M$: the root is labeled $M = M_1$, and $T_2$ (root labeled $M_2 < M$) is attached by an edge of weight $M - M_2$.]
25 Proof (end)
Need to prove: $T$ is an ultrametric tree for $U$, i.e., $U(i,j)$ is the label of the LCA of $i$ and $j$ in $T$.
If $i$ and $j$ are in the same subtree, this holds by induction. Otherwise the label of $\mathrm{LCA}(i,j)$ is $M$ (since they are in different subtrees). Also, $[U(1,i)= M$ and $U(1,j) < M] \Rightarrow U(i,j) = M$.
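The inductive proof is constructive, and a direct (quadratic per level, unoptimized) transcription can be sketched as below. The nested-tuple tree encoding `(label, subtree for S_2, subtree for S_1)`, the function name, and the example matrix are assumptions of this sketch. One simplification: when $M_1 = M$ the sketch keeps a separate child labeled $M$ (a zero-length edge) instead of merging the two roots as in the slide's figure. The input is assumed to be ultrametric with positive off-diagonal entries.

```python
def build_ultrametric_tree(U, S=None):
    """Recursive construction following the induction on the size of U.

    U: ultrametric matrix (list of lists), U[i][j] = label of LCA(i, j).
    S: list of indices of the current submatrix U[S].
    Returns a leaf index, or a tuple (label, subtree_S2, subtree_S1).
    """
    if S is None:
        S = list(range(len(U)))
    if len(S) == 1:
        return S[0]
    first = S[0]
    # By Observation 2 the maximum of U[S] appears in every row, so the
    # maximum of the first row is the root label M.
    M = max(U[first][i] for i in S)
    S1 = [i for i in S if U[first][i] == M]
    S2 = [i for i in S if U[first][i] != M]   # contains `first` itself
    return (M, build_ultrametric_tree(U, S2), build_ultrametric_tree(U, S1))

# Hypothetical 5 x 5 ultrametric matrix (entries are LCA heights).
U = [[0, 8, 8, 5, 3],
     [8, 0, 3, 8, 8],
     [8, 3, 0, 8, 8],
     [5, 8, 8, 0, 5],
     [3, 8, 8, 5, 0]]
tree = build_ultrametric_tree(U)
```

For this example the first row splits the indices into $S_1 = \{1, 2\}$ and $S_2 = \{0, 3, 4\}$ (0-based), mirroring the split by the first row used in the induction step.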
26 Efficient Algorithms for Constructing Ultrametric Trees
Input: A distance matrix over a set $S$.
Output: An ultrametric tree on the objects in $S$. (Note: we want our algorithm to be defined for all input metrics.)
Requirements:
- Consistency: If the input matrix is ultrametric, then the algorithm should return the corresponding tree (there is only one).
- Robustness: If the input matrix is not ultrametric, the algorithm should return an ultrametric close to it.
In this course we'll concentrate on the 1st requirement.
27 Reconstructing Ultrametrics: UPGMA Clustering
Unweighted Pair Group Method using Averages.
Input: A distance matrix over a set of species $S$.
Output: An ultrametric phylogenetic tree on $S$.
Outline:
- Initialization: Each object is a cluster. Place all clusters at height zero.
- At each iteration, combine the two closest clusters to get a new one, update the distances to the new cluster, and continue.
This clustering algorithm is used in many other applications, such as data mining.
28 UPGMA
While (#clusters > 1) do:
- Choose a cluster pair $i,j$ as neighbors, s.t. $D(i,j) = \min_{i' \neq j'} D(i',j')$.
- Connect $i,j$ to a new cluster $v$.
- Replace in $D$ the pair $i,j$ by $v$, and reduce $D$: for $k \neq v$, $D(v,k) = \alpha D(i,k) + (1-\alpha) D(j,k)$, where $\alpha = \frac{|C_i|}{|C_i| + |C_j|}$.
Note: this reduction formula guarantees that the distance between clusters $C_i$ and $C_j$ is the average of the distances between the elements in each cluster:
$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p,q)$
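A naive $O(n^3)$ version of this loop might look as follows. It is a sketch under assumptions: the pairwise-distance dictionary, the nested-tuple tree `(left, right, height)` and the four-taxon example are illustrative choices made here, and ties between closest pairs are broken arbitrarily.

```python
def upgma(D, names):
    """Naive UPGMA on a symmetric distance matrix D (list of lists).

    Returns a nested tuple (left_subtree, right_subtree, height), where the
    height of a new cluster is half the distance between the merged clusters.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    dist = {(i, j): D[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # Choose the closest pair of clusters.
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        alpha = n_i / (n_i + n_j)
        # Reduce D: D(v,k) = alpha * D(i,k) + (1 - alpha) * D(j,k).
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = alpha * d_ik + (1 - alpha) * d_jk
        del dist[(i, j)]
        clusters[next_id] = ((tree_i, tree_j, d / 2), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree

# Hypothetical ultrametric input on four taxa.
names = ["A", "B", "C", "D"]
D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
tree = upgma(D, names)
```

Replacing the line that computes `alpha` by the constant `0.5` would give the WPGMA variant.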
29 Reduction Formulas in Closest-Pair Clustering Algorithms
The reduction formula (computing distances from new clusters) of UPGMA has several variants, for instance, for $k \neq v$:
- $D(v,k) = \tfrac{1}{2}\bigl(D(i,k) + D(j,k)\bigr)$ (WPGMA), or
- $D(v,k) = \min\{D(i,k), D(j,k)\}$ (single linkage).
Both of these reductions preserve the consistency of the algorithm. It is known (from simulations) that the chosen reduction formula may have a significant effect on the robustness of the algorithm to noise.
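30 Example
UPGMA construction on five objects. The length of an edge = its (vertical) height. $d(i,j)$ is the distance between the leaves of $C_i$ and $C_j$.
[Figure: a distance matrix on A, B, C, D, E and the resulting UPGMA tree with internal nodes F, G, H, I; the slide also shows $d(H,G)$ computed from $d(F,G)$ and $d(D,G)$ by the reduction formula.]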
31 Consistency of UPGMA
Proposition: If the input distances are ultrametric, then UPGMA will reconstruct the corresponding ultrametric tree $T$.
Proof sketch: By induction on the number of iterations, show that the distance between two clusters is twice the height of the LCA of the corresponding subtrees.
32 Complexity of UPGMA
- Naïve implementation: $n$ iterations, $O(n^2)$ time for each iteration (to find a closest pair) $\Rightarrow$ $O(n^3)$ total.
- Constructing heaps for each row and updating them in each iteration $\Rightarrow$ $O(n^2 \log n)$ total.
- Optimal implementation: $O(n^2)$ total time. One such implementation, using mutual nearest neighbors, is presented next.
33 The Nearest Neighbor Algorithm
Let $D$ be a distance metric.
$j$ is a nearest neighbor (NN) of $i$ if $j \neq i$ and $d(i,j) = \min\{d(i,k) : k \neq i\}$.
$(i, j)$ are mutual nearest neighbors if $i$ is an NN of $j$ and $j$ is an NN of $i$. In other words, if $d(i,j) \le \min\{d(i,k), d(j,k) : k \neq i,j\}$.
34 Ultrametric Reconstruction by the Nearest Neighbor Chains Algorithm
$n-1$ neighbor-joining iterations:
While (#clusters > 1) do:
- Choose a cluster pair $i,j$ which are mutual nearest neighbors.
- Connect $i,j$ to a new cluster $v$.
- Replace $i,j$ with the cluster $v$, and reduce the distance matrix $D$: for $k \neq v$, $D(v,k) = \alpha D(i,k) + (1-\alpha) D(j,k)$.
$\alpha = \frac{|C_i|}{|C_i| + |C_j|}$ in UPGMA, but the algorithm is consistent for any $0 \le \alpha \le 1$ (i.e., if the reduction is convex).
35 $\Theta(n^2)$ implementation of NN chains
Finding mutual nearest neighbors in $O(n^2)$ total time:
Complete NN chain: $i_0 \to i_1 \to \dots \to i_l$, where $i_{r+1}$ is a nearest neighbor of $i_r$ and the final pair $(i_{l-1}, i_l)$ are mutual nearest neighbors.
- Find a minimal entry $(i_r, i_{r+1})$ in row $i_r$; $i_{r+1}$ is a nearest neighbor of $i_r$.
- Stop if $(i_r, i_{r+1})$ is also minimal in row $i_{r+1}$ (i.e., $i_r$ and $i_{r+1}$ are mutual nearest neighbors). Otherwise, continue.
[Figure: the matrix $D$ with a chain $i_0, i_1, i_2, \dots$ ending in a mutual NN pair.]
36 $\Theta(n^2)$ implementation of NN chains (cont.)
A $\Theta(n^2)$ implementation using Nearest Neighbor Chains:
- Extend a chain until it is complete.
- Select the final pair (the mutual nearest neighbors) for joining, and remove them from the chain.
Note: If the reduction formula is convex, then for each $v$, if before the reduction $\mathrm{NN}(v) \notin \{i,j\}$, then after the reduction $\mathrm{NN}(v)$ is not the new cluster. Hence the remaining chain is still an NN chain.
37 $O(n^2)$ implementation of NN chains (cont.)
Complexity analysis: Count the number of row-minimum calculations (each taking $O(n)$ time):
- $n-1$ terminations throughout the execution.
- $2(n-1)$ edge deletions $\Rightarrow$ $2(n-1)$ extensions.
- Total for NN chain operations: $O(n^2)$.
- Updates: $O(n)$ each iteration, total $O(n^2)$.
- Altogether $O(n^2)$.
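The chain-based procedure can be sketched as follows, using the UPGMA reduction. This is an illustrative sketch rather than the lecture's implementation: the dictionary-of-rows representation, the nested-tuple tree `(left, right, height)` and the example are assumptions, and ties are broken in favor of the previous chain element so that a chain always ends in a mutual nearest-neighbor pair.

```python
def nn_chain(D, names):
    """Nearest-neighbor-chain clustering with the UPGMA (average) reduction."""
    n = len(names)
    size = {i: 1 for i in range(n)}
    tree = {i: names[i] for i in range(n)}
    d = {i: {j: D[i][j] for j in range(n) if j != i} for i in range(n)}
    chain, next_id = [], n
    while len(d) > 1:
        if not chain:
            chain.append(next(iter(d)))          # start a new chain anywhere
        while True:
            i = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            # Row minimum of i; on ties prefer the previous chain element.
            j = min(d[i], key=lambda k: (d[i][k], k != prev))
            if j == prev:                        # mutual nearest neighbors
                break
            chain.append(j)
        j, i = chain.pop(), chain.pop()          # remove the final pair
        d_ij = d[i][j]
        alpha = size[i] / (size[i] + size[j])
        row = {k: alpha * d[i][k] + (1 - alpha) * d[j][k]
               for k in d if k not in (i, j)}
        del d[i], d[j]
        for k, value in row.items():             # O(n) update per merge
            del d[k][i], d[k][j]
            d[k][next_id] = value
        d[next_id] = row
        tree[next_id] = (tree.pop(i), tree.pop(j), d_ij / 2)
        size[next_id] = size.pop(i) + size.pop(j)
        next_id += 1
    return tree[next(iter(d))]

# Hypothetical ultrametric input on four taxa.
names = ["A", "B", "C", "D"]
D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
tree = nn_chain(D, names)
```

Each row-minimum computation scans one row, and the rest of the chain is reused after a merge, which is where the $O(n^2)$ bound counted above comes from.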
38 Consistency of NN Chains
Proposition: If the input distances are ultrametric, then NN chains will reconstruct the corresponding ultrametric tree $T$.
Proof sketch: Similar to that of UPGMA.
Note: It can be shown that on all inputs, Nearest Neighbor will produce the same output as UPGMA.
39 Ultrametric vs. General Trees
NN chain (and UPGMA) construct ultrametric trees even if the distances are defined by non-ultrametric trees.
[Figure: a non-ultrametric weighted tree and the ultrametric tree that NN chain produces from its distances.]
40 Tree Metric (aka Additive Distances)
A distance metric on a set $M$ of $L$ objects is a function $d: M \times M \to \mathbb{R}^{+}$ (represented by a symmetric matrix) satisfying:
- $d(i,i)=0$, and for $i \neq j$, $d(i,j)>0$.
- $d(i,j)=d(j,i)$.
- For all $i,j,k$ it holds that $d(i,k) \le d(i,j)+d(j,k)$.
If there is a weighted tree which realizes these distances, then the distances form a tree metric.
41 Additive Distances (cont.)
Definition: A distance metric on a set $M$ with $L$ objects is additive if there is a tree $T$ with positive weights on the edges, $L$ of whose nodes correspond to the $L$ objects, such that for all $i,j$, $d(i,j) = d_T(i,j)$, the length of the path from $i$ to $j$ in $T$.
Note: Sometimes the tree is required to be binary, and then the edge weights are required to be non-negative.
42 Distances on Three Objects are Additive
For $L=3$: There is always a (unique) tree with one internal node $m$.

      i      j      k
i     0     a+b    a+c
j            0     b+c
k                   0

$d(i,j) = a+b$, $d(i,k) = a+c$, $d(j,k) = b+c$.
For instance: $c = d(k,m) = \tfrac{1}{2}\,[d(i,k) + d(j,k) - d(i,j)]$.
[Figure: a star tree with internal node $m$ joined to $i$, $j$, $k$ by edges of weights $a$, $b$, $c$.]
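Solving the three equations for $a$, $b$ and $c$ gives the symmetric formulas used in the short sketch below. The function name and the numeric example are hypothetical; the formula for `c` is the one on the slide, and `a`, `b` follow by the same argument.

```python
def three_point_tree(d_ij, d_ik, d_jk):
    """Edge weights (a, b, c) of the star tree with center m realizing
    d(i,j) = a + b, d(i,k) = a + c, d(j,k) = b + c."""
    a = (d_ij + d_ik - d_jk) / 2   # weight of edge (i, m)
    b = (d_ij + d_jk - d_ik) / 2   # weight of edge (j, m)
    c = (d_ik + d_jk - d_ij) / 2   # weight of edge (k, m)
    return a, b, c

# Hypothetical distances d(i,j) = 5, d(i,k) = 6, d(j,k) = 7.
a, b, c = three_point_tree(5, 6, 7)
```

By the triangle inequality all three weights are non-negative, which is why a tree always exists for $L=3$.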
43 How About Four Objects?
$L=4$: Not all distance metrics on 4 objects are additive; e.g., there is no tree which realizes the distances below.
[Table: a distance matrix on $i, j, k, l$; most of its entries are missing from the transcription.]
44 The Four Points Condition
A necessary condition for distances on four objects to be additive: the objects can be labeled $i,j,k,l$ so that
$d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) + d(k,l)$.
$\{\{i,j\},\{k,l\}\}$ is then a split of $\{i,j,k,l\}$.
Proof: By the figure.
[Figure: a tree on four leaves in which $i, j$ are separated from $k, l$ by an internal edge.]
45 The Four Points Condition
Definition: A distance metric satisfies the four points condition iff any subset of four objects can be labeled $i,j,k,l$ so that
$d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) + d(k,l)$.
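Equivalently, among the three pairwise sums of a quartet, the two largest must be equal. A brute-force check over all quartets can be sketched as below; the function name, the tolerance and the two example matrices are assumptions of the sketch.

```python
from itertools import combinations

def satisfies_four_point(d, tol=1e-9):
    """Check the four points condition on every quartet of a symmetric
    distance matrix d (list of lists)."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        # The two largest sums must coincide.
        if sums[2] - sums[1] > tol:
            return False
    return True

# Hypothetical additive example: pendant edges of weight 1, internal edge 2,
# split {{i,j},{k,l}}.
additive = [[0, 2, 4, 4],
            [2, 0, 4, 4],
            [4, 4, 0, 2],
            [4, 4, 2, 0]]
# Hypothetical metric that is not additive: the three sums are 2, 6 and 4.
not_additive = [[0, 1, 3, 2],
                [1, 0, 2, 3],
                [3, 2, 0, 1],
                [2, 3, 1, 0]]
```

The check runs in $O(L^4)$ time; by the theorem on the next slide it would suffice to test only the quartets containing one fixed object $r$.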
46 The Four Points Condition
Theorem: The following 3 conditions are equivalent for a distance matrix $D$ on a set $M$ of $L$ objects:
1. $D$ is additive.
2. $D$ satisfies the four points condition for all quartets in $M$.
3. There is an object $r$ in $M$ s.t. $D$ satisfies the four points condition for all quartets that include $r$.
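47 The Four Points Condition
Proof: we'll show that 1 $\Rightarrow$ 2 $\Rightarrow$ 3 $\Rightarrow$ 1.
1 $\Rightarrow$ 2 (additivity $\Rightarrow$ the four points condition is satisfied by all quartets): By the figure.
2 $\Rightarrow$ 3: trivial.
[Figure: a tree on four leaves $k$, $i$, $j$, $l$.]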
48 Proof that 3 $\Rightarrow$ 1
The four points condition on all quartets which include $r$ $\Rightarrow$ additivity. Induction on the number of objects, $L$.
For $L \le 3$ the condition is trivially true and a tree exists.
For $L=4$: Consider 4 points which satisfy $d(i,k) + d(j,l) = d(i,l) + d(j,k) \ge d(i,j) + d(k,l)$. We will construct a tree $T$ with 4 leaves, s.t. $d_T(x,y) = d(x,y)$ for each pair $x,y$ in $\{i,j,k,l\}$.
[Figure: a tree with leaves $i, j, k, l$ and internal vertices $m$, $n$.]
49 Tree Construction for L=4
Assume the split $\{\{i,j\},\{k,l\}\}$: $d(i,j) + d(k,l) \le d(j,k) + d(i,l)$.
1. Construct a tree for $\{i,j,k\}$, with internal vertex $m$.
2. Construct a tree for $\{i,k,l\}$, by adding the vertex $n$ and the edge $(n,l)$.
The construction guarantees that $d_T(x,y)=d(x,y)$ for all $(x,y)$ except $(j,l)$.
[Figure: the tree with leaves $i, j, k, l$ and internal vertices $m$, $n$.]
50 Tree Construction for L=4
The construction guarantees $d_T(x,y)=d(x,y)$ for all $(x,y)$ except $(j,l)$. Thus, since $d_T(i,j) + d_T(k,l) \le d_T(j,k) + d_T(i,l)$, $\{\{i,j\},\{k,l\}\}$ is a split of the tree $T$.
By the proof that 1 $\Rightarrow$ 2, we have for the tree $T$:
$d(j,l) = d(i,l) + d(j,k) - d(i,k) = d_T(i,l) + d_T(j,k) - d_T(i,k) = d_T(j,l)$.
And hence $d_T(x,y)=d(x,y)$ for all $x,y$.
51 Corollary from the Construction
Corollary F: If $d(i,k) + d(j,l) = d(i,l) + d(j,k) \ge d(i,j) + d(k,l)$, then there is a unique tree which realizes all the distances except $d(j,l)$, and this tree also realizes the distance $d(j,l)$.*
* $(j,l)$ can be replaced by any pair in $\{i,j\} \times \{k,l\}$.
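52 Induction step for L>4
For each pair of labeled nodes $(i,j)$ in $T$, let $c_{ij}$ be the distance from $r$ to $m_{ij}$, the vertex at which the paths from $r$ to $i$ and from $r$ to $j$ separate (see figure):
$c_{ij} = \tfrac{1}{2}\,[d(i,r) + d(j,r) - d(i,j)]$.
Pick $i$ and $j$ that maximize $c_{ij}$.
[Figure: root $r$, branching vertex $m_{ij}$ at distance $c_{ij}$ from $r$, and leaves $i$, $j$ below it.]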
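The selection of the pair maximizing $c_{ij}$ can be illustrated with a few lines of Python. The function name, the choice of object 0 as $r$, and the example matrix are assumptions made for this sketch; only the formula for $c_{ij}$ comes from the slide.

```python
from itertools import combinations

def pair_maximizing_c(d, r):
    """Return (i, j, c_ij) maximizing c_ij = (d(i,r) + d(j,r) - d(i,j)) / 2
    over all pairs of objects other than r."""
    def c(i, j):
        return (d[i][r] + d[j][r] - d[i][j]) / 2

    others = [x for x in range(len(d)) if x != r]
    i, j = max(combinations(others, 2), key=lambda pair: c(*pair))
    return i, j, c(i, j)

# Hypothetical additive matrix: pendant edges 1, internal edge 2,
# objects 0, 1 on one side of the internal edge and 2, 3 on the other.
d = [[0, 2, 4, 4],
     [2, 0, 4, 4],
     [4, 4, 0, 2],
     [4, 4, 2, 0]]
i, j, c_ij = pair_maximizing_c(d, r=0)
```

In this example the chosen pair is the one whose paths from $r$ stay together longest, i.e. the two objects on the far side of the internal edge.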
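53 Induction step
Construct (by induction) $T$ on $M \setminus \{i\}$. Add $i$ (and possibly $m_{ij}$) to $T$, as in the figure. Then $d(i,r) = d_T(i,r)$ and $d(j,r) = d_T(j,r)$.
It remains to prove: for each $k \notin \{r,j\}$ it holds that $d(i,k) = d_T(i,k)$.
[Figure: the tree $T$ with root $r$, branching vertex $m_{ij}$ at distance $c_{ij}$ from $r$, and leaves $j$, $i$.]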
54 Induction step (cont.)
Let $k \neq i, r$ be an arbitrary node in $T$. The maximality of $c_{ij}$ means that $\{\{r,k\},\{i,j\}\}$ is a split of $\{i,j,k,r\}$. Thus, by Corollary F, since $d(x,y)=d_T(x,y)$ for each $x,y$ in $\{i,j,k,r\}$ except $(k,i)$, we have $d(k,i)=d_T(k,i)$ too.
[Figure: the tree $T$ with $r$, $k$, $m_{ij}$, $j$ and $i$.]
Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Vincent Berry, François Nicolas Équipe Méthodes et Algorithmes pour la