Phylogenetic Trees, Lecture 12. Section 7.4 in Durbin et al.; Section 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau


2 Maximum Parsimony
Last week we presented Fitch's algorithm for (unweighted) Maximum Parsimony:
Input: A rooted binary tree with characters at the leaves.
Output: An assignment of characters to the internal vertices that minimizes the number of mutations.
Some mutations may be more probable than others. Hence, a natural generalization of the Maximum Parsimony problem is Weighted Parsimony, defined next.

3 Weighted Parsimony (Sankoff's Algorithm)
Weighted parsimony score:
Input: A tree with characters at the leaves, and a weight function on the mutations: c(a,b) is the weight of the mutation a → b.
Output: An assignment of characters to the internal vertices that minimizes the total weight of the mutations.
The weighted parsimony score reduces to the parsimony score when c(a,a) = 0 and c(a,b) = 1 for all b ≠ a.

4 Weighted Parsimony on a Given Tree
Each position is independent and is computed separately, using dynamic programming. Let S(j,b) be the optimal score of the subtree rooted at j when j has the character b. If i is a node with children j and k, then
$S(i,a) = \min_{b} \{ S(j,b) + c(a,b) \} + \min_{b'} \{ S(k,b') + c(a,b') \}$

5 Evaluating Parsimony Scores
Dynamic programming on a given tree:
Initialization: For each leaf i, set S(i,a) = 0 if i is labeled by a, and S(i,a) = ∞ otherwise.
Iteration: For each internal node i with children j and k, processed after its children (post-order): $S(i,a) = \min_{x} \{ S(j,x) + c(a,x) \} + \min_{y} \{ S(k,y) + c(a,y) \}$.
Termination: The cost of the tree is $\min_{x} S(r,x)$, where r is the root.
Comment: To reconstruct an optimal assignment, we need to keep, in each node i and for each character a, the two characters x, y that minimize the cost when i has character a.
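The recurrence, together with the traceback described in the comment, can be sketched in Python as follows. This is a minimal sketch for a single character under an assumed representation: `children` maps each internal node to its pair of children, `leaf_char` maps each leaf to its character, and `cost[a][b]` holds c(a,b). The name `sankoff` and these arguments are illustrative, not taken from the lecture.

```python
import math

def sankoff(children, root, leaf_char, alphabet, cost):
    """Weighted parsimony for one character on a rooted binary tree.

    children: dict internal node -> (j, k); nodes not in it are leaves.
    leaf_char: dict leaf -> observed character.
    cost[a][b]: weight c(a, b) of the mutation a -> b.
    Returns (optimal score, dict node -> assigned character).
    """
    S = {}     # S[i][a]: best score of the subtree at i when i has character a
    back = {}  # back[i][a]: (x, y), best characters of i's children given a

    def fill(i):  # post-order: children are filled before their parent
        if i not in children:
            S[i] = {a: 0 if a == leaf_char[i] else math.inf for a in alphabet}
            return
        j, k = children[i]
        fill(j)
        fill(k)
        S[i], back[i] = {}, {}
        for a in alphabet:
            x = min(alphabet, key=lambda b: S[j][b] + cost[a][b])
            y = min(alphabet, key=lambda b: S[k][b] + cost[a][b])
            S[i][a] = S[j][x] + cost[a][x] + S[k][y] + cost[a][y]
            back[i][a] = (x, y)

    fill(root)
    best = min(alphabet, key=lambda a: S[root][a])

    # Traceback: walk down from the root using the stored (x, y) choices.
    assignment = {}

    def assign(i, a):
        assignment[i] = a
        if i in children:
            x, y = back[i][a]
            assign(children[i][0], x)
            assign(children[i][1], y)

    assign(root, best)
    return S[root][best], assignment
```

For example, on a tree whose four leaves read A, C, A, A, unit costs give a score of 1 (one mutation on the edge to the C leaf). Making c(A,C) = c(C,A) = 0.25 lowers the score to 0.25, because the same single mutation is now cheaper. Each node does O(k^2) work, which matches the O(nk^2) bound.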

6 Cost of Evaluating Parsimony for Binary Trees
For a tree with n nodes and a single character with k values, the complexity is $O(nk^2)$: each node computes k values, each of which is a minimum over k candidates for each child. When there are m such characters, it is $O(nmk^2)$.

7 1st Problem with the Maximum Parsimony Approach: Inconsistency
Maximum parsimony and perfect phylogeny are reasonable assumptions for the evolution of significant characters (i.e., characters whose states are unlikely to be created twice during the evolutionary process). They are less reasonable when the characters are DNA (or protein) residues, as depicted next.

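8 A Possible DNA Sequence Evolution
[Figure (source: Tandy Warnow): a tree showing how DNA sequences might evolve over 3 million years. Sequences shown: AAGGCCT, AAGACTT and TGGACTT (-3 to -2 mil yrs); AGGGCAT, TAGCCCT and AGCACTT (-1 mil yrs); AGGGCAT, TAGCCCA, TAGACTT, AGCACAA and AGCGCTT (today).]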

9 Reversal and Convergence May Happen
[Figure (source: Tandy Warnow): the same evolution tree, illustrating that a site can mutate back to an earlier state (reversal) and that the same state can arise independently in different lineages (convergence).]

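10 The Reconstruction Task
Given only the sequences at the leaves, U = AGGGCAT, V = TAGCCCA, W = TAGACTT, X = TGCACAA, Y = TGCGCTT, reconstruct the tree that relates them.
[Figure (source: Tandy Warnow): the five sequences and a tree with leaves U, X, Y, V, W.]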

11 2nd Problem with Maximum Parsimony (and Other Character-Based Algorithms): Efficiency
There are no known efficient algorithms for solving the big problem (finding the optimal tree itself, rather than labeling a given tree) for maximum parsimony or perfect phylogeny: both are known to be NP-hard. Mainly for this reason, the most widely used approaches to the big problem are distance-based methods.

12 Distance-Based Methods for Constructing Phylogenies
This approach attempts to overcome the two weaknesses of maximum parsimony:
1. The distances are derived from a well-defined statistical model of evolution (which allows reversals and convergences).
2. It provides efficient algorithms for the big problem.
Basic idea: The differences between species (usually represented by sequences of characters) are transformed into numerical distances, and a tree realizing these distances is constructed.

13 Distance-Based Reconstruction
Compute the distances between all pairs of taxa, and find an edge-weighted tree that best describes the distances.
[Figure: a distance matrix D and a tree describing it.]

14 Data → Distances → Trees
1. Modeling question: Given the data (e.g., DNA sequences of the taxa), how do we define distances between taxa?
2. Algorithmic question: Decide whether the distances define a tree (ultrametric or additive, both defined later), and if so, construct that tree.
3. In reality, the computed distances are noisy, so we need the algorithm to return a tree that approximates the distances of the input data.
In the following we shall study items 2 and 1, and briefly discuss item 3.

15 Ultrametric and Tree Metric
A distance metric on a set M of L objects is a function $d: M \times M \to \mathbb{R}^+$ (represented by a symmetric matrix) satisfying:
- d(i,i) = 0, and d(i,j) > 0 for i ≠ j;
- d(i,j) = d(j,i);
- for all i, j, k: $d(i,k) \le d(i,j) + d(j,k)$.
A metric is ultrametric if it corresponds to distances between the leaves of a tree that admits a molecular clock. It is a tree metric, or additive, if it corresponds to distances between the nodes of a weighted tree.

16 1st Model: Molecular Clock → Ultrametric Trees
The molecular clock assumes a constant rate of evolution. Namely, the distance from a speciation event to each current species descending from it is proportional to the elapsed time, and hence is identical for all such paths (a wrong assumption in reality). A rooted (directed) tree satisfying this property is called ultrametric.

17 Ultrametric Trees
Definition: An ultrametric tree is a rooted weighted tree all of whose leaves are at the same depth.
Basic property: Define the height of the leaves to be 0. Then the edge weights can be represented by the heights of the internal vertices: the weight of an edge from a parent u to a child v is height(u) − height(v).
[Figure: an ultrametric tree on the leaves A, B, C, D, E, shown once with edge weights and once with internal-vertex heights.]

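18 Least Common Ancestors and Distances in Ultrametric Trees
Let lca(i,j) denote the least common ancestor of leaves i and j. Let height(lca(i,j)) be its distance from the leaves, and let dist(i,j) be the distance from i to j.
Observation: For any pair of leaves i, j in an ultrametric tree, $\mathrm{height}(\mathrm{lca}(i,j)) = \tfrac{1}{2}\,\mathrm{dist}(i,j)$. This holds because the path from i to j climbs from i to lca(i,j) and descends to j, and each of these two halves has length height(lca(i,j)).
[Figure: an ultrametric tree on the leaves A, B, C, D, E and the corresponding 5 × 5 matrix.]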

19 Ultrametric Matrices
Definition: A distance matrix* U of dimension L × L is ultrametric iff for each 3 indices i, j, k: $U(i,j) \le \max\{U(i,k), U(j,k)\}$.
Theorem: The following conditions are equivalent for an L × L distance matrix U:
1. U is an ultrametric matrix.
2. There is an ultrametric tree with L leaves such that for each pair of leaves i, j: $U(i,j) = \mathrm{height}(\mathrm{lca}(i,j)) = \tfrac{1}{2}\,\mathrm{dist}(i,j)$.
* Recall: a distance matrix is a symmetric matrix with positive non-diagonal entries and 0 diagonal entries, which satisfies the triangle inequality.
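The definition can be checked directly. The sketch below first tests the distance-matrix conditions from the footnote, then tests the three-index inequality over all triples, in O(L^3) time. The function names and the tolerance `tol` (for floating-point input) are illustrative assumptions.

```python
def is_distance_matrix(U, tol=1e-9):
    """Symmetric, zero diagonal, positive off-diagonal, triangle inequality."""
    n = len(U)
    for i in range(n):
        if abs(U[i][i]) > tol:
            return False
        for j in range(n):
            if abs(U[i][j] - U[j][i]) > tol or (i != j and U[i][j] <= 0):
                return False
            for k in range(n):
                if U[i][k] > U[i][j] + U[j][k] + tol:
                    return False
    return True


def is_ultrametric(U, tol=1e-9):
    """U(i,j) <= max(U(i,k), U(j,k)) for every triple of indices."""
    if not is_distance_matrix(U, tol):
        return False
    n = len(U)
    return all(U[i][j] <= max(U[i][k], U[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))
```

For instance, the metric with U(1,2) = 2, U(1,3) = 3, U(2,3) = 4 is a valid distance matrix but is not ultrametric, because 4 > max{2, 3}.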

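20 Ultrametric Tree ⇒ Ultrametric Matrix
If there is an ultrametric tree s.t. U(i,j) = ½ dist(i,j), then U is an ultrametric matrix. This follows from the properties of least common ancestors in trees: for any three leaves i, j, k, two of the three pairwise LCAs coincide, and the third either equals them or lies below them. For example, if lca(k,j) lies below lca(k,i) = lca(j,i), then $U(k,i) = U(j,i) \ge U(k,j)$. Since the two largest of the three entries are equal, each entry is at most the maximum of the other two.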

21 Ultrametric Matrix ⇒ Ultrametric Tree
We start with a definition and two observations.
Definition: Let U be an L × L matrix, and let S ⊆ {1,...,L}. U[S] is the submatrix of U consisting of the rows and columns with indices from S.
Observation 1: U is ultrametric iff for every S ⊆ {1,...,L}, U[S] is ultrametric.
Observation 2: If U is ultrametric and $\max_{i,j} U(i,j) = M$, then M appears in every row of U. Indeed, let U(i,j) = M. Rows i and j contain M, and for any other k, $M = U(i,j) \le \max\{U(i,k), U(j,k)\}$, so one of U(i,k), U(j,k) must be M.

22 Ultrametric Matrix ⇒ Ultrametric Tree: Proof by Induction
U is an ultrametric matrix ⇒ U has an ultrametric tree. By induction on L, the size of U.
Basis: L = 1: T is a single leaf i (U is the 1 × 1 matrix (0)).
L = 2: T is a tree with two leaves i and j, whose root has height U(i,j).

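23 Induction Step
Induction step: L > 2. Let M be the largest entry of U. Use the 1st row to split the set {1,...,L} into two subsets: $S_1 = \{i : U(1,i) = M\}$ and $S_2 = \{1,\ldots,L\} \setminus S_1$. Note that $0 < |S_i| < L$: $S_1$ is non-empty by Observation 2, and $1 \in S_2$ since U(1,1) = 0 < M.
For example: $S_1 = \{2,4\}$, $S_2 = \{1,3,5\}$.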

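24 Induction Step (cont.)
By Observation 1, $U[S_1]$ and $U[S_2]$ are ultrametric. By induction, there is a tree $T_1$ for $S_1$ with a root labeled $M_1 \le M$, and a tree $T_2$ for $S_2$ with a root labeled $M_2 < M$ ($M_2$ is the 2nd largest value in row 1; if $M_2 = 0$ then $T_2$ is a leaf). Join $T_1$ and $T_2$ into a tree T with a root labeled M. The edge to the root of $T_2$ has length $M - M_2$, and the edge to the root of $T_1$ has length $M - M_1$; when $M_1 = M$, the root of $T_1$ is merged with the new root.
[Figure: the construction, including the case $M_1 = M$.]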

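25 Proof (end)
Need to prove: T is an ultrametric tree for U, i.e., U(i,j) is the label of the LCA of i and j in T. If i and j are in the same subtree, this holds by induction. Otherwise, the LCA of i and j is the root, labeled M, since they are in different subtrees. Also, say $i \in S_1$ and $j \in S_2$. Then $U(1,i) = M$ and $U(1,j) < M$ imply $U(i,j) = M$: by the ultrametric inequality, $M = U(1,i) \le \max\{U(1,j), U(i,j)\}$, so $U(i,j) \ge M$, and $U(i,j) \le M$ since M is the largest entry.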
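The inductive proof is constructive, so it translates almost line by line into a recursive procedure. The sketch below makes several assumptions. U is a list of lists that is already ultrametric (it is not validated), and indices are 0-based. A leaf is represented as its index and an internal node as a tuple `(height, [children])`. The helper `lca_heights` reads the matrix back off the tree, so one can confirm that U(i,j) = height(lca(i,j)). Both names are hypothetical.

```python
def ultrametric_tree(U, S=None):
    """Split S by its first row, recurse, and join the two subtrees at height M."""
    if S is None:
        S = list(range(len(U)))
    if len(S) == 1:
        return S[0]
    first = S[0]
    M = max(U[first][i] for i in S)          # by Observation 2, M = max of U[S]
    S1 = [i for i in S if U[first][i] == M]  # non-empty, does not contain first
    S2 = [i for i in S if U[first][i] != M]  # contains first
    children = []
    for part in (S1, S2):
        sub = ultrametric_tree(U, part)
        # Case M_1 = M: the root of T_1 is merged with the new root.
        if isinstance(sub, tuple) and sub[0] == M:
            children.extend(sub[1])
        else:
            children.append(sub)
    return (M, children)


def lca_heights(tree, n):
    """Return the n x n matrix whose (i, j) entry is the height of lca(i, j)."""
    H = [[0] * n for _ in range(n)]

    def leaves(t):
        if isinstance(t, int):
            return [t]
        height, kids = t
        groups = [leaves(c) for c in kids]
        # Leaves in different child subtrees have their LCA at this node.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for i in groups[a]:
                    for j in groups[b]:
                        H[i][j] = H[j][i] = height
        return [x for g in groups for x in g]

    leaves(tree)
    return H
```

For U = [[0,2,4,4],[2,0,4,4],[4,4,0,3],[4,4,3,0]], the first row splits the indices into {2,3} and {0,1}. The result is the root (4, [(3, [3, 2]), (2, [1, 0])]).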

26 Efficient Algorithms for Constructing Ultrametric Trees
Input: A distance matrix over a set S.
Output: An ultrametric tree on the objects in S. (Note: we want our algorithm to be defined for all input metrics.)
Requirements:
Consistency: If the input matrix is ultrametric, then the algorithm should return the corresponding tree (there is only one).
Robustness: If the input matrix is not ultrametric, the algorithm should return an ultrametric tree close to it.
In this course we'll concentrate on the 1st requirement.

27 Reconstructing Ultrametrics: UPGMA Clustering
UPGMA: Unweighted Pair Group Method using Averages.
Input: A distance matrix over a set of species S.
Output: An ultrametric phylogenetic tree on S.
Outline: Initialization: each object is a cluster; place all clusters at height zero. At each iteration, combine the two closest clusters into a new one, update the distances to the new cluster, and continue.
This clustering algorithm is used in many other applications, such as data mining.

28 UPGMA
While (#clusters > 1) do:
- Choose a cluster pair i, j as neighbors, s.t. $D(i,j) = \min_{i' \ne j'} D(i',j')$.
- Connect i, j to a new cluster v.
- Replace the pair i, j in D by v, and reduce D: for k ≠ v, $D(v,k) = \alpha D(i,k) + (1-\alpha) D(j,k)$, where $\alpha = \frac{|C_i|}{|C_i| + |C_j|}$.
Note: this reduction formula guarantees that the distance between clusters $C_i$ and $C_j$ is the average of the distances between their elements:
$d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} d(p,q)$
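A naive O(n^3) sketch of this loop follows. It assumes, in line with the consistency argument for UPGMA, that a new cluster is placed at height D(i,j)/2. The nested-tuple tree format `(height, left, right)` and the name `upgma` are illustrative choices.

```python
def upgma(D, labels):
    """Naive UPGMA on a symmetric distance matrix D (list of lists).

    Returns the root; a leaf is its label, an internal node is a tuple
    (height, left, right) with height = D(i, j) / 2 at the merge.
    """
    # Active clusters: id -> (subtree, size); dist holds distances between ids.
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {i: {j: D[i][j] for j in clusters if j != i} for i in clusters}
    next_id = len(labels)
    while len(clusters) > 1:
        # Closest pair of active clusters: an O(n^2) scan per iteration.
        i, j = min(((a, b) for a in clusters for b in clusters if a < b),
                   key=lambda p: dist[p[0]][p[1]])
        height = dist[i][j] / 2
        (ti, ci), (tj, cj) = clusters.pop(i), clusters.pop(j)
        v, next_id = next_id, next_id + 1
        alpha = ci / (ci + cj)
        dist[v] = {}
        for k in clusters:
            # UPGMA reduction: size-weighted average of the two old distances.
            d = alpha * dist[i][k] + (1 - alpha) * dist[j][k]
            dist[v][k] = dist[k][v] = d
            del dist[k][i], dist[k][j]
        del dist[i], dist[j]
        clusters[v] = ((height, ti, tj), ci + cj)
    return next(iter(clusters.values()))[0]
```

Consider d(A,B) = 2, d(A,C) = d(B,C) = 6, d(A,D) = d(B,D) = 12 and d(C,D) = 9. The cluster {A,B,C} is at distance (2/3)·12 + (1/3)·9 = 11 from D, so the root sits at height 5.5. An equal-weight average would give 10.5 instead.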

29 Reduction Formulas in Closest-Pair Clustering Algorithms
The reduction formula of UPGMA (which computes the distances from new clusters) has several variants. For instance, for k ≠ v:
$D(v,k) = \tfrac{1}{2}(D(i,k) + D(j,k))$ (WPGMA), or
$D(v,k) = \min\{D(i,k), D(j,k)\}$ (single linkage).
Both these reductions preserve the consistency of the algorithm. Simulations show that the chosen reduction formula may have a significant effect on the robustness of the algorithm to noise.

30 Example
UPGMA construction on five objects, A to E. The length of an edge is its (vertical) height, and d(i,j) is the distance between the leaves of $C_i$ and $C_j$.
[Figure: the successive clusters F, G, H, I built over A, B, C, D, E, including the reduction step that computes d(H,G) from d(F,G) and d(D,G).]

31 Consistency of UPGMA
Proposition: If the input distances are ultrametric, then UPGMA will reconstruct the corresponding ultrametric tree T.
Proof sketch: By induction on the number of iterations, show that the distance between two clusters is twice the height of the LCA of the corresponding subtrees.

32 Complexity of UPGMA
Naïve implementation: n iterations, with $O(n^2)$ time per iteration (to find a closest pair) ⇒ $O(n^3)$ total.
Constructing a heap for each row and updating the heaps in each iteration ⇒ $O(n^2 \log n)$ total.
Optimal implementation: $O(n^2)$ total time. One such implementation, using mutual nearest neighbors, is presented next.

33 The Nearest Neighbor Algorithm
Let D be a distance metric. j is a nearest neighbor (NN) of i if $j \ne i$ and $d(i,j) = \min\{d(i,k) : k \ne i\}$.
(i,j) are mutual nearest neighbors if i is an NN of j and j is an NN of i; in other words, if
$d(i,j) \le \min\{d(i,k), d(j,k) : k \ne i,j\}$.
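The two characterizations of mutual nearest neighbors can be compared directly. The following is a small sketch, assuming D is a symmetric list of lists with at least two rows. The helper names are hypothetical.

```python
def nearest_neighbors(D, i):
    """All j != i that attain min{D[i][k] : k != i}."""
    others = [k for k in range(len(D)) if k != i]
    m = min(D[i][k] for k in others)
    return {k for k in others if D[i][k] == m}


def are_mutual_nn(D, i, j):
    """D(i,j) <= min{D(i,k), D(j,k) : k != i, j}."""
    return i != j and all(D[i][j] <= min(D[i][k], D[j][k])
                          for k in range(len(D)) if k not in (i, j))


def mutual_nn_pairs(D):
    """All pairs i < j that are mutual nearest neighbors."""
    return [(i, j) for i in range(len(D)) for j in range(i + 1, len(D))
            if are_mutual_nn(D, i, j)]
```

Take d(0,1) = 2, d(0,2) = d(1,2) = 6, d(0,3) = d(1,3) = 10 and d(2,3) = 9. Only (0,1) is a mutual NN pair. Object 3's nearest neighbor is 2, but 2's nearest neighbors are 0 and 1.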

34 Ultrametric Reconstruction by the Nearest Neighbor Chains Algorithm
n − 1 joining iterations. While (#clusters > 1) do:
- Choose a cluster pair i, j that are mutual nearest neighbors.
- Connect i, j to a new cluster v.
- Replace i, j with the cluster v, and reduce the distance matrix D: for k ≠ v, $D(v,k) = \alpha D(i,k) + (1-\alpha) D(j,k)$.
In UPGMA, $\alpha = \frac{|C_i|}{|C_i| + |C_j|}$, but the algorithm is consistent for any $0 \le \alpha \le 1$ (i.e., if the reduction is convex).

35 $\Theta(n^2)$ Implementation of NN Chains
Finding mutual nearest neighbors in $O(n^2)$ total time.
A complete NN chain is a sequence $i_0, i_1, \ldots, i_l$ in which $i_{r+1}$ is a nearest neighbor of $i_r$, and the final pair $(i_{l-1}, i_l)$ are mutual nearest neighbors.
To extend the chain from $i_r$: find the minimal entry $(i_r, i_{r+1})$ in row $i_r$, so that $i_{r+1}$ is a nearest neighbor of $i_r$. Stop if $(i_r, i_{r+1})$ is also minimal in row $i_{r+1}$ (i.e., $i_r$ and $i_{r+1}$ are mutual nearest neighbors). Otherwise, continue.
[Figure: a chain $i_0 \to i_1 \to i_2 \to \cdots$ ending in a mutual NN pair.]

36 $\Theta(n^2)$ Implementation of NN Chains (cont.)
A $\Theta(n^2)$ implementation using nearest neighbor chains:
- Extend a chain until it is complete.
- Select the final pair for joining, and remove it from the chain.
Note: Suppose the reduction formula is convex, and let u be any remaining cluster with NN(u) ∉ {i,j} before the reduction. Then NN(u) is still a nearest neighbor of u after the reduction, since the new cluster v satisfies $D(v,u) \ge \min\{D(i,u), D(j,u)\} \ge D(u, \mathrm{NN}(u))$. Hence the remaining chain is still an NN chain.

37 $O(n^2)$ Implementation of NN Chains (cont.)
Complexity analysis: count the number of row-minimum calculations (each taking O(n) time):
- n − 1 terminations throughout the execution.
- At most 2(n − 1) edge deletions ⇒ at most 2(n − 1) extensions.
- Total for NN chain operations: $O(n^2)$.
- Updates: O(n) per iteration, $O(n^2)$ in total.
- Altogether $O(n^2)$.
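The following is a sketch of NN chains with the UPGMA reduction, running in O(n^2) time: each row minimum and each update costs O(n). It makes two assumptions beyond the slides. First, ties in a row minimum are broken in favor of the previous chain element, which is a common way to keep the chain from cycling. Second, a merged cluster is placed at height D(i,j)/2, as in UPGMA. The function name and the tuple tree format are illustrative.

```python
def nn_chain_upgma(D, labels):
    """NN-chain clustering with the size-weighted (UPGMA) reduction.

    Returns the root as nested tuples (height, left, right); leaves are labels.
    """
    n = len(labels)
    dist = {i: {j: D[i][j] for j in range(n) if j != i} for i in range(n)}
    tree = {i: labels[i] for i in range(n)}
    size = {i: 1 for i in range(n)}
    chain = []
    next_id = n
    while len(tree) > 1:
        if not chain:
            chain.append(next(iter(tree)))  # start a new chain anywhere
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Row minimum of a, preferring prev on ties so the chain cannot cycle.
        b = min(dist[a], key=lambda k: (dist[a][k], k != prev))
        if b != prev:
            chain.append(b)  # extend the chain
            continue
        # a and b are mutual nearest neighbors: remove them and join them.
        chain.pop()
        chain.pop()
        v, next_id = next_id, next_id + 1
        alpha = size[a] / (size[a] + size[b])
        dist[v] = {}
        for k in dist[a]:
            if k == b:
                continue
            d = alpha * dist[a][k] + (1 - alpha) * dist[b][k]
            dist[v][k] = dist[k][v] = d
            del dist[k][a], dist[k][b]
        tree[v] = (dist[a][b] / 2, tree.pop(a), tree.pop(b))
        size[v] = size.pop(a) + size.pop(b)
        del dist[a], dist[b]
    return next(iter(tree.values()))
```

The children of a node may come out in a different order than in a greedy closest-pair implementation. On ultrametric input, however, the clusters and their heights are the same.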

38 Consistency of NN Chains
Proposition: If the input distances are ultrametric, then NN chains will reconstruct the corresponding ultrametric tree T.
Proof sketch: similar to that of UPGMA.
Note: It can be shown that, when it uses the UPGMA reduction formula, NN chains produce the same output as UPGMA on all inputs (assuming no ties).

39 Ultrametric vs. General Trees
NN chains (and UPGMA) construct ultrametric trees even if the distances are defined by non-ultrametric trees.
[Figure: a non-ultrametric tree and the ultrametric tree that NN chains reconstruct from its distances.]

40 Tree Metric (aka Additive Distances)
A distance metric on a set M of L objects is a function $d: M \times M \to \mathbb{R}^+$ (represented by a symmetric matrix) satisfying:
- d(i,i) = 0, and d(i,j) > 0 for i ≠ j;
- d(i,j) = d(j,i);
- for all i, j, k: $d(i,k) \le d(i,j) + d(j,k)$.
If there is a weighted tree that realizes these distances, then the distances form a tree metric.

41 Additive Distances (cont.)
Definition: A distance metric on a set M of L objects is additive if there is a tree T with positive edge weights, L of whose nodes correspond to the L objects, such that for all i, j, $d(i,j) = d_T(i,j)$, the length of the path from i to j in T.
Note: Sometimes the tree is required to be binary, and then the edge weights are only required to be non-negative.

42 Distances on Three Objects Are Additive
For L = 3 there is always a (unique) tree with one internal node m. Writing a = d(i,m), b = d(j,m) and c = d(k,m), the tree must satisfy
$d(i,j) = a + b, \quad d(i,k) = a + c, \quad d(j,k) = b + c$.
For instance, $c = d(k,m) = \tfrac{1}{2}[d(i,k) + d(j,k) - d(i,j)]$, and similarly for a and b. All three are non-negative by the triangle inequality.
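Solving the three equations gives one formula per pendant edge. The following is a tiny sketch; the function name is illustrative.

```python
def three_point_edges(d_ij, d_ik, d_jk):
    """Edge lengths (a, b, c) from i, j, k to the internal node m, where
    d(i,j) = a + b, d(i,k) = a + c and d(j,k) = b + c."""
    a = (d_ij + d_ik - d_jk) / 2
    b = (d_ij + d_jk - d_ik) / 2
    c = (d_ik + d_jk - d_ij) / 2
    return a, b, c
```

For example, d(i,j) = 5, d(i,k) = 6 and d(j,k) = 7 give a = 2, b = 3, c = 4. If one distance equals the sum of the other two (e.g. d(i,j) = d(i,k) + d(k,j)), the corresponding edge (here c) has length 0.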

43 How About Four Objects?
L = 4: Not all distance metrics on 4 objects are additive; e.g., there is no tree that realizes the distances below.
[Figure: a distance matrix on i, j, k, l that no tree realizes.]

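44 The Four Points Condition
A necessary condition for distances on four objects to be additive: the objects can be labeled i, j, k, l so that
$d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) + d(k,l)$,
where {{i,j},{k,l}} is a split of {i,j,k,l}.
Proof: By the figure, in which the path between i and j does not meet the path between k and l. Each of the two left-hand sums equals the total length of the four paths from the leaves to the two branching points, plus twice the length of the path between these points. The sum d(i,j) + d(k,l) equals only the first total.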

45 The Four Points Condition
Definition: A distance metric satisfies the four points condition iff any subset of four objects can be labeled i, j, k, l so that
$d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) + d(k,l)$.
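Since the labeling can be chosen freely, a quartet satisfies the condition exactly when the two largest of its three pairwise sums are equal. The sketch below checks this for every quartet, which gives O(L^4) checks. The tolerance and function names are assumptions.

```python
from itertools import combinations


def satisfies_4pc(d, quartet, tol=1e-9):
    """True if the two largest of the three pairwise sums are equal."""
    i, j, k, l = quartet
    sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
    return abs(sums[2] - sums[1]) <= tol


def four_point_condition(d, tol=1e-9):
    """Check the four points condition on every quartet of objects."""
    return all(satisfies_4pc(d, q, tol) for q in combinations(range(len(d)), 4))
```

Distances read off a tree pass the check. For example, leaves i, j hang from one internal node by edges 1 and 2, leaves k, l hang from another by edges 3 and 4, and the middle edge has length 5. This gives pairwise sums 10, 20, 20. A metric whose three sums are 4, 6, 7 fails the check.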

46 The Four Points Condition
Theorem: The following 3 conditions are equivalent for a distance matrix D on a set M of L objects:
1. D is additive.
2. D satisfies the four points condition for all quartets in M.
3. There is an object r in M, s.t. D satisfies the 4 points condition for all quartets that include r.

47 The Four Points Condition
Proof: We'll show that 1 ⇒ 2 ⇒ 3 ⇒ 1.
1 ⇒ 2 (additivity ⇒ the 4P condition is satisfied by all quartets): by the figure, as in the necessary condition above.
2 ⇒ 3: trivial.

48 Proof that 3 ⇒ 1
4PC on all quartets that include r ⇒ additivity.
Induction on the number of objects, L. For L ≤ 3 the condition is trivially true and a tree exists.
For L = 4: consider 4 points that satisfy $d(i,k) + d(j,l) = d(i,l) + d(j,k) \ge d(i,j) + d(k,l)$.
We will construct a tree T with 4 leaves s.t. $d_T(x,y) = d(x,y)$ for each pair x, y in {i,j,k,l}.
[Figure: a tree with leaves i, j on one side, leaves k, l on the other, and internal vertices m and n.]

49 Tree Construction for L = 4
Assume the split {{i,j},{k,l}}: $d(i,j) + d(k,l) \le d(j,k) + d(i,l)$.
1. Construct a tree for {i,j,k}, with internal vertex m.
2. Extend it to a tree that also realizes the distances on {i,k,l}, by adding a vertex n on the path from m to k and the edge (n,l).
The construction guarantees that $d_T(x,y) = d(x,y)$ for all pairs (x,y) except (j,l).

50 Tree Construction for L = 4 (cont.)
$d_T(x,y) = d(x,y)$ for all pairs (x,y) except (j,l). Thus, since $d_T(i,j) + d_T(k,l) \le d_T(j,k) + d_T(i,l)$, {{i,j},{k,l}} is a split of the tree T.
By the proof that 1 ⇒ 2, we have for the tree T:
$d(j,l) = d(i,l) + d(j,k) - d(i,k) = d_T(i,l) + d_T(j,k) - d_T(i,k) = d_T(j,l)$,
and hence $d_T(x,y) = d(x,y)$ for all x, y.
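The two construction steps can be written out as explicit edge lengths. This sketch assumes that objects 0, 1, 2, 3 play the roles of i, j, k, l, with the split {{i,j},{k,l}}. The code never reads d(j,l), so comparing the resulting path length j-m-n-l with d(j,l) illustrates the argument above. The edge names such as `mn` are hypothetical.

```python
def quartet_tree(d):
    """Edge lengths of the L = 4 construction, assuming the split
    d(i,k) + d(j,l) = d(i,l) + d(j,k) >= d(i,j) + d(k,l).
    The entry d(j,l) is deliberately never used."""
    i, j, k, l = 0, 1, 2, 3
    # Step 1: tree for {i, j, k} with internal vertex m.
    im = (d[i][j] + d[i][k] - d[j][k]) / 2
    jm = d[i][j] - im
    # Step 2: vertex n on the path from m to k, plus the edge (n, l),
    # placed so that d(i,l) and d(k,l) are realized.
    i_n = (d[i][k] + d[i][l] - d[k][l]) / 2
    mn = i_n - im  # non-negative because of the split assumption
    kn = d[i][k] - i_n
    ln = d[i][l] - i_n
    return {"im": im, "jm": jm, "mn": mn, "kn": kn, "ln": ln}


def tree_distance_jl(e):
    """d_T(j, l): the path j - m - n - l."""
    return e["jm"] + e["mn"] + e["ln"]
```

For d(i,j) = 3, d(i,k) = 9, d(i,l) = 10, d(j,k) = 10, d(k,l) = 7 and d(j,l) = 11, the edges come out as im = 1, jm = 2, mn = 5, kn = 3, ln = 4. The tree path j-m-n-l then has length 11, matching d(j,l).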

51 Corollary from the Construction
Corollary F: If $d(i,k) + d(j,l) = d(i,l) + d(j,k) \ge d(i,j) + d(k,l)$, then there is a unique tree that realizes all the distances except d(j,l), and this tree also realizes the distance d(j,l).*
*(j,l) can be replaced by any pair in {i,j} × {k,l}.

52 Induction Step for L > 4
For each pair of labeled nodes (i,j), let $m_{ij}$ be the vertex where the path from r to i and the path from r to j diverge, and let $c_{ij}$ be its distance from r:
$c_{ij} = \tfrac{1}{2}[d(i,r) + d(j,r) - d(i,j)]$.
Pick i and j that maximize $c_{ij}$.
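Choosing the pair that maximizes c_ij is a simple scan over pairs. The sketch below computes c_ij, a quantity also known as the Gromov product of i and j at r. It returns a maximizing pair among the objects other than r. The function names are hypothetical.

```python
from itertools import combinations


def c_value(d, i, j, r):
    """c_ij = (d(i,r) + d(j,r) - d(i,j)) / 2, the distance from r to m_ij."""
    return (d[i][r] + d[j][r] - d[i][j]) / 2


def pick_pair(d, r):
    """A pair (i, j) of objects other than r that maximizes c_ij."""
    others = [x for x in range(len(d)) if x != r]
    return max(combinations(others, 2), key=lambda p: c_value(d, p[0], p[1], r))
```

Apply this to the quartet tree with d(i,j) = 3, d(i,k) = 9, d(i,l) = 10, d(j,k) = 10, d(j,l) = 11, d(k,l) = 7, taking r = l. The values are c_ij = 9, c_ik = 4 and c_jk = 4. The chosen pair is therefore the cherry {i,j}, whose branching point lies farthest from r.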

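53 Induction Step (cont.)
Construct (by induction) a tree T on M \ {i}. Add i (and possibly $m_{ij}$) to T, as in the figure: $m_{ij}$ lies on the path from r to j at distance $c_{ij}$ from r, and i hangs from it by an edge of length $d(i,r) - c_{ij}$. Then $d(i,r) = d_T(i,r)$ and $d(i,j) = d_T(i,j)$.
It remains to prove that for each $k \notin \{i, j, r\}$, $d(i,k) = d_T(i,k)$.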

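54 Induction Step (cont.)
Let $k \notin \{i, j, r\}$ be an arbitrary node in T. The maximality of $c_{ij}$ means that {{r,k},{i,j}} is a split of {i,j,k,r}. Indeed, $c_{ij} \ge c_{ik}$ and $c_{ij} \ge c_{jk}$ imply that $d(i,j) + d(k,r)$ is the smallest of the three pairwise sums, so by the 4P condition on the quartet {i,j,k,r} the other two sums are equal. Now $d(x,y) = d_T(x,y)$ for each x, y in {i,j,k,r} except (k,i). Hence, by Corollary F, we also have $d(k,i) = d_T(k,i)$.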


Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path. Chapter 3 Trees Section 3. Fundamental Properties of Trees Suppose your city is planning to construct a rapid rail system. They want to construct the most economical system possible that will meet the

More information

Evolution of Tandemly Repeated Sequences

Evolution of Tandemly Repeated Sequences University of Canterbury Department of Mathematics and Statistics Evolution of Tandemly Repeated Sequences A thesis submitted in partial fulfilment of the requirements of the Degree for Master of Science

More information

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency?

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Fathiyeh Faghih and Daniel G. Brown David R. Cheriton School of Computer Science, University of

More information

CS 231: Algorithmic Problem Solving

CS 231: Algorithmic Problem Solving CS 231: Algorithmic Problem Solving Naomi Nishimura Module 5 Date of this version: June 14, 2018 WARNING: Drafts of slides are made available prior to lecture for your convenience. After lecture, slides

More information

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests 4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful application, such as in

More information

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1 DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens Katherine St. John City University of New York 1 Thanks to the DIMACS Staff Linda Casals Walter Morris Nicole Clark Katherine St. John

More information

Lecture 2 Pairwise sequence alignment. Principles Computational Biology Teresa Przytycka, PhD

Lecture 2 Pairwise sequence alignment. Principles Computational Biology Teresa Przytycka, PhD Lecture 2 Pairwise sequence alignment. Principles Computational Biology Teresa Przytycka, PhD Assumptions: Biological sequences evolved by evolution. Micro scale changes: For short sequences (e.g. one

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Lecture 6: Analysis of Algorithms (CS )

Lecture 6: Analysis of Algorithms (CS ) Lecture 6: Analysis of Algorithms (CS583-002) Amarda Shehu October 08, 2014 1 Outline of Today s Class 2 Traversals Querying Insertion and Deletion Sorting with BSTs 3 Red-black Trees Height of a Red-black

More information

Algebraic method for Shortest Paths problems

Algebraic method for Shortest Paths problems Lecture 1 (06.03.2013) Author: Jaros law B lasiok Algebraic method for Shortest Paths problems 1 Introduction In the following lecture we will see algebraic algorithms for various shortest-paths problems.

More information

DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017)

DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017) DESIGN AND ANALYSIS OF ALGORITHMS (DAA 2017) Veli Mäkinen Design and Analysis of Algorithms 2017 week 4 11.8.2017 1 Dynamic Programming Week 4 2 Design and Analysis of Algorithms 2017 week 4 11.8.2017

More information

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Vincent Berry, François Nicolas Équipe Méthodes et Algorithmes pour la

More information

Multiple Sequence Alignment Sum-of-Pairs and ClustalW. Ulf Leser

Multiple Sequence Alignment Sum-of-Pairs and ClustalW. Ulf Leser Multiple Sequence Alignment Sum-of-Pairs and ClustalW Ulf Leser This Lecture Multiple Sequence Alignment The problem Theoretical approach: Sum-of-Pairs scores Practical approach: ClustalW Ulf Leser: Bioinformatics,

More information

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007 CS880: Approximations Algorithms Scribe: Chi Man Liu Lecturer: Shuchi Chawla Topic: Local Search: Max-Cut, Facility Location Date: 2/3/2007 In previous lectures we saw how dynamic programming could be

More information

Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Edges Extended Abstract

Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Edges Extended Abstract Fast and Reliable Reconstruction of Phylogenetic Trees with Very Short Edges Extended Abstract Ilan Gronau Shlomo Moran Sagi Snir Abstract Phylogenetic reconstruction is the problem of reconstructing an

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Machine learning - HT Clustering

Machine learning - HT Clustering Machine learning - HT 2016 10. Clustering Varun Kanade University of Oxford March 4, 2016 Announcements Practical Next Week - No submission Final Exam: Pick up on Monday Material covered next week is not

More information

MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix.

MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix. MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix. Row echelon form A matrix is said to be in the row echelon form if the leading entries shift to the

More information

Data Mining in Bioinformatics Day 1: Classification

Data Mining in Bioinformatics Day 1: Classification Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls

More information

Solutions to Exam Data structures (X and NV)

Solutions to Exam Data structures (X and NV) Solutions to Exam Data structures X and NV 2005102. 1. a Insert the keys 9, 6, 2,, 97, 1 into a binary search tree BST. Draw the final tree. See Figure 1. b Add NIL nodes to the tree of 1a and color it

More information

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such) Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences joe@gs Phylogeny methods, part 1 (Parsimony and such) Methods of reconstructing phylogenies (evolutionary trees) Parsimony

More information

The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs

The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs The Matrix-Tree Theorem and Its Applications to Complete and Complete Bipartite Graphs Frankie Smith Nebraska Wesleyan University fsmith@nebrwesleyan.edu May 11, 2015 Abstract We will look at how to represent

More information

1 Minimum Cut Problem

1 Minimum Cut Problem CS 6 Lecture 6 Min Cut and Karger s Algorithm Scribes: Peng Hui How, Virginia Williams (05) Date: November 7, 07 Anthony Kim (06), Mary Wootters (07) Adapted from Virginia Williams lecture notes Minimum

More information

CSE 417 Dynamic Programming (pt 4) Sub-problems on Trees

CSE 417 Dynamic Programming (pt 4) Sub-problems on Trees CSE 417 Dynamic Programming (pt 4) Sub-problems on Trees Reminders > HW4 is due today > HW5 will be posted shortly Dynamic Programming Review > Apply the steps... 1. Describe solution in terms of solution

More information

Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri

Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Scene Completion Problem The Bare Data Approach High Dimensional Data Many real-world problems Web Search and Text Mining Billions

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Rapid Neighbour-Joining

Rapid Neighbour-Joining Rapid Neighbour-Joining Martin Simonsen, Thomas Mailund and Christian N. S. Pedersen Bioinformatics Research Center (BIRC), University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Århus C, Denmark.

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Discrete mathematics , Fall Instructor: prof. János Pach

Discrete mathematics , Fall Instructor: prof. János Pach Discrete mathematics 2016-2017, Fall Instructor: prof. János Pach - covered material - Lecture 1. Counting problems To read: [Lov]: 1.2. Sets, 1.3. Number of subsets, 1.5. Sequences, 1.6. Permutations,

More information

Write an algorithm to find the maximum value that can be obtained by an appropriate placement of parentheses in the expression

Write an algorithm to find the maximum value that can be obtained by an appropriate placement of parentheses in the expression Chapter 5 Dynamic Programming Exercise 5.1 Write an algorithm to find the maximum value that can be obtained by an appropriate placement of parentheses in the expression x 1 /x /x 3 /... x n 1 /x n, where

More information

3. Cluster analysis Overview

3. Cluster analysis Overview Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as

More information

Exercise set 2 Solutions

Exercise set 2 Solutions Exercise set 2 Solutions Let H and H be the two components of T e and let F E(T ) consist of the edges of T with one endpoint in V (H), the other in V (H ) Since T is connected, F Furthermore, since T

More information

Ma/CS 6b Class 13: Counting Spanning Trees

Ma/CS 6b Class 13: Counting Spanning Trees Ma/CS 6b Class 13: Counting Spanning Trees By Adam Sheffer Reminder: Spanning Trees A spanning tree is a tree that contains all of the vertices of the graph. A graph can contain many distinct spanning

More information

Scaling species tree estimation methods to large datasets using NJMerge

Scaling species tree estimation methods to large datasets using NJMerge Scaling species tree estimation methods to large datasets using NJMerge Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana Champaign 2018 Phylogenomics Software

More information

15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018

15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 In this lecture, we describe a very general problem called linear programming

More information

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

Graph Representations and Traversal

Graph Representations and Traversal COMPSCI 330: Design and Analysis of Algorithms 02/08/2017-02/20/2017 Graph Representations and Traversal Lecturer: Debmalya Panigrahi Scribe: Tianqi Song, Fred Zhang, Tianyu Wang 1 Overview This lecture

More information

Neighbour Joining. Algorithms in BioInformatics 2 Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) November 2004, Daimi, University of Aarhus

Neighbour Joining. Algorithms in BioInformatics 2 Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) November 2004, Daimi, University of Aarhus Neighbour Joining Algorithms in BioInformatics 2 Mandatory Project 1 Magnus Erik Hvass Pedersen (971055) November 2004, Daimi, University of Aarhus 1 Introduction The purpose of this report is to verify

More information

Algorithms for Ultra-large Multiple Sequence Alignment and Phylogeny Estimation

Algorithms for Ultra-large Multiple Sequence Alignment and Phylogeny Estimation Algorithms for Ultra-large Multiple Sequence Alignment and Phylogeny Estimation Tandy Warnow Department of Computer Science The University of Texas at Austin Phylogeny (evolutionary tree) Orangutan Gorilla

More information

CS473 - Algorithms I

CS473 - Algorithms I CS473 - Algorithms I Lecture 4 The Divide-and-Conquer Design Paradigm View in slide-show mode 1 Reminder: Merge Sort Input array A sort this half sort this half Divide Conquer merge two sorted halves Combine

More information

Lecture 5: Matrices. Dheeraj Kumar Singh 07CS1004 Teacher: Prof. Niloy Ganguly Department of Computer Science and Engineering IIT Kharagpur

Lecture 5: Matrices. Dheeraj Kumar Singh 07CS1004 Teacher: Prof. Niloy Ganguly Department of Computer Science and Engineering IIT Kharagpur Lecture 5: Matrices Dheeraj Kumar Singh 07CS1004 Teacher: Prof. Niloy Ganguly Department of Computer Science and Engineering IIT Kharagpur 29 th July, 2008 Types of Matrices Matrix Addition and Multiplication

More information

K-Anonymity. Definitions. How do you publicly release a database without compromising individual privacy?

K-Anonymity. Definitions. How do you publicly release a database without compromising individual privacy? K-Anonymity How do you publicly release a database without compromising individual privacy? The Wrong Approach: REU Summer 2007 Advisors: Ryan Williams and Manuel Blum Just leave out any unique identifiers

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

CS473-Algorithms I. Lecture 10. Dynamic Programming. Cevdet Aykanat - Bilkent University Computer Engineering Department

CS473-Algorithms I. Lecture 10. Dynamic Programming. Cevdet Aykanat - Bilkent University Computer Engineering Department CS473-Algorithms I Lecture 1 Dynamic Programming 1 Introduction An algorithm design paradigm like divide-and-conquer Programming : A tabular method (not writing computer code) Divide-and-Conquer (DAC):

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

Greedy Approximations

Greedy Approximations CS 787: Advanced Algorithms Instructor: Dieter van Melkebeek Greedy Approximations Approximation algorithms give a solution to a problem in polynomial time, at most a given factor away from the correct

More information

A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem

A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem Gang Wu Jia-Huai You Guohui Lin January 17, 2005 Abstract A lookahead branch-and-bound algorithm is proposed for solving

More information

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS

PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PACKING DIGRAPHS WITH DIRECTED CLOSED TRAILS PAUL BALISTER Abstract It has been shown [Balister, 2001] that if n is odd and m 1,, m t are integers with m i 3 and t i=1 m i = E(K n) then K n can be decomposed

More information

Discrete Optimization 2010 Lecture 5 Min-Cost Flows & Total Unimodularity

Discrete Optimization 2010 Lecture 5 Min-Cost Flows & Total Unimodularity Discrete Optimization 2010 Lecture 5 Min-Cost Flows & Total Unimodularity Marc Uetz University of Twente m.uetz@utwente.nl Lecture 5: sheet 1 / 26 Marc Uetz Discrete Optimization Outline 1 Min-Cost Flows

More information

Memoization/Dynamic Programming. The String reconstruction problem. CS124 Lecture 11 Spring 2018

Memoization/Dynamic Programming. The String reconstruction problem. CS124 Lecture 11 Spring 2018 CS124 Lecture 11 Spring 2018 Memoization/Dynamic Programming Today s lecture discusses memoization, which is a method for speeding up algorithms based on recursion, by using additional memory to remember

More information