Lecture: Bioinformatics

Size: px
Start display at page:

Download "Lecture: Bioinformatics"

Transcription

1 Lecture: Bioinformatics ENS Sacley, 2018 Some slides graciously provided by Daniel Huson & Celine Scornavacca

2 Phylogenetic Trees - Motivation 2 / 31

3 2 / 31 Phylogenetic Trees - Motivation Motivation - study relation between species - evolution of characteristics - co-evolution (host-parasite) - geological migration - genetic development of viruses/diseases

4 2 / 31 Phylogenetic Trees - Motivation Motivation - study relation between species - evolution of characteristics - co-evolution (host-parasite) - geological migration - genetic development of viruses/diseases Evolution genetic material changes over time new species branch off tree of life

5 Phylogenetic Trees - Motivation 2 / 31

6 Phylogenetic Trees - Motivation 2 / 31

7 3 / 31 Rooted Phylogenetic Trees evolution of species over time, leaves extant, hypothetical ancestors possibly branch lengths (time) Notation taxon, cluster, triplet

8 3 / 31 Rooted Phylogenetic Trees evolution of species over time, leaves extant, hypothetical ancestors possibly branch lengths (time) Notation taxon, cluster, triplet

9 3 / 31 Rooted Phylogenetic Trees evolution of species over time, leaves extant, hypothetical ancestors possibly branch lengths (time) Notation taxon, cluster, triplet

10 3 / 31 Rooted Phylogenetic Trees evolution of species over time, leaves extant, hypothetical ancestors possibly branch lengths (time) Notation taxon, cluster, triplet

11 Rooted Phylogenetic Trees evolution of species over time, leaves extant, hypothetical ancestors possibly branch lengths (time) Exercise Time Notation Polytomies taxon, cluster, triplet history not clear soft known fan out hard Exercise: use xy z LCA(xz)=LCA(yz)>LCA(xy) to prove ab c + bc d ac d 3 / 31

12 4 / 31 Unrooted Phylogenetic Trees similarity between genomes, leaves extant, internal vertices have no meaning possibly branch lengths (amount of change) Notation taxon, split, quartet

13 4 / 31 Unrooted Phylogenetic Trees similarity between genomes, leaves extant, internal vertices have no meaning possibly branch lengths (amount of change) Notation taxon, split, quartet

14 4 / 31 Unrooted Phylogenetic Trees similarity between genomes, leaves extant, internal vertices have no meaning possibly branch lengths (amount of change) Notation taxon, split, quartet

15 4 / 31 Unrooted Phylogenetic Trees similarity between genomes, leaves extant, internal vertices have no meaning possibly branch lengths (amount of change) Notation taxon, split, quartet

16 4 / 31 Unrooted Phylogenetic Trees similarity between genomes, leaves extant, internal vertices have no meaning possibly branch lengths (amount of change) Notation taxon, split, quartet

17 5 / 31 Reconstructing Phylogenetic Trees Group Species By... - morphology - behavior - geography Diptera = two wings

18 5 / 31 Reconstructing Phylogenetic Trees Group Species By... - morphology - behavior - geography - distance of sequences - genetic distance - etc. Diptera = two wings

19 6 / 31 Reconstructing Phylogenetic Trees Vertebrata has backbone

20 6 / 31 Reconstructing Phylogenetic Trees Vertebrata has backbone Tetrapoda has 4 legs

21 6 / 31 Reconstructing Phylogenetic Trees Vertebrata has backbone Tetrapoda has 4 legs Mammalia breast feeding

22 6 / 31 Reconstructing Phylogenetic Trees Vertebrata has backbone Tetrapoda has 4 legs Salientia can leap Mammalia breast feeding

23 6 / 31 Reconstructing Phylogenetic Trees Vertebrata has backbone Tetrapoda has 4 legs Salientia can leap Mammalia breast feeding Elephantidae elephants

24 6 / 31 Reconstructing Phylogenetic Trees Vertebrata has backbone Tetrapoda has 4 legs Salientia can leap Mammalia breast feeding Elephantidae elephants Loxodonta africana African Elephant Elephas maximus Asian Elephant

25 Reconstructing Phylogenetic Trees Tetrapoda has 4 legs Vertebrata has backbone Keep in Mind Automatic reconstruction should be - fast (deal with tons of species & genes) - consistent (optimal data evolution correctly reflected) - non-arbitrary Salientia can leap Mammalia breast feeding Elephantidae elephants Loxodonta africana African Elephant Elephas maximus Asian Elephant 6 / 31

26 Distance-Based Reconstruction Idea: cluster hierarchically 7 / 31

27 7 / 31 Distance-Based Reconstruction Idea: cluster hierarchically

28 7 / 31 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters

29 7 / 31 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters

30 7 / 31 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters

31 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters branch lengths?? assume molecular clock ultrametric 7 / 31

32 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric / 31

33 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric / 31

34 Distance-Based Reconstruction Idea: cluster hierarchically Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric / 31

35 Distance-Based Reconstruction Idea: cluster hierarchically 6 28/3 28/3 Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric / 31

36 Distance-Based Reconstruction Idea: cluster hierarchically 6 28/3 28/3 Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric 7 / 31

37 Distance-Based Reconstruction Idea: cluster hierarchically 28/3 Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric 7 / 31

38 Distance-Based Reconstruction Idea: cluster hierarchically 28/3 5/3 Idea: merge closest clusters 5/3 update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric 7 / 31

39 Distance-Based Reconstruction Idea: cluster hierarchically Unweighted Pair Group Method w/ Avg. - find closest pair - join them - update distances & recurse 28/3 5/3 Idea: merge closest clusters 5/3 update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric 7 / 31

40 Distance-Based Reconstruction Idea: cluster hierarchically Unweighted Pair Group Method w/ Avg. - find closest pair - join them - update distances & recurse /3 Idea: merge closest clusters 5/3 update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric 7 / 31

41 Distance-Based Reconstruction Idea: cluster hierarchically Unweighted Pair Group Method w/ Avg. - find closest pair - join them - update distances & recurse Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y Exercise Time branch lengths?? assume molecular clock ultrametric 7 / 31

42 Distance-Based Reconstruction Idea: cluster hierarchically Unweighted Pair Group Method w/ Avg. - find closest pair - join them - update distances & recurse Idea: merge closest clusters update matrix d X Y,Z = X d X,Z + Y d Y,Z X + Y branch lengths?? assume molecular clock ultrametric 7 / 31

43 8 / 31 Distance-Based Reconstruction What about unrooted trees?

44 8 / 31 Distance-Based Reconstruction B 9 C 11 8 D A B C What about unrooted trees?

45 8 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest B 9 C 11 8 D A B C

46 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest B 9 C 11 8 D A B C B A D C

47 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse

48 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse

49 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse Theorem Q X,Y max any tree T yielding Q has cherry (X, Y )

50 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n 2

51 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n 2 2 3

52 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n 2 2 3

53 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n 2 2 3

54 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n

55 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n

56 9 / 31 Distance-Based Reconstruction Problem: correct pairs may not be closest Neighbor Joining (unrooted) - build new matrix: Q X,Y = Z (d X,Z + d Y,Z d X,Y ) + 2d X,Y - find max in Q - join them - update distances & recurse update distances d X Y,Z = 1 /2 (d X,Z + d Y,Z d X,Y ) branch lengths 2b(X ) = Z / {X,Y } (d X,Z d Y,Z +d X,Y ) n

57 Parsimony Reconstructing 10 / 31

58 10 / 31 Parsimony Reconstructing fur AUS pouch land X X X X X X X X X

59 10 / 31 Parsimony Reconstructing fur AUS pouch land X X X X X X X X X

60 10 / 31 Parsimony Reconstructing fur AUS pouch land X X X X X X X 0001 X X

61 Parsimony Reconstructing fur AUS pouch land X X X X X X X 0001 X X sum distance of endpoints of each edge 10 / 31

62 Parsimony Reconstructing fur AUS pouch land X X X X X X X 0001 X X sum distance of endpoints of each edge cost 6 (Hamming) 10 / 31

63 10 / 31 Parsimony Reconstructing ACTACGGACCGCTGCG A_TACAAAC GCG ATTACGGACCGCTGCG Small Parsimony Input: character state matrix M, rooted tree T Task: assign characters to internal nodes minimizing total cost ATTACAAAC GCC TTTACAAAC CC

64 10 / 31 Parsimony Reconstructing ACTACGGACCGCTGCG A_TACAAAC GCG ATTACGGACCGCTGCG Small Parsimony Input: character state matrix M, rooted tree T Task: assign characters to internal nodes minimizing total cost ATTACAAAC GCC TTTACAAAC CC O(nm) time [Fitch 71]

65 10 / 31 Parsimony Reconstructing ACTACGGACCGCTGCG A_TACAAAC GCG ATTACGGACCGCTGCG Small Parsimony Input: character state matrix M, rooted tree T Task: assign characters to internal nodes minimizing total cost ATTACAAAC GCC TTTACAAAC CC Exercise Time O(nm) time [Fitch 71]

66 10 / 31 Parsimony Reconstructing ACTACGGACCGCTGCG A_TACAAAC GCG ATTACGGACCGCTGCG ATTACAAAC GCC TTTACAAAC CC Small Parsimony Input: character state matrix M, rooted tree T Task: assign characters to internal nodes minimizing total cost O(nm) time Large Parsimony Input: character state matrix M [Fitch 71] Task: find tree T & assign characters to internal nodes minimizing total cost

67 10 / 31 Parsimony Reconstructing ACTACGGACCGCTGCG A_TACAAAC GCG ATTACGGACCGCTGCG ATTACAAAC GCC TTTACAAAC CC Small Parsimony Input: character state matrix M, rooted tree T Task: assign characters to internal nodes minimizing total cost O(nm) time Large Parsimony Input: character state matrix M [Fitch 71] Task: find tree T & assign characters to internal nodes minimizing total cost NP-hard

68 10 / 31 Parsimony Reconstructing ACTACGGACCGCTGCG A_TACAAAC GCG ATTACGGACCGCTGCG ATTACAAAC GCC TTTACAAAC CC Small Parsimony Input: character state matrix M, rooted tree T Task: assign characters to internal nodes minimizing total cost O(nm) time Large Parsimony Input: character state matrix M [Fitch 71] Task: find tree T & assign characters to internal nodes minimizing total cost NP-hard Note: alignment is crucial!

69 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes

70 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes need model of evolution

71 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes need model of evolution Jukes & Cantor Model - each base evolves individually - each base occurs with equal frequency in the genome - constant rate µ of mutation - each base is equally likely to be result of mutation

72 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes need model of evolution Jukes & Cantor Model - each base evolves individually - each base occurs with equal frequency in the genome - constant rate µ of mutation - each base is equally likely to be result of mutation Generalized Time Reversible Model - each base evolves individually - each base X has a frequency π X to occur in the genome - each base-substitution has its own rate of occurance

73 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes need model of evolution Jukes & Cantor Model - each base evolves individually - each base occurs with equal frequency in the genome - constant rate µ of mutation - each base is equally likely to be result of mutation Generalized Time Reversible Model - each base evolves individually - each base X has a frequency π X to occur in the genome - each base-substitution has its own rate of occurance compute likelihood, given tree & parameters O(mn) time

74 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes need model of evolution Jukes & Cantor Model - each base evolves individually - each base occurs with equal frequency in the genome - constant rate µ of mutation - each base is equally likely to be result of mutation Generalized Time Reversible Model - each base evolves individually - each base X has a frequency π X to occur in the genome - each base-substitution has its own rate of occurance compute likelihood, given tree & parameters O(mn) time find best tree & parameters NP-hard

75 11 / 31 Maximum Likelihood Reconstruction Idea: find a tree (with branch lengths) under which evolution is most likely to have produced the observed characters/genomes need model of evolution Jukes & Cantor Model - each base evolves individually - each base occurs with equal frequency in the genome - constant rate µ of mutation - each base is equally likely to be result of mutation Generalized Time Reversible Model - each base evolves individually - each base X has a frequency π X to occur in the genome - each base-substitution has its own rate of occurance compute likelihood, given tree & parameters O(mn) time find best tree & parameters NP-hard local search in the tree space

76 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another

77 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another

78 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another

79 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u

80 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u

81 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u

82 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u

83 12 / 31 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u

84 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u Tree Bisection & Reconnection break any edge & insert a new reconnecting edge between any 2 edges 12 / 31

85 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u Tree Bisection & Reconnection break any edge & insert a new reconnecting edge between any 2 edges 12 / 31

86 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u Tree Bisection & Reconnection break any edge & insert a new reconnecting edge between any 2 edges 12 / 31

87 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u Tree Bisection & Reconnection break any edge & insert a new reconnecting edge between any 2 edges 12 / 31

88 ML Reconstruction - Tree Spaces Observe: a tree and a rearrangement operation span a space Nearest Neighbor Interchange change any configuration of 4 (3) neighboring subtrees into another Subtree Prune & Regraft break any edge uv & connect v to any edge of the component of u Tree Bisection & Reconnection break any edge & insert a new reconnecting edge between any 2 edges 12 / 31

89 13 / 31 ML Reconstruction - Tree Spaces Exercise: turn into (any) caterpillar: Exercise Time Exercise: how are the distances related?

90 14 / 31 Checking Robustness Bootstrap Method suppose: method X yields tree T from n m character-state matrix M repeat k times the following experiment: 1. draw m columns from M (with repetition) 2. use X to compute T i Finally, for each branch of T, check how often it occurs in the T i bootstrap value measures robustness ( support ) of each branch

91 Reconstruction by Gene Trees 15 / 31

92 15 / 31 Reconstruction by Gene Trees A Common Method For Reconstructing Trees 1. get genomes of multiple species 2. extract genes using START & STOP codons 3. cluster genes in families of similar genes 4. within each family, infer a gene tree using dissimilarities 5. build a consensus among the gene trees species tree (Note: species tree may differ significantly from individual gene trees) 6. reconcile all gene trees with the species tree to learn the evolution of those genes

93 15 / 31 Reconstruction by Gene Trees A Common Method For Reconstructing Trees 1. get genomes of multiple species 2. extract genes using START & STOP codons 3. cluster genes in families of similar genes 4. within each family, infer a gene tree using dissimilarities 5. build a consensus among the gene trees species tree (Note: species tree may differ significantly from individual gene trees) 6. reconcile all gene trees with the species tree to learn the evolution of those genes

94 15 / 31 Reconstruction by Gene Trees A Common Method For Reconstructing Trees 1. get genomes of multiple species 2. extract genes using START & STOP codons 3. cluster genes in families of similar genes 4. within each family, infer a gene tree using dissimilarities 5. build a consensus among the gene trees species tree (Note: species tree may differ significantly from individual gene trees) 6. reconcile all gene trees with the species tree to learn the evolution of those genes

95 15 / 31 Reconstruction by Gene Trees A Common Method For Reconstructing Trees 1. get genomes of multiple species 2. extract genes using START & STOP codons 3. cluster genes in families of similar genes 4. within each family, infer a gene tree using dissimilarities 5. build a consensus among the gene trees species tree (Note: species tree may differ significantly from individual gene trees) 6. reconcile all gene trees with the species tree to learn the evolution of those genes

96 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) a c b f e a d b c

97 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f

98 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f

99 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f a, b, c, d e f

100 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f a, b, c, d e f

101 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f a, b, c, d e f

102 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f a, c, d b e f

103 16 / 31 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f a, c, d b e f

104 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) b c a c b f e a d b c d a e f b c e f a d 16 / 31

105 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) a c b f e a d b c Note: always works if trees are compatible/consistent b c e f a d 16 / 31

106 Supertrees - Build Algorithm Idea: find root partition & recurse (as long as there are 3 taxa) a c b f e a d b c Note: always works if trees are compatible/consistent incompatible? c b e f - largest compatible subset NP-hard (even for triplets) - voting schemes (each tree votes for their clades) - reinterpret clades as characters, combine into matrix & reconstruct a d 16 / 31

107 17 / 31 Consensi of Non-Agreeing Trees strict consensus consensus subtree majority consensus

108 18 / 31 Reconstruction by Gene Trees A Common Method For Reconstructing Trees 1. get genomes of multiple species 2. extract genes using START & STOP codons 3. cluster genes in families of similar genes 4. within each family, infer a gene tree using dissimilarities 5. build a consensus among the gene trees species tree (Note: species tree may differ significantly from individual gene trees) 6. reconcile all gene trees with the species tree to learn the evolution of those genes

109 18 / 31 Reconstruction by Gene Trees A Common Method For Reconstructing Trees 1. get genomes of multiple species 2. extract genes using START & STOP codons 3. cluster genes in families of similar genes 4. within each family, infer a gene tree using dissimilarities 5. build a consensus among the gene trees species tree (Note: species tree may differ significantly from individual gene trees) 6. reconcile all gene trees with the species tree to learn the evolution of those genes

110 The History of a Gene Family Recall gene = functional element of DNA, clustered into gene-families each family yields a tree depicting its history gene tree consensus of the gene trees yields species tree But: what really happened??? Mouse Bat Dog Rat Mouse Rat Bat Dog 19 / 31

111 The History of a Gene Family Recall gene = functional element of DNA, clustered into gene-families each family yields a tree depicting its history gene tree consensus of the gene trees yields species tree But: what really happened??? Mouse Bat Dog Rat Mouse Rat Bat Dog 19 / 31

112 20 / 31 Reconciliation A B C a a b b c Embedding Rules gene tree G, species tree S - mapping ρ : V (G) V (S) - l is leaf in G ρ(l) corresponds to l (a A, a A, etc.) - u V (G) is called duplication if ρ(u) = ρ(c) for any child c of u in G - all non-leaves of G that not duplications are called speciations - each edge uv of G incurs a loss-cost equal to the number of edges in the ρ(u)-ρ(v)-path in S minus 1 if v is a speciation or 0 if v is a duplication

113 Reconciliation A B C a a b b c DL-model = speciation = duplication = loss a a b b c 20 / 31

114 Reconciliation Goal: embed gene tree into species tree (extant genes must map to their species) Max. Likelihood find most probable embedding (computationally expensive) Parsimony find embedding minimizing #events (possibly weighted) DL-model = speciation = duplication = loss a a b b c 20 / 31

115 Reconciliation Parsimonious Reconciliation Input: species tree S, gene tree G, δ, λ N Task: embed G in S, minimizing the weighted sum of events Result: LCA-assignment solves this optimally in O( S + G ) DL-model = speciation (0) = duplication (δ) = loss (λ) a a b b c 20 / 31

116 20 / 31 Reconciliation Parsimonious Reconciliation Input: species tree S, gene tree G, δ, λ, τ N Task: embed G in S, minimizing the weighted sum of events Result: LCA-assignment solves this optimally in O( S + G ) Exercise Time A B C D a b c d a b

117 Reconciliation Parsimonious Reconciliation Input: species tree S, gene tree G, δ, λ, τ N Task: embed G in S, minimizing the weighted sum of events Result: LCA-assignment solves this optimally in O( S + G ) DTL-model = speciation (0) = duplication (δ) = loss (λ) = transfer (τ) a a b b c c 20 / 31

118 Reconciliation Parsimonious Reconciliation Input: species tree S, gene tree G, δ, λ, τ N Task: embed G in S, minimizing the weighted sum of events Result: LCA-assignment solves this optimally in O( S + G ) T events only between co-existing species time constraints NP-hard DTL-model = speciation (0) = duplication (δ) = loss (λ) = transfer (τ) a a b b c c 20 / 31

119 Reconciliation Parsimonious Reconciliation Input: species tree S, gene tree G, δ, λ, τ N Task: embed G in S, minimizing the weighted sum of events Result: LCA-assignment solves this optimally in O( S + G ) T events only between co-existing species time constraints NP-hard Idea: dated species tree O( S 2 G ) [Doyon et al. 10] DTL-model = speciation (0) = duplication (δ) = loss (λ) = transfer (τ) a a b b c c 20 / 31

120 21 / 31 Comparing Phylogenetic Trees Distance Measures - Nearest Neighbor Interchange - Subtree Prune & Regraft - Tree Bisection & Reconnection

121 21 / 31 Comparing Phylogenetic Trees Distance Measures - Nearest Neighbor Interchange - Subtree Prune & Regraft - Tree Bisection & Reconnection - Robinson-Foulds distance - quartet/triplet distance

122 22 / 31 Agreement Forests Definition A forest F is called agreement forest of trees T 1 and T 2 if F can be obtained from T 1 and T 2 by removing edges. D C B D A E C E F F B A

123 22 / 31 Agreement Forests Definition A forest F is called agreement forest of trees T 1 and T 2 if F can be obtained from T 1 and T 2 by removing edges. D C B D A E C E F F B A

124 22 / 31 Agreement Forests Definition A forest F is called agreement forest of trees T 1 and T 2 if F can be obtained from T 1 and T 2 by removing edges. F C E A F E D C B A D B

125 22 / 31 Agreement Forests Definition A forest F is called agreement forest of trees T 1 and T 2 if F can be obtained from T 1 and T 2 by removing edges. F C E A F E D C B A D B

126 22 / 31 Agreement Forests Definition A forest F is called agreement forest of trees T 1 and T 2 if F can be obtained from T 1 and T 2 by removing edges. Theorem [Allen & Steel, 01] TBR-distance(T 1, T 2 ) = #trees in smallest agreement forest - 1 NP-hard to compute Theorem [Bordewich & Semple, 04] rspr-distance(t 1, T 2) = #trees in smallest rooted agreement forest - 1 NP-hard to compute

127 Robinson-Foulds Distance Definition RF(T 1,T 2 ) = #splits/clusters occuring in exactly one of T 1 and T 2 = edge-contraction distance a common tree Note: observe relation to NNI: RF(T 1,T 2 ) 2 NNI(T 1,T 2 ) E F F C D B A trivial splits, {A,B} {C,D,E,F}, {E,F} {A,B,E,F}, {C,D} {A,B,E,F} A B trivial splits, {A,B} {C,D,E,F} {E,F} {A,B,C,D} {A,B,D} {C,E,F} 23 / 31 D C E

128 Robinson-Foulds Distance Definition RF(T 1,T 2 ) = #splits/clusters occuring in exactly one of T 1 and T 2 = edge-contraction distance a common tree Note: observe relation to NNI: RF(T 1,T 2 ) 2 NNI(T 1,T 2 ) E F F C D B A trivial splits, {A,B} {C,D,E,F}, {E,F} {A,B,E,F}, {C,D} {A,B,E,F} A B trivial splits, {A,B} {C,D,E,F} {E,F} {A,B,C,D} {A,B,D} {C,E,F} 23 / 31 D C E

129 Robinson-Foulds Distance Definition RF(T 1,T 2 ) = #splits/clusters occuring in exactly one of T 1 and T 2 = edge-contraction distance a common tree Note: observe relation to NNI: RF(T 1,T 2 ) 2 NNI(T 1,T 2 ) (C) Leonardo de Oliveira Martins 23 / 31

130 23 / 31 Robinson-Foulds Distance Definition RF(T 1,T 2 ) = #splits/clusters occuring in exactly one of T 1 and T 2 = edge-contraction distance a common tree Note: observe relation to NNI: RF(T 1,T 2 ) 2 NNI(T 1,T 2 ) Note: splits correspond to clusters when rooted at last leaf Day s Algorithm (common clusters in O(n)) [Day 85] 1. relabel all leaves s.t. leaves continuous in T 1 2. each vertex in T 1 knows: L: smallest leaf in cluster R: largest leaf in cluster T 1 s clusters in table [L, R] (O(n) sparse set/perfect hashing) 3. each vertex in T 2 knows L & R & size N of its cluster 4. each vertex in T 2 only checks for [L, R] if R L = N 1 (lookup in T 1 s table in O(1) (average) time)

131 23 / 31 Robinson-Foulds Distance C D B A C D E A B F Day s Algorithm (common clusters in O(n)) [Day 85] F E 1. relabel all leaves s.t. leaves continuous in T 1 2. each vertex in T 1 knows: L: smallest leaf in cluster R: largest leaf in cluster T 1 s clusters in table [L, R] (O(n) sparse set/perfect hashing) 3. each vertex in T 2 knows L & R & size N of its cluster 4. each vertex in T 2 only checks for [L, R] if R L = N 1 (lookup in T 1 s table in O(1) (average) time)

132 23 / 31 Robinson-Foulds Distance C D B A C D E A B F Day s Algorithm (common clusters in O(n)) [Day 85] F E 1. relabel all leaves s.t. leaves continuous in T 1 2. each vertex in T 1 knows: L: smallest leaf in cluster R: largest leaf in cluster T 1 s clusters in table [L, R] (O(n) sparse set/perfect hashing) 3. each vertex in T 2 knows L & R & size N of its cluster 4. each vertex in T 2 only checks for [L, R] if R L = N 1 (lookup in T 1 s table in O(1) (average) time)

133 23 / 31 Robinson-Foulds Distance C D A,E,5 C,D B A,D,4 A,B A,D A,D,3 A C A,B,2 A,E D E A B F Day s Algorithm (common clusters in O(n)) [Day 85] F E 1. relabel all leaves s.t. leaves continuous in T 1 2. each vertex in T 1 knows: L: smallest leaf in cluster R: largest leaf in cluster T 1 s clusters in table [L, R] (O(n) sparse set/perfect hashing) 3. each vertex in T 2 knows L & R & size N of its cluster 4. each vertex in T 2 only checks for [L, R] if R L = N 1 (lookup in T 1 s table in O(1) (average) time)

134 23 / 31 Robinson-Foulds Distance C D A,E,5 C,D B A,D,4 A,B A,D A,D,3 A C A,B,2 A,E D E A B F Day s Algorithm (common clusters in O(n)) [Day 85] F E 1. relabel all leaves s.t. leaves continuous in T 1 2. each vertex in T 1 knows: L: smallest leaf in cluster R: largest leaf in cluster T 1 s clusters in table [L, R] (O(n) sparse set/perfect hashing) 3. each vertex in T 2 knows L & R & size N of its cluster 4. each vertex in T 2 only checks for [L, R] if R L = N 1 (lookup in T 1 s table in O(1) (average) time)

135 24 / 31 Quartet/Triplet Distance Definition Q/T(T 1,T 2 ) = #quartets/triplets occur. in exactly one of T 1 and T 2

136 24 / 31 Quartet/Triplet Distance Definition Q/T(T 1,T 2 ) = #quartets/triplets occur. in exactly one of T 1 and T 2 computing Q-distance (binary trees) [Bryant et al. 00] 1. each edge uv has 4 sets (2 clusters for each of u & v) 2. quartet AB CD belongs to edge e if e splits AB CD & e touches AB-path or CD-path each split is owned exactly once 3. uv T 1 & ab T 2: intersect 4 sets of uv with split of ab in T 2 4. sizes of intersections can be precomputed bottom-up in O(n 2 ) time

137 24 / 31 Quartet/Triplet Distance Definition Q/T(T 1,T 2 ) = #quartets/triplets occur. in exactly one of T 1 and T 2 computing Q-distance (binary trees) [Bryant et al. 00] 1. each edge uv has 4 sets (2 clusters for each of u & v) 2. quartet AB CD belongs to edge e if e splits AB CD & e touches AB-path or CD-path each split is owned exactly once 3. uv T 1 & ab T 2: intersect 4 sets of uv with split of ab in T 2 4. sizes of intersections can be precomputed bottom-up in O(n 2 ) time State of the Art count conflict quartets/triplets O(n log n) time [Brodal et al. 13] enumerate conflict quartets O(n 2 + d) [Bryant et al. 00] enumerate conflict triplets O(n + d) [Weller 17]

138 25 / 31 Phylogenetic Networks Observation Trees cannot capture hybridization

139 25 / 31 Phylogenetic Networks Observation Trees cannot capture hybridization phylogenetic network

140 25 / 31 Phylogenetic Networks Observation Trees cannot capture hybridization phylogenetic network Definition evolutionary network N = rooted DAG, leaves labeled (taxa) reticulations R = vertices of in-degree 2 binary = all inner vertices degree 3 block = component without cut-vertex display T = subdivision of T is a subgraph

141 25 / 31 Phylogenetic Networks Observation Trees cannot capture hybridization phylogenetic network Definition evolutionary network N = rooted DAG, leaves labeled (taxa) reticulations R = vertices of in-degree 2 binary = all inner vertices degree 3 block = component without cut-vertex display T = subdivision of T is a subgraph

142 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

143 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

144 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

145 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

146 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

147 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

148 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs vs vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

149 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs vs vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

150 Split Networks split = bipartition of set of taxa splits A B & X Y incompatible if both A & B intersect both X & Y Convex Hull Algorithm [Holland et al., 04] fur AUS pouch land X X X XX X XX vs vs vs vs c.f.: Neighbor Net [Bryant & Moulton, 03] (for circular splits) 26 / 31

151 27 / 31 Consensus Split Networks Strategy 1. list all splits of all input trees 2. extend splits to full taxa using Z-closure 3. build consensus

152 27 / 31 Consensus Split Networks Strategy 1. list all splits of all input trees 2. extend splits to full taxa using Z-closure 3. build consensus Experimental Study 106 gene trees (yeast) [Rokas et al. 03, Holland et al. 04]

153 27 / 31 Consensus Split Networks Strategy 1. list all splits of all input trees 2. extend splits to full taxa using Z-closure 3. build consensus Experimental Study 106 gene trees (yeast) [Rokas et al. 03, Holland et al. 04]

154 27 / 31 Consensus Split Networks Strategy 1. list all splits of all input trees 2. extend splits to full taxa using Z-closure 3. build consensus Experimental Study 106 gene trees (yeast) [Rokas et al. 03, Holland et al. 04]

155 27 / 31 Consensus Split Networks Strategy 1. list all splits of all input trees 2. extend splits to full taxa using Z-closure 3. build consensus Experimental Study 106 gene trees (yeast) [Rokas et al. 03, Holland et al. 04]

156 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters

157 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Example {a,b,c,d},{c,d,e,f,g,h}, {c,d,e,f,g}, {e,f,g,h}, {c,d,e}, {e,f,g}, {a,b}, {c,d}, {f,g} Exercise Time

158 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Example {a,b,c,d},{c,d,e,f,g,h}, {c,d,e,f,g}, {e,f,g,h}, {c,d,e}, {e,f,g}, {a,b}, {c,d}, {f,g} a,b,c,d c,d,e,f,g c,d,e,f,g,h e,f,g,h c,d,e e,f,g a,b c,d f,g a b c d e f g h

159 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Example {a,b,c,d},{c,d,e,f,g,h}, {c,d,e,f,g}, {e,f,g,h}, {c,d,e}, {e,f,g}, {a,b}, {c,d}, {f,g} a b c d e f g h

160 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Example {a,b,c,d},{c,d,e,f,g,h}, {c,d,e,f,g}, {e,f,g,h}, {c,d,e}, {e,f,g}, {a,b}, {c,d}, {f,g} a b c d e c.f. cluster popping [Huson & Rupp, 08] f g h 28 / 31

161 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Problem may produce more reticulations than necessary to explain the data

162 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Problem may produce more reticulations than necessary to explain the data Hybridization Number Input: set of trees T, int k Question: Is there a network with k reticulations displaying all trees in T?

163 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Problem may produce more reticulations than necessary to explain the data Hybridization Number Input: set of trees T, int k Question: Is there a network with k reticulations displaying all trees in T? NP-hard for 2 trees [Bordewich & Semple, 07]

164 28 / 31 Rooted Network Reconstruction Observation rooted network: cluster of u cluster of v u v rooted network is hasse diagram of its clusters Problem may produce more reticulations than necessary to explain the data Hybridization Number Input: set of trees T, int k Question: Is there a network with k reticulations displaying all trees in T? NP-hard for 2 trees [Bordewich & Semple, 07] Note: HN(T 1,T 2 ) = max. acyclic agreement forest - 1 [Baroni et al., 05]

165 29 / 31 Networks Display Trees Observation A network may display up to 2 R trees.

166 29 / 31 Networks Display Trees Observation A network may display up to 2 R trees.

167 29 / 31 Networks Display Trees Observation A network may display up to 2 R trees. But: how to decide if a given tree is displayed?

168 29 / 31 Networks Display Trees Tree Containment Input: a network N, a tree T Question: Does N display T?

169 29 / 31 Networks Display Trees Tree Containment Input: a network N, a tree T Question: Does N display T? NP-hard (from Disjoint Paths) [Kanj et al. 08]

170 29 / 31 Networks Display Trees Tree Containment Input: a network N, a tree T Question: Does N display T? NP-hard (from Disjoint Paths) [Kanj et al. 08] Note: linear time on reticulation visible N [Gunawan, 18][Weller, 18]

171 Networks Display Trees Tree Containment Input: a network N, a tree T Question: Does N display T? NP-hard (from Disjoint Paths) [Kanj et al. 08] Note: linear time on reticulation visible N [Gunawan, 18][Weller, 18] Can you see it? A F G B C E D D E C F A G B 29 / 31

172 Small Taxonomy of Network Classes c.f. Who is Who in Phylogenetic Networks ( 30 / 31

173 Thanks & Enjoy Part III 31 / 31

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie.

Olivier Gascuel Arbres formels et Arbre de la Vie Conférence ENS Cachan, septembre Arbres formels et Arbre de la Vie. Arbres formels et Arbre de la Vie Olivier Gascuel Centre National de la Recherche Scientifique LIRMM, Montpellier, France www.lirmm.fr/gascuel 10 permanent researchers 2 technical staff 3 postdocs, 10

More information

Introduction to Trees

Introduction to Trees Introduction to Trees Tandy Warnow December 28, 2016 Introduction to Trees Tandy Warnow Clades of a rooted tree Every node v in a leaf-labelled rooted tree defines a subset of the leafset that is below

More information

CSE 549: Computational Biology

CSE 549: Computational Biology CSE 549: Computational Biology Phylogenomics 1 slides marked with * by Carl Kingsford Tree of Life 2 * H5N1 Influenza Strains Salzberg, Kingsford, et al., 2007 3 * H5N1 Influenza Strains The 2007 outbreak

More information

Evolutionary tree reconstruction (Chapter 10)

Evolutionary tree reconstruction (Chapter 10) Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early

More information

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1

DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens. Katherine St. John City University of New York 1 DIMACS Tutorial on Phylogenetic Trees and Rapidly Evolving Pathogens Katherine St. John City University of New York 1 Thanks to the DIMACS Staff Linda Casals Walter Morris Nicole Clark Katherine St. John

More information

Introduction to Computational Phylogenetics

Introduction to Computational Phylogenetics Introduction to Computational Phylogenetics Tandy Warnow The University of Texas at Austin No Institute Given This textbook is a draft, and should not be distributed. Much of what is in this textbook appeared

More information

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst

Phylogenetics. Introduction to Bioinformatics Dortmund, Lectures: Sven Rahmann. Exercises: Udo Feldkamp, Michael Wurst Phylogenetics Introduction to Bioinformatics Dortmund, 16.-20.07.2007 Lectures: Sven Rahmann Exercises: Udo Feldkamp, Michael Wurst 1 Phylogenetics phylum = tree phylogenetics: reconstruction of evolutionary

More information

What is a phylogenetic tree? Algorithms for Computational Biology. Phylogenetics Summary. Di erent types of phylogenetic trees

What is a phylogenetic tree? Algorithms for Computational Biology. Phylogenetics Summary. Di erent types of phylogenetic trees What is a phylogenetic tree? Algorithms for Computational Biology Zsuzsanna Lipták speciation events Masters in Molecular and Medical Biotechnology a.a. 25/6, fall term Phylogenetics Summary wolf cat lion

More information

Dynamic Programming for Phylogenetic Estimation

Dynamic Programming for Phylogenetic Estimation 1 / 45 Dynamic Programming for Phylogenetic Estimation CS598AGB Pranjal Vachaspati University of Illinois at Urbana-Champaign 2 / 45 Coalescent-based Species Tree Estimation Find evolutionary tree for

More information

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet)

Codon models. In reality we use codon model Amino acid substitution rates meet nucleotide models Codon(nucleotide triplet) Phylogeny Codon models Last lecture: poor man s way of calculating dn/ds (Ka/Ks) Tabulate synonymous/non- synonymous substitutions Normalize by the possibilities Transform to genetic distance K JC or K

More information

Recent Research Results. Evolutionary Trees Distance Methods

Recent Research Results. Evolutionary Trees Distance Methods Recent Research Results Evolutionary Trees Distance Methods Indo-European Languages After Tandy Warnow What is the purpose? Understand evolutionary history (relationship between species). Uderstand how

More information

CS 581. Tandy Warnow

CS 581. Tandy Warnow CS 581 Tandy Warnow This week Maximum parsimony: solving it on small datasets Maximum Likelihood optimization problem Felsenstein s pruning algorithm Bayesian MCMC methods Research opportunities Maximum

More information

Evolution of Tandemly Repeated Sequences

Evolution of Tandemly Repeated Sequences University of Canterbury Department of Mathematics and Statistics Evolution of Tandemly Repeated Sequences A thesis submitted in partial fulfilment of the requirements of the Degree for Master of Science

More information

The SNPR neighbourhood of tree-child networks

The SNPR neighbourhood of tree-child networks Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 22, no. 2, pp. 29 55 (2018) DOI: 10.7155/jgaa.00472 The SNPR neighbourhood of tree-child networks Jonathan Klawitter Department of Computer

More information

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony

Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Molecular Evolution & Phylogenetics Complexity of the search space, distance matrix methods, maximum parsimony Basic Bioinformatics Workshop, ILRI Addis Ababa, 12 December 2017 Learning Objectives understand

More information

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026

Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Improved parameterized complexity of the Maximum Agreement Subtree and Maximum Compatible Tree problems LIRMM, Tech.Rep. num 04026 Vincent Berry, François Nicolas Équipe Méthodes et Algorithmes pour la

More information

of the Balanced Minimum Evolution Polytope Ruriko Yoshida

of the Balanced Minimum Evolution Polytope Ruriko Yoshida Optimality of the Neighbor Joining Algorithm and Faces of the Balanced Minimum Evolution Polytope Ruriko Yoshida Figure 19.1 Genomes 3 ( Garland Science 2007) Origins of Species Tree (or web) of life eukarya

More information

Scaling species tree estimation methods to large datasets using NJMerge

Scaling species tree estimation methods to large datasets using NJMerge Scaling species tree estimation methods to large datasets using NJMerge Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana Champaign 2018 Phylogenomics Software

More information

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea Descent w/modification Descent w/modification Descent w/modification Descent w/modification CPU Descent w/modification Descent w/modification Phylogenetics on CUDA (Parallel) Architectures Bradly Alicea

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Leena Salmena and Veli Mäkinen, which are partly from http: //bix.ucsd.edu/bioalgorithms/slides.php. 582670 Algorithms for Bioinformatics Lecture 6: Distance based clustering and

More information

Trinets encode tree-child and level-2 phylogenetic networks

Trinets encode tree-child and level-2 phylogenetic networks Noname manuscript No. (will be inserted by the editor) Trinets encode tree-child and level-2 phylogenetic networks Leo van Iersel Vincent Moulton the date of receipt and acceptance should be inserted later

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/11/2014 Comp 555 Bioalgorithms (Fall 2014) 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other

More information

Lecture 20: Clustering and Evolution

Lecture 20: Clustering and Evolution Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 11/12/2013 Comp 465 Fall 2013 1 Clique Graphs A clique is a graph where every vertex is connected via an edge to every other vertex A clique

More information

arxiv: v2 [q-bio.pe] 8 Aug 2016

arxiv: v2 [q-bio.pe] 8 Aug 2016 Combinatorial Scoring of Phylogenetic Networks Nikita Alexeev and Max A. Alekseyev The George Washington University, Washington, D.C., U.S.A. arxiv:160.0841v [q-bio.pe] 8 Aug 016 Abstract. Construction

More information

Study of a Simple Pruning Strategy with Days Algorithm

Study of a Simple Pruning Strategy with Days Algorithm Study of a Simple Pruning Strategy with ays Algorithm Thomas G. Kristensen Abstract We wish to calculate all pairwise Robinson Foulds distances in a set of trees. Traditional algorithms for doing this

More information

4/4/16 Comp 555 Spring

4/4/16 Comp 555 Spring 4/4/16 Comp 555 Spring 2016 1 A clique is a graph where every vertex is connected via an edge to every other vertex A clique graph is a graph where each connected component is a clique The concept of clustering

More information

Distances Between Phylogenetic Trees: A Survey

Distances Between Phylogenetic Trees: A Survey TSINGHUA SCIENCE AND TECHNOLOGY ISSNll1007-0214ll07/11llpp490-499 Volume 18, Number 5, October 2013 Distances Between Phylogenetic Trees: A Survey Feng Shi, Qilong Feng, Jianer Chen, Lusheng Wang, and

More information

EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES

EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES EVOLUTIONARY DISTANCES INFERRING PHYLOGENIES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 28 th November 2007 OUTLINE 1 INFERRING

More information

Isometric gene tree reconciliation revisited

Isometric gene tree reconciliation revisited DOI 0.86/s05-07-008-x Algorithms for Molecular Biology RESEARCH Open Access Isometric gene tree reconciliation revisited Broňa Brejová *, Askar Gafurov, Dana Pardubská, Michal Sabo and Tomáš Vinař Abstract

More information

Seeing the wood for the trees: Analysing multiple alternative phylogenies

Seeing the wood for the trees: Analysing multiple alternative phylogenies Seeing the wood for the trees: Analysing multiple alternative phylogenies Tom M. W. Nye, Newcastle University tom.nye@ncl.ac.uk Isaac Newton Institute, 17 December 2007 Multiple alternative phylogenies

More information

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency?

Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? Fathiyeh Faghih and Daniel G. Brown David R. Cheriton School of Computer Science, University of

More information

11/17/2009 Comp 590/Comp Fall

11/17/2009 Comp 590/Comp Fall Lecture 20: Clustering and Evolution Study Chapter 10.4 10.8 Problem Set #5 will be available tonight 11/17/2009 Comp 590/Comp 790-90 Fall 2009 1 Clique Graphs A clique is a graph with every vertex connected

More information

Reconstructing Reticulate Evolution in Species Theory and Practice

Reconstructing Reticulate Evolution in Species Theory and Practice Reconstructing Reticulate Evolution in Species Theory and Practice Luay Nakhleh Department of Computer Science Rice University Houston, Texas 77005 nakhleh@cs.rice.edu Tandy Warnow Department of Computer

More information

Terminology. A phylogeny is the evolutionary history of an organism

Terminology. A phylogeny is the evolutionary history of an organism Phylogeny Terminology A phylogeny is the evolutionary history of an organism A taxon (plural: taxa) is a group of (one or more) organisms, which a taxonomist adjudges to be a unit. A definition? from Wikipedia

More information

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau Phylogenetic Trees Lecture 12 Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau. Maximum Parsimony. Last week we presented Fitch algorithm for (unweighted) Maximum Parsimony:

More information

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015

ML phylogenetic inference and GARLI. Derrick Zwickl. University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 ML phylogenetic inference and GARLI Derrick Zwickl University of Arizona (and University of Kansas) Workshop on Molecular Evolution 2015 Outline Heuristics and tree searches ML phylogeny inference and

More information

Phylogenetic Trees and Their Analysis

Phylogenetic Trees and Their Analysis City University of New York (CUNY) CUNY Academic Works Dissertations, Theses, and Capstone Projects Graduate Center 2-2014 Phylogenetic Trees and Their Analysis Eric Ford Graduate Center, City University

More information

Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008

Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki. Presented by Connor Magill November 20, 2008 Reconciliation Problems for Duplication, Loss and Horizontal Gene Transfer Pawel Górecki Presented by Connor Magill November 20, 2008 Introduction Problem: Relationships between species cannot always be

More information

Computing the Quartet Distance Between Trees of Arbitrary Degrees

Computing the Quartet Distance Between Trees of Arbitrary Degrees January 22, 2006 University of Aarhus Department of Computer Science Computing the Quartet Distance Between Trees of Arbitrary Degrees Chris Christiansen & Martin Randers Thesis supervisor: Christian Nørgaard

More information

Section 3.1: Nonseparable Graphs Cut vertex of a connected graph G: A vertex x G such that G x is not connected. Theorem 3.1, p. 57: Every connected

Section 3.1: Nonseparable Graphs Cut vertex of a connected graph G: A vertex x G such that G x is not connected. Theorem 3.1, p. 57: Every connected Section 3.1: Nonseparable Graphs Cut vertex of a connected graph G: A vertex x G such that G x is not connected. Theorem 3.1, p. 57: Every connected graph G with at least 2 vertices contains at least 2

More information

SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS

SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS 1 SPR-BASED TREE RECONCILIATION: NON-BINARY TREES AND MULTIPLE SOLUTIONS C. THAN and L. NAKHLEH Department of Computer Science Rice University 6100 Main Street, MS 132 Houston, TX 77005, USA Email: {cvthan,nakhleh}@cs.rice.edu

More information

Alignment of Trees and Directed Acyclic Graphs

Alignment of Trees and Directed Acyclic Graphs Alignment of Trees and Directed Acyclic Graphs Gabriel Valiente Algorithms, Bioinformatics, Complexity and Formal Methods Research Group Technical University of Catalonia Computational Biology and Bioinformatics

More information

Main Reference. Marc A. Suchard: Stochastic Models for Horizontal Gene Transfer: Taking a Random Walk through Tree Space Genetics 2005

Main Reference. Marc A. Suchard: Stochastic Models for Horizontal Gene Transfer: Taking a Random Walk through Tree Space Genetics 2005 Stochastic Models for Horizontal Gene Transfer Dajiang Liu Department of Statistics Main Reference Marc A. Suchard: Stochastic Models for Horizontal Gene Transfer: Taing a Random Wal through Tree Space

More information

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees

Applied Mathematics Letters. Graph triangulations and the compatibility of unrooted phylogenetic trees Applied Mathematics Letters 24 (2011) 719 723 Contents lists available at ScienceDirect Applied Mathematics Letters journal homepage: www.elsevier.com/locate/aml Graph triangulations and the compatibility

More information

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3

ABOUT THE LARGEST SUBTREE COMMON TO SEVERAL PHYLOGENETIC TREES Alain Guénoche 1, Henri Garreta 2 and Laurent Tichit 3 The XIII International Conference Applied Stochastic Models and Data Analysis (ASMDA-2009) June 30-July 3, 2009, Vilnius, LITHUANIA ISBN 978-9955-28-463-5 L. Sakalauskas, C. Skiadas and E. K. Zavadskas

More information

Phylogenetic incongruence through the lens of Monadic Second Order logic

Phylogenetic incongruence through the lens of Monadic Second Order logic Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 20, no. 2, pp. 189 215 (2016) DOI: 10.7155/jgaa.00390 Phylogenetic incongruence through the lens of Monadic Second Order logic Steven

More information

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ)

Distance based tree reconstruction. Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) Distance based tree reconstruction Hierarchical clustering (UPGMA) Neighbor-Joining (NJ) All organisms have evolved from a common ancestor. Infer the evolutionary tree (tree topology and edge lengths)

More information

Notes 4 : Approximating Maximum Parsimony

Notes 4 : Approximating Maximum Parsimony Notes 4 : Approximating Maximum Parsimony MATH 833 - Fall 2012 Lecturer: Sebastien Roch References: [SS03, Chapters 2, 5], [DPV06, Chapters 5, 9] 1 Coping with NP-completeness Local search heuristics.

More information

SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD

SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD 1 SEEING THE TREES AND THEIR BRANCHES IN THE NETWORK IS HARD I A KANJ School of Computer Science, Telecommunications, and Information Systems, DePaul University, Chicago, IL 60604-2301, USA E-mail: ikanj@csdepauledu

More information

human chimp mouse rat

human chimp mouse rat Michael rudno These notes are based on earlier notes by Tomas abak Phylogenetic Trees Phylogenetic Trees demonstrate the amoun of evolution, and order of divergence for several genomes. Phylogenetic trees

More information

Phylogenetic networks that display a tree twice

Phylogenetic networks that display a tree twice Bulletin of Mathematical Biology manuscript No. (will be inserted by the editor) Phylogenetic networks that display a tree twice Paul Cordue Simone Linz Charles Semple Received: date / Accepted: date Abstract

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016) Phylogenetic Trees (I)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) Phylogenetic Trees (I) CISC 636 Computational iology & ioinformatics (Fall 2016) Phylogenetic Trees (I) Maximum Parsimony CISC636, F16, Lec13, Liao 1 Evolution Mutation, selection, Only the Fittest Survive. Speciation. t one

More information

Parsimony-Based Approaches to Inferring Phylogenetic Trees

Parsimony-Based Approaches to Inferring Phylogenetic Trees Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 www.biostat.wisc.edu/bmi576.html Mark Craven craven@biostat.wisc.edu Fall 0 Phylogenetic tree approaches! three general types! distance:

More information

On the Optimality of the Neighbor Joining Algorithm

On the Optimality of the Neighbor Joining Algorithm On the Optimality of the Neighbor Joining Algorithm Ruriko Yoshida Dept. of Statistics University of Kentucky Joint work with K. Eickmeyer, P. Huggins, and L. Pachter www.ms.uky.edu/ ruriko Louisville

More information

Distance-based Phylogenetic Methods Near a Polytomy

Distance-based Phylogenetic Methods Near a Polytomy Distance-based Phylogenetic Methods Near a Polytomy Ruth Davidson and Seth Sullivant NCSU UIUC May 21, 2014 2 Phylogenetic trees model the common evolutionary history of a group of species Leaves = extant

More information

Comparison of commonly used methods for combining multiple phylogenetic data sets

Comparison of commonly used methods for combining multiple phylogenetic data sets Comparison of commonly used methods for combining multiple phylogenetic data sets Anne Kupczok, Heiko A. Schmidt and Arndt von Haeseler Center for Integrative Bioinformatics Vienna Max F. Perutz Laboratories

More information

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course)

Parsimony Least squares Minimum evolution Balanced minimum evolution Maximum likelihood (later in the course) Tree Searching We ve discussed how we rank trees Parsimony Least squares Minimum evolution alanced minimum evolution Maximum likelihood (later in the course) So we have ways of deciding what a good tree

More information

The worst case complexity of Maximum Parsimony

The worst case complexity of Maximum Parsimony he worst case complexity of Maximum Parsimony mir armel Noa Musa-Lempel Dekel sur Michal Ziv-Ukelson Ben-urion University June 2, 20 / 2 What s a phylogeny Phylogenies: raph-like structures whose topology

More information

MOLECULAR phylogenetic methods reconstruct evolutionary

MOLECULAR phylogenetic methods reconstruct evolutionary Calculating the Unrooted Subtree Prune-and-Regraft Distance Chris Whidden and Frederick A. Matsen IV arxiv:.09v [cs.ds] Nov 0 Abstract The subtree prune-and-regraft (SPR) distance metric is a fundamental

More information

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin

Sequence length requirements. Tandy Warnow Department of Computer Science The University of Texas at Austin Sequence length requirements Tandy Warnow Department of Computer Science The University of Texas at Austin Part 1: Absolute Fast Convergence DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2

More information

Fixed parameter algorithms for compatible and agreement supertree problems

Fixed parameter algorithms for compatible and agreement supertree problems Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2013 Fixed parameter algorithms for compatible and agreement supertree problems Sudheer Reddy Vakati Iowa State

More information

Efficient Non-binary Gene Tree Resolution with Weighted Reconciliation Cost

Efficient Non-binary Gene Tree Resolution with Weighted Reconciliation Cost Efficient Non-binary Gene Tree Resolution with Weighted Reconciliation Cost DIRO Université de Montréal Manuel Lafond, Emmanuel Noutahi Nadia El-Mabrouk Introduction Gene Tree : representation of the evolutionary

More information

Approximating Subtree Distances Between Phylogenies. MARIA LUISA BONET, 1 KATHERINE ST. JOHN, 2,3 RUCHI MAHINDRU, 2,4 and NINA AMENTA 5 ABSTRACT

Approximating Subtree Distances Between Phylogenies. MARIA LUISA BONET, 1 KATHERINE ST. JOHN, 2,3 RUCHI MAHINDRU, 2,4 and NINA AMENTA 5 ABSTRACT JOURNAL OF COMPUTATIONAL BIOLOGY Volume 13, Number 8, 2006 Mary Ann Liebert, Inc. Pp. 1419 1434 Approximating Subtree Distances Between Phylogenies AU1 AU2 MARIA LUISA BONET, 1 KATHERINE ST. JOHN, 2,3

More information

Algorithms for constructing more accurate and inclusive phylogenetic trees

Algorithms for constructing more accurate and inclusive phylogenetic trees Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2013 Algorithms for constructing more accurate and inclusive phylogenetic trees Ruchi Chaudhary Iowa State University

More information

arxiv: v2 [q-bio.pe] 8 Sep 2015

arxiv: v2 [q-bio.pe] 8 Sep 2015 RH: Tree-Based Phylogenetic Networks On Tree Based Phylogenetic Networks arxiv:1509.01663v2 [q-bio.pe] 8 Sep 2015 Louxin Zhang 1 1 Department of Mathematics, National University of Singapore, Singapore

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information

Efficiently Inferring Pairwise Subtree Prune-and-Regraft Adjacencies between Phylogenetic Trees

Efficiently Inferring Pairwise Subtree Prune-and-Regraft Adjacencies between Phylogenetic Trees Efficiently Inferring Pairwise Subtree Prune-and-Regraft Adjacencies between Phylogenetic Trees Downloaded 01/19/18 to 140.107.151.5. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

More information

Evolution Module. 6.1 Phylogenetic Trees. Bob Gardner and Lev Yampolski. Integrated Biology and Discrete Math (IBMS 1300)

Evolution Module. 6.1 Phylogenetic Trees. Bob Gardner and Lev Yampolski. Integrated Biology and Discrete Math (IBMS 1300) Evolution Module 6.1 Phylogenetic Trees Bob Gardner and Lev Yampolski Integrated Biology and Discrete Math (IBMS 1300) Fall 2008 1 INDUCTION Note. The natural numbers N is the familiar set N = {1, 2, 3,...}.

More information

Parsimonious Reconciliation of Non-binary Trees. Louxin Zhang National University of Singapore

Parsimonious Reconciliation of Non-binary Trees. Louxin Zhang National University of Singapore Parsimonious Reconciliation of Non-binary Trees Louxin Zhang National University of Singapore matzlx@nus.edu.sg Gene Tree vs the (Containing) Species Tree. A species tree S represents the evolutionary

More information

Markovian Models of Genetic Inheritance

Markovian Models of Genetic Inheritance Markovian Models of Genetic Inheritance Elchanan Mossel, U.C. Berkeley mossel@stat.berkeley.edu, http://www.cs.berkeley.edu/~mossel/ 6/18/12 1 General plan Define a number of Markovian Inheritance Models

More information

Computing Maximum Agreement Forests without Cluster Partitioning is Folly

Computing Maximum Agreement Forests without Cluster Partitioning is Folly Computing Maximum Agreement Forests without Cluster Partitioning is Folly Zhijiang Li 1 and Norbert Zeh 2 1 Microsoft Canada, Vancouver, BC, Canada zhijiang.li@dal.ca 2 Faculty of Computer Science, Dalhousie

More information

Clustering of Proteins

Clustering of Proteins Melroy Saldanha saldanha@stanford.edu CS 273 Project Report Clustering of Proteins Introduction Numerous genome-sequencing projects have led to a huge growth in the size of protein databases. Manual annotation

More information

CSE 373 MAY 10 TH SPANNING TREES AND UNION FIND

CSE 373 MAY 10 TH SPANNING TREES AND UNION FIND CSE 373 MAY 0 TH SPANNING TREES AND UNION FIND COURSE LOGISTICS HW4 due tonight, if you want feedback by the weekend COURSE LOGISTICS HW4 due tonight, if you want feedback by the weekend HW5 out tomorrow

More information

Unique reconstruction of tree-like phylogenetic networks from distances between leaves

Unique reconstruction of tree-like phylogenetic networks from distances between leaves Unique reconstruction of tree-like phylogenetic networks from distances between leaves Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu

More information

Prior Distributions on Phylogenetic Trees

Prior Distributions on Phylogenetic Trees Prior Distributions on Phylogenetic Trees Magnus Johansson Masteruppsats i matematisk statistik Master Thesis in Mathematical Statistics Masteruppsats 2011:4 Matematisk statistik Juni 2011 www.math.su.se

More information

The History Bound and ILP

The History Bound and ILP The History Bound and ILP Julia Matsieva and Dan Gusfield UC Davis March 15, 2017 Bad News for Tree Huggers More Bad News Far more convincingly even than the (also highly convincing) fossil evidence, the

More information

Algorithms for Computing Cluster Dissimilarity between Rooted Phylogenetic

Algorithms for Computing Cluster Dissimilarity between Rooted Phylogenetic Send Orders for Reprints to reprints@benthamscience.ae 8 The Open Cybernetics & Systemics Journal, 05, 9, 8-3 Open Access Algorithms for Computing Cluster Dissimilarity between Rooted Phylogenetic Trees

More information

Lecture 3: Graphs and flows

Lecture 3: Graphs and flows Chapter 3 Lecture 3: Graphs and flows Graphs: a useful combinatorial structure. Definitions: graph, directed and undirected graph, edge as ordered pair, path, cycle, connected graph, strongly connected

More information

Recognizing Interval Bigraphs by Forbidden Patterns

Recognizing Interval Bigraphs by Forbidden Patterns Recognizing Interval Bigraphs by Forbidden Patterns Arash Rafiey Simon Fraser University, Vancouver, Canada, and Indiana State University, IN, USA arashr@sfu.ca, arash.rafiey@indstate.edu Abstract Let

More information

INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH

INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH INFERENCE OF PARSIMONIOUS SPECIES TREES FROM MULTI-LOCUS DATA BY MINIMIZING DEEP COALESCENCES CUONG THAN AND LUAY NAKHLEH Abstract. One approach for inferring a species tree from a given multi-locus data

More information

12/5/17. trees. CS 220: Discrete Structures and their Applications. Trees Chapter 11 in zybooks. rooted trees. rooted trees

12/5/17. trees. CS 220: Discrete Structures and their Applications. Trees Chapter 11 in zybooks. rooted trees. rooted trees trees CS 220: Discrete Structures and their Applications A tree is an undirected graph that is connected and has no cycles. Trees Chapter 11 in zybooks rooted trees Rooted trees. Given a tree T, choose

More information

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation Daniel H. Huson and David Bryant Software Demo, ISMB, Detroit, June 27, 2005 Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the

More information

Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees

Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees Journal of Bioinformatics and Computational Biology c Imperial College Press Computing the All-Pairs Quartet Distance on a set of Evolutionary Trees M. Stissing, T. Mailund, C. N. S. Pedersen and G. S.

More information

Least Common Ancestor Based Method for Efficiently Constructing Rooted Supertrees

Least Common Ancestor Based Method for Efficiently Constructing Rooted Supertrees Least ommon ncestor ased Method for fficiently onstructing Rooted Supertrees M.. Hai Zahid, nkush Mittal, R.. Joshi epartment of lectronics and omputer ngineering, IIT-Roorkee Roorkee, Uttaranchal, INI

More information

Introduction to Triangulated Graphs. Tandy Warnow

Introduction to Triangulated Graphs. Tandy Warnow Introduction to Triangulated Graphs Tandy Warnow Topics for today Triangulated graphs: theorems and algorithms (Chapters 11.3 and 11.9) Examples of triangulated graphs in phylogeny estimation (Chapters

More information

New Common Ancestor Problems in Trees and Directed Acyclic Graphs

New Common Ancestor Problems in Trees and Directed Acyclic Graphs New Common Ancestor Problems in Trees and Directed Acyclic Graphs Johannes Fischer, Daniel H. Huson Universität Tübingen, Center for Bioinformatics (ZBIT), Sand 14, D-72076 Tübingen Abstract We derive

More information

CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

More information

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters

Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Designing parallel algorithms for constructing large phylogenetic trees on Blue Waters Erin Molloy University of Illinois at Urbana Champaign General Allocation (PI: Tandy Warnow) Exploratory Allocation

More information

On Low Treewidth Graphs and Supertrees

On Low Treewidth Graphs and Supertrees Journal of Graph Algorithms and Applications http://jgaa.info/ vol. 19, no. 1, pp. 325 343 (2015) DOI: 10.7155/jgaa.00361 On Low Treewidth Graphs and Supertrees Alexander Grigoriev 1 Steven Kelk 2 Nela

More information

Lesson 13 Molecular Evolution

Lesson 13 Molecular Evolution Sequence Analysis Spring 2000 Dr. Richard Friedman (212)305-6901 (76901) friedman@cuccfa.ccc.columbia.edu 130BB Lesson 13 Molecular Evolution In this class we learn how to draw molecular evolutionary trees

More information

Analyzing Evolutionary Trees

Analyzing Evolutionary Trees Analyzing Evolutionary Trees Katherine St. John Lehman College and the Graduate Center City University of New York stjohn@lehman.cuny.edu Katherine St. John City University of New York 1 Overview Talk

More information

Fast Hierarchical Clustering via Dynamic Closest Pairs

Fast Hierarchical Clustering via Dynamic Closest Pairs Fast Hierarchical Clustering via Dynamic Closest Pairs David Eppstein Dept. Information and Computer Science Univ. of California, Irvine http://www.ics.uci.edu/ eppstein/ 1 My Interest In Clustering What

More information

arxiv: v3 [math.co] 17 Jan 2018

arxiv: v3 [math.co] 17 Jan 2018 Reconstructing unrooted phylogenetic trees from symbolic ternary metrics arxiv:1702.00190v3 [math.co] 17 Jan 2018 Stefan Grünewald CAS-MPG Partner Institute for Computational Biology Chinese Academy of

More information

From Trees to Networks and Back

From Trees to Networks and Back From Trees to Networks and Back Sarah Bastkowski Supervisor: Prof. Vincent Moulton Co-supervisor: Dr. Geoffrey Mckeown A thesis submitted for the degree of Doctor of Philosophy at the University of East

More information

Fast Algorithms for Large-Scale Phylogenetic Reconstruction

Fast Algorithms for Large-Scale Phylogenetic Reconstruction Fast Algorithms for Large-Scale Phylogenetic Reconstruction by Jakub Truszkowski A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy

More information

Definition For vertices u, v V (G), the distance from u to v, denoted d(u, v), in G is the length of a shortest u, v-path. 1

Definition For vertices u, v V (G), the distance from u to v, denoted d(u, v), in G is the length of a shortest u, v-path. 1 Graph fundamentals Bipartite graph characterization Lemma. If a graph contains an odd closed walk, then it contains an odd cycle. Proof strategy: Consider a shortest closed odd walk W. If W is not a cycle,

More information

For all questions, E. NOTA means none of the above answers is correct. Diagrams are NOT drawn to scale.

For all questions, E. NOTA means none of the above answers is correct. Diagrams are NOT drawn to scale. For all questions, means none of the above answers is correct. Diagrams are NOT drawn to scale.. In the diagram, given m = 57, m = (x+ ), m = (4x 5). Find the degree measure of the smallest angle. 5. The

More information

A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees

A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees A practical O(n log 2 n) time algorithm for computing the triplet distance on binary trees Andreas Sand 1,2, Gerth Stølting Brodal 2,3, Rolf Fagerberg 4, Christian N. S. Pedersen 1,2 and Thomas Mailund

More information

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such)

Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences Phylogeny methods, part 1 (Parsimony and such) Genetics/MBT 541 Spring, 2002 Lecture 1 Joe Felsenstein Department of Genome Sciences joe@gs Phylogeny methods, part 1 (Parsimony and such) Methods of reconstructing phylogenies (evolutionary trees) Parsimony

More information

Graph and Digraph Glossary

Graph and Digraph Glossary 1 of 15 31.1.2004 14:45 Graph and Digraph Glossary A B C D E F G H I-J K L M N O P-Q R S T U V W-Z Acyclic Graph A graph is acyclic if it contains no cycles. Adjacency Matrix A 0-1 square matrix whose

More information