Alignment of Trees and Directed Acyclic Graphs


Alignment of Trees and Directed Acyclic Graphs
Gabriel Valiente
Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia
Computational Biology and Bioinformatics Research Group, Research Institute of Health Science, University of the Balearic Islands
Centre for Genomic Regulation, Barcelona Biomedical Research Park
Ben-Gurion University of the Negev, Israel, April 27, 2009
Gabriel Valiente (UPC) Alignment of Directed Acyclic Graphs BGU 2009 1 / 35

Abstract

It is well known that the string edit distance and the alignment of strings coincide, while the alignment of trees differs from the tree edit distance. In this talk, we recall various constraints on directed acyclic graphs that allow for a unique (up to isomorphism) representation, called the path multiplicity representation, and present a new method for the alignment of trees and directed acyclic graphs that exploits the path multiplicity representation to produce a meaningful optimal alignment in polynomial time.

Plan of the Talk

- String edit distance and alignment
- Tree edit distance and alignment
- DAG representation of phylogenetic networks
- Path multiplicity representation
- DAG alignment
- Tree alignment as DAG alignment
- Tool support
  - BioPerl module
  - Web interface to the BioPerl module
- Conclusion

String edit distance and alignment

Definition: The edit distance between two strings is the smallest number of insertions, deletions, and substitutions needed to transform one string into the other.

Definition: An alignment of two strings is an arrangement of the two strings as rows of a matrix, with additional gaps (dashes) between the elements to make some or all of the remaining (aligned) columns contain identical elements, but with no column gapped in both strings.

Example (Optimal alignment), with mismatched columns marked by asterisks:

    -GCTTCCGGCTCGTATAATGTGTGG
          * *          *
    TGCTTCTGACT---ATAATA-G---
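The two definitions can be checked against each other with a short sketch. It assumes unit costs for insertions, deletions, and substitutions (the slides do not state a cost scheme for strings); the function names `edit_distance` and `alignment_cost` are illustrative only. Any alignment gives an upper bound on the edit distance, because each gapped or mismatched column corresponds to one edit operation.

```python
def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance via the classic dynamic programme, row by row."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # match or substitute
        prev = curr
    return prev[-1]


def alignment_cost(row1: str, row2: str) -> int:
    """Number of alignment columns that hold a gap or a mismatch."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    return sum(a != b for a, b in zip(row1, row2))


# The two rows of the example alignment, gaps written as dashes.
ROW1 = "-GCTTCCGGCTCGTATAATGTGTGG"
ROW2 = "TGCTTCTGACT---ATAATA-G---"

if __name__ == "__main__":
    s, t = ROW1.replace("-", ""), ROW2.replace("-", "")
    print(alignment_cost(ROW1, ROW2))  # 8 gapped columns + 3 mismatches = 11
    print(edit_distance(s, t))         # never larger than the alignment cost
```

For strings, an optimal alignment and an optimal edit script have the same cost, which is the sense in which the two notions coincide.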

Tree edit distance and alignment

Definition: The edit distance between two trees is the smallest number of insertions, deletions, and substitutions needed to transform one tree into the other.

Example (Edit distance): [Figure: tree edit example on trees with node labels a, b, c, d, e, f]

Tree edit distance and alignment

Definition: An alignment of two trees is an arrangement of the trees with space-labeled nodes inserted such that their structures coincide.

Example (Optimal alignment): [Figure: tree alignment example on trees with node labels a, b, c, d, e, f]

Tree edit distance and alignment

Remark: An alignment of trees is a restricted form of tree edit distance in which all the insertions precede all the deletions.

Remark: With insertion cost 1, deletion cost 1, identical substitution cost 0, and non-identical substitution cost 2, an optimal tree edit yields a largest common subtree and an optimal alignment yields a smallest common supertree.

T. Jiang, L. Wang, and K. Zhang. Alignment of trees: an alternative to tree edit. Theoretical Computer Science, 143(1):137-148, 1995

Tree edit distance and alignment

H. Bunke, X. Jiang, and A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1):13-25, 2000

M.-L. Fernández and G. Valiente. A graph distance measure combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22(6-7):753-758, 2001

Theorem: The problems of finding a largest common subtree and a smallest common supertree of two trees, in each case together with a pair of witness (minor, topological, homeomorphic, or isomorphic) embeddings, are reducible to each other in time linear in the size of the trees.

F. Rosselló and G. Valiente. An algebraic view of the relation between largest common subtrees and smallest common supertrees. Theoretical Computer Science, 362(1-3):33-53, 2006

Tree edit distance and alignment

Example: [Figure: seeded tree alignment]

A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment and planar tanglegram layout. In Proc. 7th Workshop on Algorithms in Bioinformatics, volume 4645 of Lecture Notes in Bioinformatics, pages 98-110. Springer, 2007

A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(4):503-513, 2008

DAG representation of phylogenetic networks

D. H. Huson and D. Bryant. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23(2):254-267, 2006

DAG representation of phylogenetic networks

Definition: A phylogenetic network is a directed acyclic graph whose terminal nodes are labeled by taxa names and whose internal nodes are either tree nodes (if they have only one parent) or hybrid nodes (if they have two or more parents).

DAG representation of phylogenetic networks

Example: 44 polymorphic sites in a sample of the single gene encoding alcohol dehydrogenase, in 11 alleles from 5 natural populations of D. melanogaster.

    Wa-S   CCGCAATAATGGCGCTACTCTCACAATAACCCACTAGACAGCCT
    Fl-1S  CCCCAATATGGGCGCTACTTTCACAATAACCCACTAGACAGCCT
    Af-S   CCGCAATATGGGCGCTACCCCCCGGAATCTCCACTAAACAGTCA
    Fr-S   CCGCAATATGGGCGCTGTCCCCCGGAATCTCCACTAAACTACCT
    Fl-2S  CCGAGATAAGTCCGAGGTCCCCCGGAATCTCCACTAGCCAGCCT
    Ja-S   CCCCAATATGGGCGCGACCCCCCGGAATCTCTATTCACCAGCTT
    Fl-F   CCCCAATATGGGCGCGACCCCCCGGAATCTGTCTCCGCCAGCCT
    Fr-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
    Wa-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
    Af-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
    Ja-F   TGCAGGGGAGGGCTCGACCCCACGGGATCTGTCTCCGCCAGCCT

M. Kreitman. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature, 304(5925):412-417, 1983

DAG representation of phylogenetic networks

Example: [Figure: phylogenetic network with leaves Ja-F, Af-F, Fr-F, Wa-F, Fl-2S, Wa-S, Af-S, Fr-S, Fl-1S, Ja-S, Fl-F]

DAG representation of phylogenetic networks

Example: [Figure: phylogenetic network with leaves Fl-F, Ja-F, Fr-F, Wa-F, Af-F, Ja-S, Fl-2S, Fr-S, Wa-S, Af-S, Fl-1S]

DAG representation of phylogenetic networks

Definition: A phylogenetic network is called tree-sibling if every hybrid node has at least one sibling that is a tree node.

Remark: The biological meaning of the tree-sibling condition is that, in each recombination or hybridization process, at least one of the species involved also has some descendant through mutation.

DAG representation of phylogenetic networks

Definition: A phylogenetic network is called tree-child if every internal node has at least one child that is a tree node.

Remark: The biological meaning of the tree-child condition is that every non-extant species has some descendant through mutation.
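The tree-sibling and tree-child conditions can be tested directly from the definitions. The following is a minimal sketch, assuming a network is given as a hypothetical `{node: [children]}` dictionary; it further assumes that a leaf with a single parent counts as a tree node and that the root (no parents) is not hybrid. The two example networks are made up for illustration and do not come from the talk.

```python
from collections import defaultdict


def parents_of(children):
    """Invert a {node: [children]} map into {node: [parents]}."""
    parents = defaultdict(list)
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    return parents


def is_hybrid(v, parents):
    """A hybrid node has two or more parents."""
    return len(parents[v]) >= 2


def is_tree_child(children):
    """Every internal node has at least one child that is not hybrid."""
    parents = parents_of(children)
    return all(any(not is_hybrid(v, parents) for v in kids)
               for kids in children.values() if kids)


def is_tree_sibling(children):
    """Every hybrid node has at least one sibling that is not hybrid."""
    parents = parents_of(children)
    nodes = set(children) | {v for kids in children.values() for v in kids}
    for h in nodes:
        if not is_hybrid(h, parents):
            continue
        siblings = {s for p in parents[h] for s in children[p] if s != h}
        if all(is_hybrid(s, parents) for s in siblings):
            return False
    return True


# Hypothetical network with one hybrid node h (parents a and b).
NET_A = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}

# Hypothetical network in which x has only hybrid children (h1 and h2),
# while each hybrid node still has a tree-node sibling (y and leaf 3).
NET_B = {"r": ["a", "b"], "a": ["x", "1"], "x": ["h1", "h2"],
         "b": ["h1", "y"], "y": ["h2", "3"], "h1": ["2"], "h2": ["4"]}

if __name__ == "__main__":
    print(is_tree_child(NET_A), is_tree_sibling(NET_A))  # True True
    print(is_tree_child(NET_B), is_tree_sibling(NET_B))  # False True
```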

DAG representation of phylogenetic networks

Definition: A phylogenetic network is time-consistent if there is a temporal representation of the network, that is, an assignment of times to the nodes of the network that strictly increases on tree edges (those edges whose head is a tree node) and remains the same on hybrid edges (those whose head is a hybrid node).

Remark: The biological meaning of a temporal assignment is the time when certain species exist or when certain hybridization processes occur, because for these processes to take place, the species involved must coexist in time.
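Checking whether a given assignment of times is a temporal representation follows directly from the definition. The sketch below only validates a candidate assignment; it does not search for one. The `{node: [children]}` dictionary, the function name, and the example network and times are all hypothetical.

```python
def is_temporal_representation(children, time):
    """Check that `time` strictly increases on tree edges and is constant on hybrid edges."""
    parents = {}
    for u, kids in children.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    for v, ps in parents.items():
        hybrid = len(ps) >= 2  # head of the edge has two or more parents
        for u in ps:
            if hybrid and time[u] != time[v]:
                return False  # hybrid edge: both ends must coexist in time
            if not hybrid and not time[u] < time[v]:
                return False  # tree edge: time must strictly increase
    return True


# Hypothetical network: h is a hybrid node with parents a and b.
NETWORK = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}

GOOD_TIMES = {"r": 0, "a": 1, "b": 1, "h": 1, "1": 2, "2": 2, "3": 2}
BAD_TIMES = {"r": 0, "a": 1, "b": 2, "h": 1, "1": 3, "2": 3, "3": 3}

if __name__ == "__main__":
    print(is_temporal_representation(NETWORK, GOOD_TIMES))  # True
    print(is_temporal_representation(NETWORK, BAD_TIMES))   # False: b and h differ
```

A network in which one parent of a hybrid node is a tree-edge descendant of the other parent admits no such assignment, since the times would have to be both equal and strictly increasing.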

DAG representation of phylogenetic networks

Example (Time consistency): [Figure: phylogenetic network on leaves 1, 2, 3, 4, 5]

DAG representation of phylogenetic networks

[Figure: diagram of network classes with regions labeled phylogenetic networks, tree-sibling, tree-child, galled-trees, phylogenetic trees, and not time-consistent]

DAG representation of phylogenetic networks

[Figure: number of phylogenetic trees, galled-trees, tree-child, and tree-sibling networks for ρ = 0, 1, 2, 4, 8]

M. Arenas, G. Valiente, and D. Posada. Characterization of phylogenetic reticulate networks based on the coalescent with recombination. Molecular Biology and Evolution, 25(12):2517-2520, 2008

Path multiplicity representation

Definition: The µ-representation of a tree-child phylogenetic network is the multiset of µ-vectors µ(u) of path-to-leaf multiplicities for each node u.

Example: [Figure: network whose nodes are annotated with the µ-vectors 1231, 1110, 0121, 0111, 1000, 0100, 0001]

Path multiplicity representation

Lemma: The µ-representation of a tree-child phylogenetic network can be obtained in polynomial time.

Example: [Figure: the µ-vectors are filled in bottom-up, first the leaves (1000, 0100, 0001), then 0111, 1110, 0121, and finally 1231]
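One way to see why the computation is polynomial: the µ-vector of a leaf is a unit vector, and the µ-vector of an internal node is the sum of the µ-vectors of its children, so each node needs to be visited only once. The sketch below assumes a hypothetical `{node: [children]}` dictionary and a fixed ordering of the leaves; the small network is invented for illustration and is not the one drawn on the slide.

```python
from functools import lru_cache


def mu_vectors(children, leaves):
    """Return {node: µ-vector}, where entry i counts the paths to leaves[i]."""
    index = {leaf: i for i, leaf in enumerate(leaves)}

    @lru_cache(maxsize=None)  # memoise so that every node is expanded once
    def mu(u):
        if u in index:  # a leaf has exactly one (trivial) path to itself
            return tuple(int(i == index[u]) for i in range(len(leaves)))
        vectors = [mu(v) for v in children[u]]
        return tuple(sum(column) for column in zip(*vectors))

    return {u: mu(u) for u in set(children) | set(leaves)}


# Hypothetical network: h is a hybrid node with parents a and b.
NETWORK = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}
LEAVES = ["1", "2", "3"]

if __name__ == "__main__":
    for node, vector in sorted(mu_vectors(NETWORK, LEAVES).items()):
        print(node, vector)
    # The root r reaches leaf 2 along two paths (via a and via b): (1, 2, 1)
```

With memoisation, the work is one vector addition per edge, that is, proportional to the number of edges times the number of leaves.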

Path multiplicity representation

Theorem: Two tree-child phylogenetic networks are isomorphic if and only if they have the same µ-representation.

Example: [Figure: two drawings of networks with the same µ-vectors 1231, 1110, 0121, 0111, 1000, 0100, 0001]

Path multiplicity representation

Definition: The µ-distance between two tree-child phylogenetic networks $N$ and $N'$ is the size of the symmetric difference of their µ-representations (as multisets),

$$d_\mu(N,N') = |\mu(N) \,\triangle\, \mu(N')|$$

Example ($d_\mu(N,N') = 6$): [Figure: two networks N and N' with nodes annotated by their µ-vectors; vectors shown include 1231, 1110, 0121, 0111, 1121, 0011, 1000, 0100, 0001]
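Because the µ-representation is a multiset, the symmetric difference has to respect multiplicities. A minimal sketch using Python's `collections.Counter` follows; the function name and both example representations are hypothetical (a small network with one hybrid node, and the tree ((1,2),3) on the same leaves) and are not the networks shown on the slide.

```python
from collections import Counter


def mu_distance(mu_n, mu_m):
    """Size of the multiset symmetric difference of two µ-representations."""
    a, b = Counter(mu_n), Counter(mu_m)
    # Counter subtraction keeps only positive counts, i.e. the multiset difference.
    return sum((a - b).values()) + sum((b - a).values())


# Hypothetical network with a hybrid node: its vector (0, 1, 0) appears twice,
# once for the hybrid node and once for its leaf child.
MU_N = [(1, 2, 1), (1, 1, 0), (0, 1, 1), (0, 1, 0),
        (1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Hypothetical tree ((1,2),3) on the same three leaves.
MU_T = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

if __name__ == "__main__":
    print(mu_distance(MU_N, MU_T))  # 4
    print(mu_distance(MU_N, MU_N))  # 0: equal µ-representations
```

By the isomorphism theorem for tree-child networks, a distance of 0 between two such networks means they are isomorphic.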

Path multiplicity representation

Theorem: The µ-distance induces a metric on the space of tree-child phylogenetic networks that generalizes the bipartition distance for phylogenetic trees.

G. Cardona, F. Rosselló, and G. Valiente. Comparison of tree-child phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009

Theorem: The µ-distance induces a metric on the space of semi-binary tree-sibling time-consistent phylogenetic networks that generalizes the bipartition distance for phylogenetic trees.

G. Cardona, M. Llabrés, F. Rosselló, and G. Valiente. A distance metric for a class of tree-sibling phylogenetic networks. Bioinformatics, 24(13):1481-1488, 2008

DAG alignment

Definition: For every $v \in V$ and $v' \in V'$ of two phylogenetic networks $N = (V,E)$ and $N' = (V',E')$, let

$$m(v,v') = \|\mu(v) - \mu(v')\|_1$$

$$\chi(v,v') = \begin{cases} 0 & \text{if } v, v' \text{ are both tree nodes or both hybrid} \\ 1 & \text{otherwise} \end{cases}$$

The weight of the pair $(v,v')$ is

$$w(v,v') = m(v,v') + \frac{\chi(v,v')}{2n}$$

The total weight of a matching $M : V \to V'$ is

$$w(M) = \sum_{v \in V} w(v, M(v))$$

DAG alignment

Definition: An optimal alignment between two phylogenetic networks is a matching with the smallest total weight among all possible matchings.

Lemma: A matching $M$ between two phylogenetic networks $N = (V,E)$ and $N' = (V',E')$ is an optimal alignment if and only if it minimizes the sum

$$\sum_{v \in V} m(v, M(v))$$

and, among those matchings minimizing this sum, it maximizes the number of nodes that are sent to nodes of the same type.

G. Cardona, F. Rosselló, and G. Valiente. Comparison of tree-child phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009

DAG alignment

Example (Optimal alignment of two phylogenetic networks)

[Figure: two phylogenetic networks on leaves 1-5, the first with internal nodes r, a, b, c, d, e, A, B and the second with internal nodes r, u, v, x, y, z, X, Y]

µ-vectors of the first network (entries for leaves 1 2 3 4 5):

    r  (1,1,2,3,1)
    b  (0,0,1,2,1)
    a  (1,1,1,1,0)
    A  (0,0,1,1,0)
    c  (1,1,0,0,0)
    d  (0,0,1,1,0)
    e  (0,0,0,1,1)
    B  (0,0,0,1,0)

µ-vectors of the second network (entries for leaves 1 2 3 4 5):

    r  (1,2,1,2,1)
    u  (1,1,0,0,0)
    v  (0,1,1,2,1)
    x  (0,1,1,1,0)
    y  (0,0,1,1,0)
    z  (0,0,0,1,1)
    X  (0,1,0,0,0)
    Y  (0,0,0,1,0)

DAG alignment

Example (Optimal alignment of two phylogenetic networks)

Weights w(v,v') for the nodes of the first network (rows) and of the second network (columns):

         r     u     v     x     y     z     X     Y
    r    3     6     3     5     6     6     7.1   7.1
    b    3     6     1     3     2     2     5.1   3.1
    a    3     2     3     1     2     4     3.1   3.1
    A    5.1   4.1   3.1   1.1   0.1   2.1   3     1
    c    5     0     5     3     4     4     1.1   3.1
    d    5     4     3     1     0     2     3.1   1.1
    e    5     4     3     3     2     0     3.1   1.1
    B    6.1   3.1   4.1   2.1   1.1   1.1   2     0

DAG alignment

Example (Optimal alignment of two phylogenetic networks)

[Figure: the two networks on leaves 1-5 drawn side by side, with matched nodes joined and labeled by their weights]

    r -> r  (3)
    b -> v  (1)
    a -> x  (1)
    c -> u  (0)
    A -> X  (3)
    d -> y  (0)
    B -> Y  (0)
    e -> z  (0)
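The example can be reproduced from the µ-vectors with a short sketch. Several points are assumptions rather than statements from the talk: upper-case node names are treated as hybrid nodes, $n$ is taken to be the number of leaves (5), which reproduces the 0.1 penalties in the weight table, and the search is a plain brute force over all 8! matchings, swapped in here for brevity in place of a polynomial-time assignment algorithm.

```python
from fractions import Fraction
from itertools import permutations

# µ-vectors of the internal nodes, copied from the example.
N1 = {"r": (1, 1, 2, 3, 1), "b": (0, 0, 1, 2, 1), "a": (1, 1, 1, 1, 0),
      "A": (0, 0, 1, 1, 0), "c": (1, 1, 0, 0, 0), "d": (0, 0, 1, 1, 0),
      "e": (0, 0, 0, 1, 1), "B": (0, 0, 0, 1, 0)}
N2 = {"r": (1, 2, 1, 2, 1), "u": (1, 1, 0, 0, 0), "v": (0, 1, 1, 2, 1),
      "x": (0, 1, 1, 1, 0), "y": (0, 0, 1, 1, 0), "z": (0, 0, 0, 1, 1),
      "X": (0, 1, 0, 0, 0), "Y": (0, 0, 0, 1, 0)}


def weight(v, w, n=5):
    """w(v,w) = m(v,w) + chi(v,w)/(2n), kept exact with Fraction."""
    m = sum(abs(x - y) for x, y in zip(N1[v], N2[w]))  # Manhattan distance
    chi = int(v.isupper() != w.isupper())              # 1 if node types differ
    return m + Fraction(chi, 2 * n)


def total_weight(matching):
    """Total weight of a matching given as {node of N1: node of N2}."""
    return sum(weight(v, w) for v, w in matching.items())


def optimal_alignment():
    """Brute-force search over all bijections between the two node sets."""
    rows, cols = list(N1), list(N2)
    table = {(v, w): weight(v, w) for v in rows for w in cols}
    best = min(permutations(cols),
               key=lambda perm: sum(table[v, w] for v, w in zip(rows, perm)))
    matching = dict(zip(rows, best))
    return matching, total_weight(matching)


# The matching drawn in the example.
SLIDE_MATCHING = {"r": "r", "b": "v", "a": "x", "c": "u",
                  "A": "X", "d": "y", "B": "Y", "e": "z"}

if __name__ == "__main__":
    matching, total = optimal_alignment()
    print(total)                         # 8
    print(total_weight(SLIDE_MATCHING))  # 8
```

Under these assumptions the optimum is not unique: swapping the images of A and B (A to Y, B to X) gives the same total weight, so the brute-force search may return either matching.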

Tree alignment as DAG alignment

Remark: If we restrict this alignment method to phylogenetic trees, the weight of a pair of nodes $(v_1,v_2)$ is simply $|C_L(v_1) \,\triangle\, C_L(v_2)|$. This can be seen as an unnormalized version of the score used in TreeJuxtaposer.

T. Munzner, F. Guimbretière, S. Tasiran, L. Zhang, and Y. Zhou. TreeJuxtaposer: Scalable tree comparison using focus+context with guaranteed visibility. ACM T. Graphics, 22(3):453-462, 2003

Tree alignment as DAG alignment

Example (Optimal alignment of two phylogenetic trees)

[Figure: two phylogenetic trees on leaves 1-5 with matched nodes labeled by the weights 0, 2, 4, 4]

Alignment weights (rows: clusters of the first tree, columns: clusters of the second tree):

             00011  00111  01111
    11000    4      5      4
    11100    5      4      3
    11110    4      3      2

TreeJuxtaposer scores for the same pairs:

             00011  00111  01111
    11000    0/4    0/5    1/5
    11100    0/5    1/5    2/5
    11110    1/5    2/5    3/5
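Both tables in the example can be recomputed from the clusters written as bit strings. In this sketch the alignment weight is the size of the symmetric difference of two clusters, and the TreeJuxtaposer-style score is taken to be intersection size over union size, which is the form the fractions in the example suggest; the function names are illustrative only.

```python
from fractions import Fraction


def cluster(bits: str) -> frozenset:
    """Turn a bit string such as '11000' into the set of leaves {1, 2}."""
    return frozenset(i for i, bit in enumerate(bits, start=1) if bit == "1")


def alignment_weight(c1, c2) -> int:
    """Size of the symmetric difference of two clusters."""
    return len(c1 ^ c2)


def overlap_score(c1, c2) -> Fraction:
    """Intersection size over union size (assumed form of the normalized score)."""
    return Fraction(len(c1 & c2), len(c1 | c2))


ROWS = ["11000", "11100", "11110"]
COLS = ["00011", "00111", "01111"]

if __name__ == "__main__":
    for r in ROWS:
        weights = [alignment_weight(cluster(r), cluster(c)) for c in COLS]
        scores = [overlap_score(cluster(r), cluster(c)) for c in COLS]
        print(r, weights, [str(s) for s in scores])
```

The weight equals the union size minus the intersection size, so a large score corresponds to a small weight; Python prints the fractions in lowest terms, so 0/4 and 0/5 both appear as 0.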

Tool support: BioPerl module

Bio::PhyloNetwork

The Perl module Bio::PhyloNetwork implements all the data structures needed to work with phylogenetic networks, as well as algorithms for

- reconstructing a network from its eNewick string
- reconstructing a network from its µ-representation
- exploding a network into the set of its induced subtrees
- computing the µ-representation of a network
- computing the µ-distance between two networks
- computing an optimal alignment between two networks
- computing the set of tripartitions of a network
- computing the tripartition error between two networks
- testing if a network is time-consistent
- computing a temporal representation of a time-consistent network

Tool support: Web interface to the BioPerl module

The web interface at http://dmi.uib.es/~gcardona/bioinfo/alignment.php allows the user to input one or two phylogenetic networks, given by their eNewick strings. A Perl script processes these strings and uses the Bio::PhyloNetwork package to compute all available data for them, including a plot of the networks that can be downloaded in PS format; these plots are generated through the application GraphViz and its companion Perl package.

Given two networks on the same set of leaves, their µ-distance is also computed, as well as an optimal alignment between them. If their sets of leaves are not the same, their topological restriction to the set of common leaves is computed first, followed by the µ-distance and an optimal alignment.

A Java applet displays the networks side by side, and whenever a node is selected, the corresponding node in the other network (with respect to the optimal alignment) is highlighted, if it exists. This is also extended to edges. Similarities and differences between the networks are thus evident at a glance.

Conclusion

- String edit distance and alignment of strings coincide, but alignment of trees differs from tree edit distance, and alignment of graphs differs from graph edit distance
- The alignment of trees and directed acyclic graphs based on the path multiplicity representation can be computed in polynomial time
- The alignment method can be applied to any directed acyclic graphs with the same set of terminal node labels

G. Valiente. Algorithms on Trees and Graphs. Springer, 2002

G. Valiente. Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R. Taylor & Francis/CRC Press, 2009