Alignment of Trees and Directed Acyclic Graphs Gabriel Valiente Algorithms, Bioinformatics, Complexity and Formal Methods Research Group Technical University of Catalonia Computational Biology and Bioinformatics Research Group Research Institute of Health Science, University of the Balearic Islands Centre for Genomic Regulation Barcelona Biomedical Research Park Ben-Gurion University of the Negev, Israel, April 27, 2009 Gabriel Valiente (UPC) Alignment of Directed Acyclic Graphs BGU 2009 1 / 35
Abstract

It is well known that the string edit distance and the alignment of strings coincide, while the alignment of trees differs from the tree edit distance. In this talk, we recall various constraints on directed acyclic graphs that allow for a unique (up to isomorphism) representation, called the path multiplicity representation, and present a new method for the alignment of trees and directed acyclic graphs that exploits the path multiplicity representation to produce a meaningful optimal alignment in polynomial time.
Plan of the Talk

- String edit distance and alignment
- Tree edit distance and alignment
- DAG representation of phylogenetic networks
- Path multiplicity representation
- DAG alignment
- Tree alignment as DAG alignment
- Tool support
  - BioPerl module
  - Web interface to the BioPerl module
- Conclusion
String edit distance and alignment

Definition. The edit distance between two strings is the smallest number of insertions, deletions, and substitutions needed to transform one string into the other.

Definition. An alignment of two strings is an arrangement of the two strings as rows of a matrix, with additional gaps (dashes) between the elements to make some or all of the remaining (aligned) columns contain identical elements, but with no column gapped in both strings.

Example (Optimal alignment; asterisks mark the mismatched columns).

    -GCTTCCGGCTCGTATAATGTGTGG
          * *          *
    TGCTTCTGACT---ATAATA-G---
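The two definitions above can be checked on the example with a short sketch. It assumes unit costs for insertions, deletions, and substitutions; the function names `edit_distance` and `alignment_cost` are illustrative and do not come from the talk.

```python
def edit_distance(s, t):
    """Smallest number of insertions, deletions, and substitutions turning s into t."""
    prev = list(range(len(t) + 1))          # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]                          # deleting the first i characters of s
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # match or substitute
        prev = curr
    return prev[-1]


def alignment_cost(row1, row2):
    """Number of columns of an alignment that hold a gap or a mismatch."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same length")
    return sum(a != b for a, b in zip(row1, row2))


top = "-GCTTCCGGCTCGTATAATGTGTGG"
bottom = "TGCTTCTGACT---ATAATA-G---"
cost = alignment_cost(top, bottom)
distance = edit_distance(top.replace("-", ""), bottom.replace("-", ""))
```

Every alignment of two strings spells out one way of editing the first into the second, so `distance` can never exceed `cost`; for strings, the best alignment attains the edit distance.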
Tree edit distance and alignment

Definition. The edit distance between two trees is the smallest number of insertions, deletions, and substitutions needed to transform one tree into the other.

Example (Edit distance). Figure: trees with nodes labeled a, b, c, d, e, f, related by edit operations.
Tree edit distance and alignment

Definition. An alignment of two trees is an arrangement of the trees with space-labeled nodes inserted such that their structures coincide.

Example (Optimal alignment). Figure: trees with nodes labeled a, b, c, d, e, f, and their alignment.
Tree edit distance and alignment

Remark. An alignment of trees is a restricted form of tree edit distance in which all the insertions precede all the deletions.

Remark. With insertion cost 1, deletion cost 1, identical substitution cost 0, and non-identical substitution cost 2, an optimal tree edit yields a largest common subtree and an optimal alignment yields a smallest common supertree.

T. Jiang, L. Wang, and K. Zhang. Alignment of trees: an alternative to tree edit. Theoretical Computer Science, 143(1):137-148, 1995
Tree edit distance and alignment

H. Bunke, X. Jiang, and A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1):13-25, 2000

M.-L. Fernández and G. Valiente. A graph distance measure combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22(6-7):753-758, 2001

Theorem. The problems of finding a largest common subtree and a smallest common supertree of two trees, in each case together with a pair of witness (minor, topological, homeomorphic, or isomorphic) embeddings, are reducible to each other in time linear in the size of the trees.

F. Rosselló and G. Valiente. An algebraic view of the relation between largest common subtrees and smallest common supertrees. Theoretical Computer Science, 362(1-3):33-53, 2006
Tree edit distance and alignment

Example (figure).

A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment and planar tanglegram layout. In Proc. 7th Workshop on Algorithms in Bioinformatics, volume 4645 of Lecture Notes in Bioinformatics, pages 98-110. Springer, 2007

A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(4):503-513, 2008
DAG representation of phylogenetic networks

D. H. Huson and D. Bryant. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23(2):254-267, 2006
DAG representation of phylogenetic networks

Definition. A phylogenetic network is a directed acyclic graph whose terminal nodes are labeled by taxa names and whose internal nodes are either tree nodes (if they have only one parent) or hybrid nodes (if they have two or more parents).
DAG representation of phylogenetic networks

Example. 44 polymorphic sites in a sample of the single gene encoding alcohol dehydrogenase in 11 sequences from 5 natural populations of D. melanogaster.

    Wa-S   CCGCAATAATGGCGCTACTCTCACAATAACCCACTAGACAGCCT
    Fl-1S  CCCCAATATGGGCGCTACTTTCACAATAACCCACTAGACAGCCT
    Af-S   CCGCAATATGGGCGCTACCCCCCGGAATCTCCACTAAACAGTCA
    Fr-S   CCGCAATATGGGCGCTGTCCCCCGGAATCTCCACTAAACTACCT
    Fl-2S  CCGAGATAAGTCCGAGGTCCCCCGGAATCTCCACTAGCCAGCCT
    Ja-S   CCCCAATATGGGCGCGACCCCCCGGAATCTCTATTCACCAGCTT
    Fl-F   CCCCAATATGGGCGCGACCCCCCGGAATCTGTCTCCGCCAGCCT
    Fr-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
    Wa-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
    Af-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
    Ja-F   TGCAGGGGAGGGCTCGACCCCACGGGATCTGTCTCCGCCAGCCT

M. Kreitman. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature, 304(5925):412-417, 1983
DAG representation of phylogenetic networks

Example. Figure: phylogenetic network with leaves, in drawing order, Ja-F, Af-F, Fr-F, Wa-F, Fl-2S, Wa-S, Af-S, Fr-S, Fl-1S, Ja-S, Fl-F.

Example. Figure: phylogenetic network with leaves, in drawing order, Fl-F, Ja-F, Fr-F, Wa-F, Af-F, Ja-S, Fl-2S, Fr-S, Wa-S, Af-S, Fl-1S.
DAG representation of phylogenetic networks

Definition. A phylogenetic network is called tree-sibling if every hybrid node has at least one sibling that is a tree node.

Remark. The biological meaning of the tree-sibling condition is that, in each recombination or hybridization process, at least one of the species involved also has some descendant through mutation.
DAG representation of phylogenetic networks

Definition. A phylogenetic network is called tree-child if every internal node has at least one child that is a tree node.

Remark. The biological meaning of the tree-child condition is that every non-extant species has some descendant through mutation.
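The tree-child condition, and the tree-sibling condition defined just before it, translate directly into checks on parent counts. The sketch below assumes a network stored as a dictionary from each internal node to the list of its children; this encoding and the function names are illustrative choices, not part of the talk.

```python
def parents_of(children):
    """Invert a child-list dictionary: node -> list of its parents."""
    parents = {}
    for u, kids in children.items():
        parents.setdefault(u, [])
        for c in kids:
            parents.setdefault(c, []).append(u)
    return parents


def is_tree_child(children):
    """Every internal node has at least one child with exactly one parent."""
    parents = parents_of(children)
    return all(any(len(parents[c]) == 1 for c in kids)
               for kids in children.values() if kids)


def is_tree_sibling(children):
    """Every hybrid node has at least one sibling with exactly one parent."""
    parents = parents_of(children)
    for h, ps in parents.items():
        if len(ps) >= 2:  # h is a hybrid node
            siblings = {c for p in ps for c in children[p] if c != h}
            if not any(len(parents[s]) == 1 for s in siblings):
                return False
    return True


# Hypothetical network: hybrid node h with parents a and b, leaves 1, 2, 3.
small = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}

# Hypothetical network in which node a has only hybrid children.
mixed = {"r": ["a", "b", "c"], "a": ["h1", "h2"], "b": ["h1", "1"],
         "c": ["h2", "2"], "h1": ["3"], "h2": ["4"]}
```

The second network illustrates that the tree-sibling condition is the weaker one: node a has no tree child, yet each hybrid node still has a leaf as a sibling.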
DAG representation of phylogenetic networks

Definition. A phylogenetic network is time-consistent if there is a temporal representation of the network, that is, an assignment of times to the nodes of the network that strictly increases on tree edges (those edges whose head is a tree node) and remains the same on hybrid edges (those whose head is a hybrid node).

Remark. The biological meaning of a temporal assignment is the time when certain species exist or when certain hybridization processes occur, because for these processes to take place, the species involved must coexist in time.
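One way to test the definition above is to merge the two ends of every hybrid edge into a single class, since they must share a time, and then look for an ordering of the classes that increases along tree edges. The sketch below follows that idea; it is an illustration under an assumed child-list encoding of the network, not necessarily the procedure used by the author's software.

```python
def temporal_representation(children):
    """Return a dict node -> time, or None if the network is not time-consistent."""
    parents = {}
    for u, kids in children.items():
        parents.setdefault(u, [])
        for c in kids:
            parents.setdefault(c, []).append(u)

    # Union-find: both ends of a hybrid edge must receive the same time.
    rep = {u: u for u in parents}

    def find(u):
        while rep[u] != u:
            rep[u] = rep[rep[u]]
            u = rep[u]
        return u

    for v, ps in parents.items():
        if len(ps) >= 2:              # v is a hybrid node
            for p in ps:
                rep[find(p)] = find(v)

    # Tree edges must lead from one class to a strictly later class.
    succ = {find(u): set() for u in parents}
    indeg = {c: 0 for c in succ}
    for v, ps in parents.items():
        if len(ps) == 1:              # (ps[0], v) is a tree edge
            a, b = find(ps[0]), find(v)
            if a == b:
                return None           # a tree edge inside a class cannot increase
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    # Topological layering of the classes; a cycle leaves some class unprocessed.
    time = {c: 0 for c in succ}
    ready = [c for c in succ if indeg[c] == 0]
    processed = 0
    while ready:
        a = ready.pop()
        processed += 1
        for b in succ[a]:
            time[b] = max(time[b], time[a] + 1)
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    if processed != len(succ):
        return None
    return {u: time[find(u)] for u in parents}


# Hypothetical time-consistent network: hybrid node h with parents a and b.
consistent = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}

# Hypothetical network that is not time-consistent: b is a tree child of a,
# yet a and b are both parents of the hybrid node h.
inconsistent = {"r": ["a", "1"], "a": ["b", "h"], "b": ["h", "2"], "h": ["3"]}
```

In the second network the parents of h would have to coexist in time, while the tree edge from a to b forces b to be strictly later than a, so no temporal representation exists.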
DAG representation of phylogenetic networks

Example (Time consistency). Figure: phylogenetic network on leaves 1, 2, 3, 4, 5.
DAG representation of phylogenetic networks

Figure: diagram of the classes of phylogenetic networks, with regions labeled phylogenetic networks, tree-sibling, tree-child, not time-consistent, galled-trees, and phylogenetic trees.
DAG representation of phylogenetic networks

Figure: number of phylogenetic trees, galled-trees, tree-child, and tree-sibling networks for ρ = 0, ρ = 1, ρ = 2, ρ = 4, and ρ = 8.

M. Arenas, G. Valiente, and D. Posada. Characterization of phylogenetic reticulate networks based on the coalescent with recombination. Molecular Biology and Evolution, 25(12):2517-2520, 2008
Path multiplicity representation

Definition. The µ-representation of a tree-child phylogenetic network is the multiset of µ-vectors µ(u) of path-to-leaf multiplicities for each node u.

Example. Figure: network whose nodes are labeled with the µ-vectors 1231, 1110, 0121, 0111, 1000, 0100, 0001.
Path multiplicity representation

Lemma. The µ-representation of a tree-child phylogenetic network can be obtained in polynomial time.

Example. The µ-vectors are computed bottom-up: first the leaves (1000, 0100, 0001), then 0111, 1110, and 0121, and finally 1231 at the root.
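The bottom-up computation behind the lemma can be sketched as follows: a leaf gets a unit vector, and an internal node gets the sum of the µ-vectors of its children, because every path to a leaf starts with exactly one outgoing edge. The network encoding (dictionary of child lists) and the small example network are assumptions made for illustration; they are not the network drawn on the slide.

```python
def mu_vectors(children, leaves):
    """Map every node to its vector of path multiplicities to each leaf."""
    index = {leaf: i for i, leaf in enumerate(leaves)}
    n = len(leaves)
    mu = {}

    def visit(u):
        if u not in mu:                       # each node is computed only once
            if u in index:                    # leaf: unit vector
                mu[u] = tuple(int(i == index[u]) for i in range(n))
            else:                             # internal node: sum over children
                kids = [visit(c) for c in children[u]]
                mu[u] = tuple(sum(col) for col in zip(*kids))
        return mu[u]

    for u in children:
        visit(u)
    return mu


# Hypothetical network: hybrid node h with parents a and b, leaves 1, 2, 3.
network = {"r": ["a", "b"], "a": ["1", "h"], "b": ["h", "3"], "h": ["2"]}
mu = mu_vectors(network, ["1", "2", "3"])
mu_representation = sorted(mu.values())       # the multiset, as a sorted list
```

Because results are memoized, each edge contributes one vector addition, which keeps the running time polynomial in the size of the network.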
Path multiplicity representation

Theorem. Two tree-child phylogenetic networks are isomorphic if and only if they have the same µ-representation.

Example. Figure: two drawings of a network, both with node labels 1231, 1110, 0121, 0111, 1000, 0100, 0001.
Path multiplicity representation

Definition. The µ-distance between two tree-child phylogenetic networks $N$ and $N'$ is

$$d_\mu(N, N') = |\mu(N) \,\triangle\, \mu(N')|,$$

the size of the symmetric difference of their µ-representations, taken as multisets.

Example ($d_\mu(N, N') = 6$). Figure: network $N$ with node labels 1231, 1110, 0121, 0111, 1000, 0100, 0001, and network $N'$ with node labels 1121, 1110, 0011, 1000, 0100, 0001.
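A multiset symmetric difference is easy to get wrong with plain sets, since repeated µ-vectors must be counted with their multiplicity. The sketch below uses `collections.Counter` for that; the two µ-representations are hypothetical (a small network with one hybrid node, and a tree on the same three leaves) and are not taken from the slide.

```python
from collections import Counter


def mu_distance(mu1, mu2):
    """Size of the symmetric difference of two multisets of mu-vectors."""
    c1, c2 = Counter(mu1), Counter(mu2)
    # Counter subtraction keeps only positive counts, so the two one-sided
    # differences together form the multiset symmetric difference.
    return sum(((c1 - c2) + (c2 - c1)).values())


# Hypothetical mu-representations on leaves 1, 2, 3.
network = ["121", "110", "011", "010", "100", "010", "001"]   # 010 occurs twice
tree = ["111", "110", "100", "010", "001"]
d = mu_distance(network, tree)
```

Here the network contributes 121, 011, and one extra copy of 010, and the tree contributes 111, so the sketch counts four unmatched vectors.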
Path multiplicity representation

Theorem. The µ-distance induces a metric on the space of tree-child phylogenetic networks that generalizes the bipartition distance for phylogenetic trees.

G. Cardona, F. Rosselló, and G. Valiente. Comparison of tree-child phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009

Theorem. The µ-distance induces a metric on the space of semi-binary tree-sibling time-consistent phylogenetic networks that generalizes the bipartition distance for phylogenetic trees.

G. Cardona, M. Llabrés, F. Rosselló, and G. Valiente. A distance metric for a class of tree-sibling phylogenetic networks. Bioinformatics, 24(13):1481-1488, 2008
DAG alignment

Definition. For every $v \in V$ and $v' \in V'$ of two phylogenetic networks $N = (V, E)$ and $N' = (V', E')$ on the same $n$ leaves, let

$$m(v, v') = \|\mu(v) - \mu(v')\|_1$$

$$\chi(v, v') = \begin{cases} 0 & \text{if } v, v' \text{ are both tree nodes or both hybrid nodes} \\ 1 & \text{otherwise} \end{cases}$$

The weight of the pair $(v, v')$ is

$$w(v, v') = m(v, v') + \frac{\chi(v, v')}{2n}$$

The total weight of a matching $M : V \to V'$ is

$$w(M) = \sum_{v \in V} w(v, M(v))$$
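The pair weight can be sketched in a few lines. The sketch reads m as the sum of absolute coordinate differences between the two µ-vectors and n as the number of leaves, a reading that reproduces the entries of the weight table in the talk's example (for instance 1 for the pair b, v and 0.1 for the pair A, y). The function name and argument layout are illustrative.

```python
def pair_weight(mu_v, mu_w, v_is_hybrid, w_is_hybrid):
    """Weight w(v, v') = m(v, v') + chi(v, v') / (2n) for two network nodes."""
    n = len(mu_v)                                         # number of leaves
    m = sum(abs(a - b) for a, b in zip(mu_v, mu_w))       # distance between mu-vectors
    chi = 0 if v_is_hybrid == w_is_hybrid else 1          # penalty for mixed node types
    return m + chi / (2 * n)


# mu-vectors taken from the talk's example of two networks on five leaves.
b, v = (0, 0, 1, 2, 1), (0, 1, 1, 2, 1)      # both tree nodes
A, y = (0, 0, 1, 1, 0), (0, 0, 1, 1, 0)      # A is hybrid, y is a tree node
w_bv = pair_weight(b, v, False, False)
w_Ay = pair_weight(A, y, True, False)
```

Since the type penalty is at most 1/(2n) per pair and there are fewer than 2n nodes to match in the example, the penalties only break ties between matchings with the same total m.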
DAG alignment

Definition. An optimal alignment between two phylogenetic networks is a matching with the smallest total weight among all possible matchings.

Lemma. A matching $M$ between two phylogenetic networks $N = (V, E)$ and $N' = (V', E')$ is an optimal alignment if and only if it minimizes the sum

$$\sum_{v \in V} m(v, M(v))$$

and, among those matchings minimizing this sum, it maximizes the number of nodes that are sent to nodes of the same type.

G. Cardona, F. Rosselló, and G. Valiente. Comparison of tree-child phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009
DAG alignment

Example (Optimal alignment of two phylogenetic networks). Figure: the two networks on leaves 1 to 5, the first with internal nodes r, b, a, c, d, e and hybrid nodes A, B, the second with internal nodes r, u, v, x, y, z and hybrid nodes X, Y.

µ-vectors of the first network (columns are leaves 1 to 5):

    r  (1,1,2,3,1)
    b  (0,0,1,2,1)
    a  (1,1,1,1,0)
    A  (0,0,1,1,0)
    c  (1,1,0,0,0)
    d  (0,0,1,1,0)
    e  (0,0,0,1,1)
    B  (0,0,0,1,0)

µ-vectors of the second network:

    r  (1,2,1,2,1)
    u  (1,1,0,0,0)
    v  (0,1,1,2,1)
    x  (0,1,1,1,0)
    y  (0,0,1,1,0)
    z  (0,0,0,1,1)
    X  (0,1,0,0,0)
    Y  (0,0,0,1,0)
DAG alignment

Example (Optimal alignment of two phylogenetic networks). Weights of the node pairs (rows: first network; columns: second network):

         r    u    v    x    y    z    X    Y
    r    3    6    3    5    6    6    7.1  7.1
    b    3    6    1    3    2    2    5.1  3.1
    a    3    2    3    1    2    4    3.1  3.1
    A    5.1  4.1  3.1  1.1  0.1  2.1  3    1
    c    5    0    5    3    4    4    1.1  3.1
    d    5    4    3    1    0    2    3.1  1.1
    e    5    4    3    3    2    0    3.1  1.1
    B    6.1  3.1  4.1  2.1  1.1  1.1  2    0
DAG alignment

Example (Optimal alignment of two phylogenetic networks). Figure: the two networks on leaves 1 to 5 with the aligned node pairs and their weights:

    r - r  3
    b - v  1
    a - x  1
    c - u  0
    A - X  3
    d - y  0
    B - Y  0
    e - z  0
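The alignment in this example can be checked by exhaustive search over all matchings of the eight nodes, using the weight table from the talk. This brute-force sketch is only meant for such small inputs; for larger networks a polynomial-time assignment solver such as the Hungarian algorithm would take the place of the loop over permutations. The function name and data layout are illustrative.

```python
from itertools import permutations


def optimal_alignment(rows, cols, weight):
    """Minimum-weight one-to-one matching of rows to cols by exhaustive search."""
    if len(rows) != len(cols):
        raise ValueError("this sketch assumes equally many nodes on both sides")
    best_total, best_perm = None, None
    for perm in permutations(range(len(cols))):
        total = sum(weight[i][j] for i, j in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, {rows[i]: cols[j] for i, j in enumerate(best_perm)}


rows = ["r", "b", "a", "A", "c", "d", "e", "B"]       # nodes of the first network
cols = ["r", "u", "v", "x", "y", "z", "X", "Y"]       # nodes of the second network
W = [
    [3,   6,   3,   5,   6,   6,   7.1, 7.1],
    [3,   6,   1,   3,   2,   2,   5.1, 3.1],
    [3,   2,   3,   1,   2,   4,   3.1, 3.1],
    [5.1, 4.1, 3.1, 1.1, 0.1, 2.1, 3,   1],
    [5,   0,   5,   3,   4,   4,   1.1, 3.1],
    [5,   4,   3,   1,   0,   2,   3.1, 1.1],
    [5,   4,   3,   3,   2,   0,   3.1, 1.1],
    [6.1, 3.1, 4.1, 2.1, 1.1, 1.1, 2,   0],
]
total, matching = optimal_alignment(rows, cols, W)
```

The search finds total weight 8, the sum of the pair weights listed in the example. With this table the hybrid pairs A, X and B, Y cost 3 + 0, and the swapped pairs A, Y and B, X cost 1 + 2, so the search may return either of these two matchings of equal weight.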
Tree alignment as DAG alignment

Remark. If we restrict this alignment method to phylogenetic trees, the weight of a pair of nodes $(v_1, v_2)$ is simply $|C_L(v_1) \,\triangle\, C_L(v_2)|$, the size of the symmetric difference of their clusters of descendant leaves. This can be seen as an unnormalized version of the score used in TreeJuxtaposer.

T. Munzner, F. Guimbretière, S. Tasiran, L. Zhang, and Y. Zhou. TreeJuxtaposer: Scalable tree comparison using focus+context with guaranteed visibility. ACM T. Graphics, 22(3):453-462, 2003
Tree alignment as DAG alignment

Example (Optimal alignment of two phylogenetic trees). Figure: two trees on leaves 1 to 5 with aligned node pairs of weights 0, 2, 4, 4.

Weights of the node pairs, with nodes given by their clusters:

             00011  00111  01111
    11000    4      5      4
    11100    5      4      3
    11110    4      3      2

Scores for the same node pairs:

             00011  00111  01111
    11000    0/4    0/5    1/5
    11100    0/5    1/5    2/5
    11110    1/5    2/5    3/5
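Both tables in this example can be recomputed from the cluster bit strings. The sketch below takes the weight to be the size of the symmetric difference of two clusters, and reads the entries of the second table as the number of shared leaves over the number of leaves in the union; that second reading is an assumption that matches the values shown. Function names are illustrative.

```python
def cluster(bits):
    """Set of leaf numbers (1-based) whose bit is set in the cluster string."""
    return {i + 1 for i, b in enumerate(bits) if b == "1"}


def weight(c1, c2):
    """Number of leaves that belong to exactly one of the two clusters."""
    return len(cluster(c1) ^ cluster(c2))


def overlap_score(c1, c2):
    """Pair (shared leaves, leaves in the union), read as a ratio."""
    a, b = cluster(c1), cluster(c2)
    return len(a & b), len(a | b)


first = ["11000", "11100", "11110"]      # clusters of the first tree (table rows)
second = ["00011", "00111", "01111"]     # clusters of the second tree (table columns)
weights = [[weight(p, q) for q in second] for p in first]
scores = [[overlap_score(p, q) for q in second] for p in first]
```

The two quantities are linked: the weight of a pair equals the size of the union minus the size of the intersection, which is why a pair with a high ratio has a low weight.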
Tool support: BioPerl module Bio::PhyloNetwork

The Perl module Bio::PhyloNetwork implements all the data structures needed to work with phylogenetic networks, as well as algorithms for

- reconstructing a network from its eNewick string
- reconstructing a network from its µ-representation
- exploding a network into the set of its induced subtrees
- computing the µ-representation of a network
- computing the µ-distance between two networks
- computing an optimal alignment between two networks
- computing the set of tripartitions of a network
- computing the tripartition error between two networks
- testing if a network is time-consistent
- computing a temporal representation of a time-consistent network
Tool support: Web interface to the BioPerl module

The web interface at http://dmi.uib.es/~gcardona/bioinfo/alignment.php allows the user to input one or two phylogenetic networks, given by their eNewick strings. A Perl script processes these strings and uses the Bio::PhyloNetwork package to compute all available data for them, including a plot of the networks that can be downloaded in PS format; these plots are generated through the application GraphViz and its companion Perl package.

Given two networks on the same set of leaves, their µ-distance is also computed, as well as an optimal alignment between them. If their sets of leaves are not the same, their topological restriction to the set of common leaves is computed first, followed by the µ-distance and an optimal alignment.

A Java applet displays the networks side by side, and whenever a node is selected, the corresponding node in the other network (with respect to the optimal alignment) is highlighted, if it exists. This also extends to edges. Similarities and differences between the networks are thus evident at a glance.
Conclusion

- String edit distance and alignment of strings coincide, but alignment of trees differs from tree edit distance and alignment of graphs differs from graph edit distance
- The alignment of trees and directed acyclic graphs based on the path multiplicity representation can be computed in polynomial time
- The alignment method can be applied to any directed acyclic graphs with the same set of terminal node labels

G. Valiente. Algorithms on Trees and Graphs. Springer, 2002

G. Valiente. Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R. Taylor & Francis/CRC Press, 2009