Dynamic Programming Course: A structure based flexible search method for motifs in RNA. By: Veksler, I., Ziv-Ukelson, M., Barash, D.


1 Dynamic Programming Course: A structure based flexible search method for motifs in RNA By: Veksler, I., Ziv-Ukelson, M., Barash, D., Kedem, K

2 Outline: Background; Motivation; RNA's structure representations; Trees comparison; Our Algorithm; Results

3 The Central Dogma of Molecular Biology: DNA (transcription) → RNA (translation) → Protein. Non-coding RNA: an RNA molecule that is not translated into a protein. Non-coding RNAs have been found to have roles in a great variety of processes.

4 Non Coding RNA Families: They are not conserved in sequence, but they are conserved in structure. They have a role in regulating gene expression. Examples: tRNA, rRNA, snoRNA, microRNA, siRNA, riboswitch.

5 Motivation: The discovery of non-coding RNA (ncRNA) motifs and their role in regulating gene expression has recently attracted considerable attention. The goal is to discover these motifs in a sequence database. Most RNA motif search methods start from the primary sequence and only then take into account secondary structure considerations. Since different motifs vary in structure rigidity and in local sequence constraints, there is a need for algorithms and tools that can be fine-tuned according to the searched RNA motif.

6 Our Goal: Discover ncRNA motifs in a sequence database. Given a query structure and a genome sequence of millions of nucleotides (e.g. ACGCUGACGUAGUCAGUAGACGAC AGACAGAUACGUCACCGCAGAUAC GCAUAGUAGCAGUAGCAGAUGACG ...), are there any appearances of this structure in the genome?

7 The tool - STRMS (Structural RNA Motif Search): Input: the secondary structure of the query, including local sequence and structure constraints, and a target sequence database. Output: all occurrences of the query in the target, ranked by their similarity to the query (in an HTML file). The tool is flexible and takes into account a large number of sequence options. Our approach combines: (1) pre-folding with MFOLD (Zuker, 2003); (2) an RNA pattern matching algorithm [O(mn)] based on subtree homeomorphism for ordered, rooted trees.

8 The method: Our method consists of two phases. Preprocessing phase: preparing the target database for a variety of future queries by (1) partitioning the target text into consecutive overlapping windows of a given size with a predefined overlap; (2) folding each window with mfold, keeping the optimal and a few sub-optimal structures; (3) converting each structure to its tree representation, which yields the tree database (TDB). Search phase: applying the tree alignment algorithm and filtering according to our pre-defined constraints. The division into two phases enables the user to run various queries and refine the constraints of each query search without reinvesting time in folding the target database.
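The window partitioning step of the preprocessing phase can be illustrated with a minimal Python sketch. The function name `overlapping_windows` and its parameters are hypothetical and are not taken from STRMS; the sketch only assumes that consecutive windows share `overlap` characters.

```python
def overlapping_windows(sequence, window_size, overlap):
    """Yield (start, window) pairs that cover `sequence` from left to right.

    Hypothetical helper: consecutive windows share `overlap` characters,
    and the last window may be shorter than `window_size`.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap
    start = 0
    while True:
        yield start, sequence[start:start + window_size]
        # Stop once the current window reaches the end of the sequence.
        if start + window_size >= len(sequence):
            break
        start += step


if __name__ == "__main__":
    for start, window in overlapping_windows("ACGUACGUAC", 4, 2):
        print(start, window)
```

Each window would then be folded independently, so the overlap bounds how long a motif that straddles a window boundary can be while still being seen whole in at least one window.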

9 RNA's Secondary Structure: [Figure labels: pseudoknot, single-stranded region, stem, interior loop, bulge loop, hairpin loop, junction (multiloop). Image: Wuchty.]

10 RNA's Secondary Structure (dot-bracket notation): (((((((..((((.)))).(((((.)))))..(((((.))))))))))))
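In dot-bracket notation, matching parentheses denote paired bases and dots denote unpaired bases. A minimal Python sketch of how the base pairs can be recovered with a stack follows; the function name `base_pairs` and the example string are illustrative assumptions, not part of the slides.

```python
def base_pairs(dot_bracket):
    """Return the sorted list of 0-based (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)          # opening base waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical toy structure: one stem enclosing a three-base hairpin loop.
    print(base_pairs("((...))"))
```

The nesting of the recovered pairs is what makes a tree representation of a pseudoknot-free structure possible.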

11 RNA's Secondary Structure Graph

12 Ordered rooted tree: Shapiro, 1988: the nodes correspond to elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop). The edges correspond to base-paired (stem) regions. Zhang, 1998: the nodes of the tree represent either unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases, respectively. There are two kinds of edges, connecting either consecutive stem base-pairs or a leaf base with the last base-pair in the corresponding stem.

13 Our tree representation: Compressed as in [Shapiro, 1988], plus a node for every single-strand component in multiloops. Includes additional information on nodes and on edges for the purpose of sequence analysis. It is more informative than Shapiro's tree representation and more compact than Zhang's. This leads to a precise screening of the target text by first selecting candidates whose structural tree representation is similar to that of the query, and then further filtering these candidates by applying sequence considerations.

14 Our tree representation: [Figure labels: origin of a single structure, interior loop, bulge loop, dangling ends, stem-edge, single-strand components of the multiloop, hairpin loop.] Single-strand components and stem-edges are annotated with length and sequence. A small circle node carries only topological information. The tree structure is generated from a ct-file (output from mfold). The tree construction is ordered by the 5' to 3' ordering of the molecule. The result is a compressed structure which also retains the sequence information.

15 Our tree representation

16 Comparison of ordered rooted trees: Trees are among the most common and well-studied combinatorial structures in computer science. In particular, the problem of comparing trees occurs in several diverse areas such as computational biology, structured text databases, image analysis, automatic theorem proving, and compiler optimization.

17 Comparison of ordered rooted trees: The following operations are defined on ordered trees. relabel - Change the label of a node v in T. delete - Delete a non-root node v in T with parent v', making the children of v become the children of v'. The children are inserted in the place of v as a subsequence in the left-to-right order of the children of v'. insert - The complement of delete. Insert a node v as a child of v' in T, making v the parent of a consecutive subsequence of the children of v'.
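The three edit operations can be sketched on a simple ordered-tree class. The `Node` class and the function signatures below are illustrative assumptions (children are kept in a Python list in left-to-right order); they are not taken from the slides.

```python
class Node:
    """Ordered rooted tree node; `children` is kept in left-to-right order."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children if children is not None else []


def relabel(node, new_label):
    """Change the label of a node."""
    node.label = new_label


def delete(parent, index):
    """Delete the child of `parent` at `index`.

    The children of the deleted node take its place, in order,
    among the children of `parent`.
    """
    removed = parent.children[index]
    parent.children[index:index + 1] = removed.children
    return removed


def insert(parent, label, start, end):
    """Insert a new child of `parent` that adopts parent.children[start:end]."""
    new_node = Node(label, parent.children[start:end])
    parent.children[start:end] = [new_node]
    return new_node


def labels(node):
    """Labels of the children of `node`, left to right."""
    return [child.label for child in node.children]


if __name__ == "__main__":
    root = Node("r", [Node("a"), Node("v", [Node("b"), Node("c")]), Node("d")])
    delete(root, 1)
    print(labels(root))   # children of v now hang directly under r
    insert(root, "v", 1, 3)
    print(labels(root))   # insert undoes the delete
```

Because `insert` adopts a consecutive slice of the parent's children, applying it with the right slice exactly reverses a `delete`.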

18 Edit distance: Assume that we are given a cost function defined on each edit operation. An edit script S between T1 and T2 is a sequence of edit operations turning T1 into T2. The cost of S is the sum of the costs of the operations in S. An optimal edit script between T1 and T2 is an edit script between T1 and T2 of minimum cost, and this cost is the tree edit distance, denoted by δ(T1, T2). The tree edit distance problem is to compute the edit distance and a corresponding edit script.
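As an illustration of the definition, the following Python sketch computes the edit distance between two small ordered trees with the textbook recursion on rightmost roots of forests, memoized with `functools.lru_cache`. It is a teaching sketch under assumed conventions (trees are nested tuples `(label, children)`, costs are constants per operation type), not an efficient algorithm and not the method used by the tool.

```python
from functools import lru_cache


def tree_edit_distance(t1, t2, delete_cost=1, insert_cost=1, relabel_cost=1):
    """Edit distance between ordered trees given as (label, (child, ...)) tuples."""

    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f2:
            # Only deletions remain: remove the rightmost root of f1.
            _, kids = f1[-1]
            return forest_dist(f1[:-1] + kids, f2) + delete_cost
        if not f1:
            # Only insertions remain: add the rightmost root of f2.
            _, kids = f2[-1]
            return forest_dist(f1, f2[:-1] + kids) + insert_cost
        (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
        return min(
            forest_dist(f1[:-1] + kids1, f2) + delete_cost,
            forest_dist(f1, f2[:-1] + kids2) + insert_cost,
            # Map the two rightmost roots onto each other.
            forest_dist(kids1, kids2)
            + forest_dist(f1[:-1], f2[:-1])
            + (0 if label1 == label2 else relabel_cost),
        )

    return forest_dist((t1,), (t2,))


if __name__ == "__main__":
    a, b, e = ("a", ()), ("b", ()), ("e", ())
    t1 = ("f", (("d", (a, ("c", (b,)))), e))
    t2 = ("f", (("c", (("d", (a, b)),)), e))
    print(tree_edit_distance(t1, t2))
```

In the example, one deletion of `c` followed by one insertion of `c` above `d` turns the first tree into the second, and no single operation can do so.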

19 Edit distance

20 Edit distance

21 Edit distance

22 Edit distance

23 Edit distance

24 Tree Inclusion: T1 is included in T2 if there is a sequence of delete operations performed on T2 which makes T2 isomorphic to T1. The tree inclusion problem is to decide if T1 is included in T2. The tree inclusion problem is a special case of the tree edit distance problem: if insertions all have cost 0 and all other operations have cost 1, then T1 can be included in T2 if and only if δ(T1, T2) = 0.

25 Tree Inclusion

26 Tree Inclusion

27 Polynomial time algorithms exist for these problems. They are all based on the classic technique of dynamic programming and most of them are simple combinatorial algorithms.

28 Comparison of ordered rooted trees Ordered tree comparison is generally computed by tree edit distance, which allows various forms of deletions and insertions in both query and target. The search for small non-coding RNAs naturally yields a more specific tree search formulation since we do not allow deletions in the query. In our method we apply a weighted pattern matching algorithm for finding the best homeomorphic mapping between two rooted ordered trees. Specific constraints on the searched motif can be defined in the input to the search: structural constraints (lengths), allowing or forbidding element deletion in the target, sequence constraints (existence of sibling pseudoknots, local conserved sequence segments).

29 The Algorithm: The subtree isomorphism problem [Matula, 1968, 1978]: given a pattern tree P and a text tree T, find a subtree of T which is isomorphic to P, i.e. decide whether some subtree of T that is identical in structure to P can be obtained by removing entire subtrees of T, or decide that there is no such tree. The subtree homeomorphism problem [Chung, 1987; Reyner, 1977; Pinter et al., 2004]: a variant of the former problem, where degree-2 nodes can be deleted from the text tree. [Figure: homeomorphism example.]

30 The Algorithm - Motivation: Point-mutation events could easily result in an extra bulge in an RNA structure. However, in some cases the functional homology to the original, non-mutated structure is still preserved. The suggested alignment should be flexible enough to allow the deletion of degree-2 nodes from the target tree. [Figure: a riboswitch and its functional homologue, which has an extra bulge.]

31 The Algorithm - Motivation: In some cases subtrees may be deleted from the target tree but not from the query tree, as in the tRNA case. Both of the above application-specific properties are captured by subtree homeomorphism. Subtree homeomorphism on ordered rooted trees is more efficient (quadratic in input size) than tree edit distance (cubic in input size). Thus, by utilizing the biological properties that are typical of our application, we obtain a fast variant of weighted subtree homeomorphism on ordered rooted trees that captures our search criteria.

32 Subtree Homeomorphism Score: Let T1 and T2 be two ordered, rooted, homeomorphic trees. A mapping µ : T1 → T2 is a one-to-one map from the nodes of T1 to the nodes of T2 that preserves the ancestor relations of the nodes and their relative order. The subtree homeomorphism score of the mapping, denoted S(µ), is built from the following components: a user-defined node-to-node similarity score function; an edge-to-edge similarity score function, where e_u ∈ T1 and e_v ∈ T2 are corresponding edges; the penalty for deleting a degree-2 node from T2; and the penalty for deleting any other node.

33 Subtree Homeomorphism Score: Given two rooted ordered trees, P and T, the weighted subtree homeomorphism problem is to find a homeomorphism-preserving mapping µ : P → t from P to some subtree t of T, such that S(µ) is maximal.

34 Subtree Homeomorphism Score The cost function varies from one application to another, depending upon the amount of information supplied with the query. The simplest one just compares the topology of the structures. More complex functions include length differences of the structural elements, sequence conservation and pseudoknot matching. The node deletion score (i.e., gap penalty) reflects the tradeoff between a gap and a mismatch. As the gap penalty increases, the algorithm tends to match distant nodes to avoid gaps. As different values may suit different needs, our tool enables users to set this parameter for each run.

35 The Tree Alignment Algorithm: A bottom-up, two-level dynamic programming (DP) algorithm that computes an optimal alignment between P and any homeomorphic subtree t of T, i.e. one that maximizes the homeomorphism score between P and t. It is an O(mn) algorithm, where m and n are the number of vertices in P and T respectively. The bottom-up computation requires computing scores for all subtrees of P and T.

36 The Tree Alignment Algorithm: We define score(u,v) to be the maximal score of a homeomorphic mapping from P_u to T_v, where P_u is the subtree of P rooted in node u ∈ P and T_v is the subtree of T rooted in node v ∈ T.

37 The two-stage DP approach to the tree alignment: Large DP (L_DP): an m*n table. [Figure: the compared trees = score(a,1).] Small DP (S_DP, second-level dynamic programming): activated during the computation of each non-leaf entry (u,v) in the L_DP in order to compute the optimal mapping between the children of u and the children of v. [Figure example: S_DP comparing the subtrees of f and 9.]

38 The Computation of score(u,v): Done recursively in a postorder traversal of T and P. First, score(u,v) values are computed for all leaf nodes of T and P. Next, score(u,v) values are computed for each node pair in P and T, based on the values of the previously computed scores for all children of u and v: if c(u) ≤ c(v), S_DP is computed for the sequences <x_1,...,x_c(u)> and <y_1,...,y_c(v)>, where <x_1,...,x_c(u)> is the ordered set (5' to 3') of children of node u and <y_1,...,y_c(v)> is the ordered set (5' to 3') of children of node v.

39 The Small_DP: The cost of the diagonal edge in cell (x_i, y_j) is set to score(x_i, y_j). The costs of the vertical edges are set to −∞, to reflect the fact that no deletions are allowed from the query. All horizontal edges are assigned the cost of deleting a node from T (denoted by δ2). Let OptP be the highest scoring path in S_DP. Then score(u,v) is assigned to be: [formula lost in transcription; one of its terms is labeled "Deleting v"].
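The Small_DP edge rules can be sketched as a sequence-alignment-style table over the two ordered child lists. The sketch below is an assumption-laden illustration: the function name `small_dp` is hypothetical, the deletion cost δ2 is modeled as a non-negative `delete_penalty` that is subtracted on horizontal moves, and the child-to-child scores are supplied as a precomputed matrix.

```python
NEG_INF = float("-inf")


def small_dp(child_scores, n_text_children, delete_penalty):
    """Score of the highest scoring path (OptP) through the small DP table.

    child_scores[i][j] is score(x_i, y_j) for the i-th child of the query
    node and the j-th child of the text node (both 0-based here).
    Rows follow query children and columns follow text children.
    """
    n_query_children = len(child_scores)
    table = [[NEG_INF] * (n_text_children + 1) for _ in range(n_query_children + 1)]
    table[0][0] = 0
    for j in range(1, n_text_children + 1):
        # Horizontal edges: delete a text child.
        table[0][j] = table[0][j - 1] - delete_penalty
    for i in range(1, n_query_children + 1):
        # table[i][0] stays at -inf: vertical edges (query deletions) are forbidden.
        for j in range(1, n_text_children + 1):
            diagonal = table[i - 1][j - 1] + child_scores[i - 1][j - 1]
            horizontal = table[i][j - 1] - delete_penalty
            table[i][j] = max(diagonal, horizontal)
    return table[n_query_children][n_text_children]


if __name__ == "__main__":
    # Hypothetical scores for 2 query children against 3 text children.
    scores = [[5, 1, 0],
              [0, 2, 4]]
    print(small_dp(scores, 3, delete_penalty=1))
```

With these hypothetical scores the best path matches the first query child to the first text child, skips the second text child, and matches the second query child to the third text child. When the query node has more children than the text node, every path needs a vertical edge and the result is −∞.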

40 The Tree Alignment Algorithm: The algorithm returns a vertex v* ∈ T that maximizes the score S(µ : P → t_v*), where t_v* is the subtree of T rooted in v* (found in the last row of L_DP).
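The two-level scheme can be put together in a compact Python sketch. The recurrence used here is an assumption made for illustration, since the exact formula is not preserved in the slides: score(u,v) is taken as the better of (a) mapping u to v and aligning their children with the small DP, and (b) deleting v when it has exactly one child. All names (`Node`, `label_similarity`, `best_match`, the two penalties) are hypothetical, edge scores are omitted, and memoized recursion replaces the explicit postorder table.

```python
NEG_INF = float("-inf")


class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def label_similarity(u, v):
    """Assumed topological node score: +1 for equal labels, -1 otherwise."""
    return 1 if u.label == v.label else -1


def small_dp(scores, n_cols, subtree_penalty):
    """Best alignment of query children (rows) to text children (columns)."""
    prev = [-j * subtree_penalty for j in range(n_cols + 1)]
    for row in scores:
        cur = [NEG_INF] * (n_cols + 1)   # column 0: query children cannot be skipped
        for j in range(1, n_cols + 1):
            cur[j] = max(prev[j - 1] + row[j - 1], cur[j - 1] - subtree_penalty)
        prev = cur
    return prev[n_cols]


def best_match(pattern, text, node_sim=label_similarity,
               degree2_penalty=1, subtree_penalty=1):
    """Return (score, v*) for the best mapping of `pattern` into a subtree of `text`."""
    memo = {}

    def score(u, v):
        key = (id(u), id(v))
        if key not in memo:
            table = [[score(x, y) for y in v.children] for x in u.children]
            best = node_sim(u, v) + small_dp(table, len(v.children), subtree_penalty)
            if len(v.children) == 1:
                # Assumed rule: delete v and map u further down.
                best = max(best, score(u, v.children[0]) - degree2_penalty)
            memo[key] = best
        return memo[key]

    def nodes(t):
        yield t
        for child in t.children:
            yield from nodes(child)

    return max(((score(pattern, v), v) for v in nodes(text)), key=lambda pair: pair[0])


if __name__ == "__main__":
    query = Node("M", [Node("H"), Node("H")])
    target = Node("M", [Node("H"), Node("B", [Node("H")])])  # extra bulge node B
    best_score, best_node = best_match(query, target)
    print(best_score, best_node.label)
```

In the example the target differs from the query by one extra single-child node, so the query still maps onto the target root, at the price of one deletion penalty.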

41 Time Complexity Analysis: W - the sliding window size. N - the size of the target sequence. m, n - the number of nodes in the tree representations of the query and of a folded window (of the target sequence), respectively. Preprocessing stage (per window, then over the whole target): O(W^3), O(NW^3); O(W), O(NW). Total: O(NW^3).

42 Time Complexity Analysis: The search stage. For each given query: iterating over all O(N) trees in the TDB and applying the subtree homeomorphism algorithm. The algorithm computes an O(mn) dynamic programming matrix, denoted L_DP. For each computed entry (u,v) in the L_DP matrix, the core work is that of computing the corresponding S_DP dynamic programming matrix in O(c(u)·c(v)).

43 Dealing with Potential Pseudoknots: Extension of the subtree homeomorphism algorithm to handle the pseudoknot considerations posed by the riboswitches in our study. Indeed, [Mandal et al., 2003] predicted a potential pseudoknot between the two arms of the purine riboswitch aptamer. In order to extend our model to take such key information into consideration, we annotate the tree with this additional information by connecting node 2 and node 4 with a potential pseudoknot edge. [Figure labels: node 2 - GGUAU; node 4 - CCGUA.]

44 Dealing with Potential Pseudoknots: Observations: These edges break down the tree-like representation of the RNA secondary structures. The potential pseudoknot is confined to the subtree rooted in node 8, i.e., node 2 and node 4 are sibling nodes sharing a common parent node. For all riboswitch aptamer queries in this study, only one potential pseudoknot is predicted and it is always formed between two sibling leaf nodes sharing a common parent node. The text subtrees could be annotated with any number of potential sibling pseudoknots*. [Figure label: sibling pseudoknot edge.] * Based on loop sequence complementarity analysis that is executed in the preprocessing stage.

45 Updating the S_DP: X: pseudoknot in the query. Y and Z: candidate pseudoknots in the text. If arc X is to be matched to arc Y, the optimal DP path must enter block G2 through vertex (0, 2) and leave it through vertex (3, 6). In this case, the weight of the optimal path is the sum of its three components: OptPath_G1[(0,0),(2,2)] + OptPath_G2[(0,2),(3,6)] + OptPath_G3[(1,8),(0,6)]. The optimal pseudoknot matching corresponds to the highest scoring path among all the optional paths. When the number of optional paths is constant, the pseudoknot matching increases the time complexity of the main stage by a constant factor only. This is, in practice, the observed case for the riboswitch searches applied in this study.

46 Taking into account sequence considerations: Variety of sequence considerations: Single-stranded RNA-RNA or RNA-protein interactions (e.g. tRNA and riboswitches) - apply a sequence alignment criterion to the single-strand regions like bulges and loops. Double-stranded interactions (e.g. miRNA) - sequence alignment scoring is applied to the compared stems. Pipeline: target database → filtering by structure and pseudoknot constraints → relatively small number of structures* → applying sequence constraints → final pool of candidates. Sequence comparisons are performed only on the small number of filtered candidates, so the effect of their runtime on the overall search is negligible. * We will see it in the results later.

47 Experimental Results: Riboswitches (purine riboswitch); tRNA

48 Purine Riboswitch: Riboswitches are part of an RNA molecule. They directly bind small target molecules with high affinity, and as a consequence they respond with conformational switching that affects the gene's activity. Purine riboswitch - binds guanine/adenine to regulate purine metabolism and transport.

49 Purine Riboswitch: The secondary structure: a three-stem junction with a multiloop connecting two hairpins and the 5'-3' end. Significant sequence conservation occurs within P1 and in the unpaired regions. Some base-pairing potential exists between the two stem-loop sequences, which might permit the formation of a pseudoknot.

50 Results: First dataset: FN = 0. Sensitivity (TP/(TP+FN)) = 1. PPV (TP/(TP+FP)) = 1, except for Clostridium perfringens.
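The two measures reported above can be written as short Python functions. The counts in the usage example are hypothetical and are not results from the study.

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN): fraction of true motifs that were found."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Positive predictive value = TP / (TP + FP): fraction of hits that are true."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(sensitivity(tp=8, fn=0))
    print(ppv(tp=8, fp=2))
```

With FN = 0 the sensitivity is 1 regardless of the number of false positives, which is why PPV is reported alongside it.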

51 Results Second dataset The search was conducted in three stages: 1. Based only on topological similarity, as computed via subtree homeomorphism (S1). 2. Enhancing the structural comparison with edge and loop length criteria (S2). 3. Combining the sequence considerations into the search (S3). This reduced the number of false positives to zero or one. This shows the importance of additional constraints supported by our tool in false positives control.

52 Searching for Riboswitches in Newly Sequenced Data Lactobacillus family Lactobacillus acidophilus at c( ) Lactobacillus delbrueckii at c( ) Lactobacillus salivarius at c( ) Sequential conservation of nucleotides in the functionally critical positions. [Mandal et al., 2003]


An Efficient Algorithm for Identifying the Most Contributory Substring. Ben Stephenson Department of Computer Science University of Western Ontario An Efficient Algorithm for Identifying the Most Contributory Substring Ben Stephenson Department of Computer Science University of Western Ontario Problem Definition Related Problems Applications Algorithm

More information

Evolutionary tree reconstruction (Chapter 10)

Evolutionary tree reconstruction (Chapter 10) Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early

More information

Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA.

Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA. Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA. Fasta is used to compare a protein or DNA sequence to all of the

More information

Inexact Matching, Alignment. See Gusfield, Chapter 9 Dasgupta et al. Chapter 6 (Dynamic Programming)

Inexact Matching, Alignment. See Gusfield, Chapter 9 Dasgupta et al. Chapter 6 (Dynamic Programming) Inexact Matching, Alignment See Gusfield, Chapter 9 Dasgupta et al. Chapter 6 (Dynamic Programming) Outline Yet more applications of generalized suffix trees, when combined with a least common ancestor

More information

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations: Lecture Overview Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating

More information

Longest Common Subsequence, Knapsack, Independent Set Scribe: Wilbur Yang (2016), Mary Wootters (2017) Date: November 6, 2017

Longest Common Subsequence, Knapsack, Independent Set Scribe: Wilbur Yang (2016), Mary Wootters (2017) Date: November 6, 2017 CS161 Lecture 13 Longest Common Subsequence, Knapsack, Independent Set Scribe: Wilbur Yang (2016), Mary Wootters (2017) Date: November 6, 2017 1 Overview Last lecture, we talked about dynamic programming

More information

FASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.

FASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm. FASTA INTRODUCTION Definition (by David J. Lipman and William R. Pearson in 1985) - Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence

More information

Visualization of Secondary RNA Structure Prediction Algorithms

Visualization of Secondary RNA Structure Prediction Algorithms San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2006 Visualization of Secondary RNA Structure Prediction Algorithms Brandon Hunter San Jose State University

More information

CLC Server. End User USER MANUAL

CLC Server. End User USER MANUAL CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark

More information

Sequence Alignment & Search

Sequence Alignment & Search Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating the first version

More information

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT - Swarbhanu Chatterjee. Hidden Markov models are a sophisticated and flexible statistical tool for the study of protein models. Using HMMs to analyze proteins

More information

Motif Discovery using optimized Suffix Tries

Motif Discovery using optimized Suffix Tries Motif Discovery using optimized Suffix Tries Sergio Prado Promoter: Prof. dr. ir. Jan Fostier Supervisor: ir. Dieter De Witte Faculty of Engineering and Architecture Department of Information Technology

More information

8/19/13. Computational problems. Introduction to Algorithm

8/19/13. Computational problems. Introduction to Algorithm I519, Introduction to Introduction to Algorithm Yuzhen Ye (yye@indiana.edu) School of Informatics and Computing, IUB Computational problems A computational problem specifies an input-output relationship

More information

motifs In the context of networks, the term motif may refer to di erent notions. Subgraph motifs Coloured motifs { }

motifs In the context of networks, the term motif may refer to di erent notions. Subgraph motifs Coloured motifs { } motifs In the context of networks, the term motif may refer to di erent notions. Subgraph motifs Coloured motifs G M { } 2 subgraph motifs 3 motifs Find interesting patterns in a network. 4 motifs Find

More information

Database Searching Using BLAST

Database Searching Using BLAST Mahidol University Objectives SCMI512 Molecular Sequence Analysis Database Searching Using BLAST Lecture 2B After class, students should be able to: explain the FASTA algorithm for database searching explain

More information

Important Example: Gene Sequence Matching. Corrigiendum. Central Dogma of Modern Biology. Genetics. How Nucleotides code for Amino Acids

Important Example: Gene Sequence Matching. Corrigiendum. Central Dogma of Modern Biology. Genetics. How Nucleotides code for Amino Acids Important Example: Gene Sequence Matching Century of Biology Two views of computer science s relationship to biology: Bioinformatics: computational methods to help discover new biology from lots of data

More information

HymenopteraMine Documentation

HymenopteraMine Documentation HymenopteraMine Documentation Release 1.0 Aditi Tayal, Deepak Unni, Colin Diesh, Chris Elsik, Darren Hagen Apr 06, 2017 Contents 1 Welcome to HymenopteraMine 3 1.1 Overview of HymenopteraMine.....................................

More information

An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST

An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST Alexander Chan 5075504 Biochemistry 218 Final Project An Analysis of Pairwise

More information

Brief review from last class

Brief review from last class Sequence Alignment Brief review from last class DNA is has direction, we will use only one (5 -> 3 ) and generate the opposite strand as needed. DNA is a 3D object (see lecture 1) but we will model it

More information

Dynamic Programming: Widget Layout

Dynamic Programming: Widget Layout Dynamic Programming: Widget Layout Setup There are two types of widgets. A leaf widget is a visible widget that someone may see or use, such as a button or an image. Every leaf widget has a list of possible

More information

Sequence analysis Pairwise sequence alignment

Sequence analysis Pairwise sequence alignment UMF11 Introduction to bioinformatics, 25 Sequence analysis Pairwise sequence alignment 1. Sequence alignment Lecturer: Marina lexandersson 12 September, 25 here are two types of sequence alignments, global

More information

ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS

ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS Ivan Vogel Doctoral Degree Programme (1), FIT BUT E-mail: xvogel01@stud.fit.vutbr.cz Supervised by: Jaroslav Zendulka E-mail: zendulka@fit.vutbr.cz

More information

Shortest Path Algorithm

Shortest Path Algorithm Shortest Path Algorithm C Works just fine on this graph. C Length of shortest path = Copyright 2005 DIMACS BioMath Connect Institute Robert Hochberg Dynamic Programming SP #1 Same Questions, Different

More information

Approximate Labelled Subtree Homeomorphism

Approximate Labelled Subtree Homeomorphism Approximate Labelled Subtree Homeomorphism Ron Y. Pinter 1,, Oleg Rokhlenko 1,, Dekel Tsur 2, and Michal Ziv-Ukelson 1, 1 Dept. of Computer Science, Technion - Israel Institute of Technology, Haifa 32000,

More information

Today s Lecture. Multiple sequence alignment. Improved scoring of pairwise alignments. Affine gap penalties Profiles

Today s Lecture. Multiple sequence alignment. Improved scoring of pairwise alignments. Affine gap penalties Profiles Today s Lecture Multiple sequence alignment Improved scoring of pairwise alignments Affine gap penalties Profiles 1 The Edit Graph for a Pair of Sequences G A C G T T G A A T G A C C C A C A T G A C G

More information

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data : Related work on biclustering algorithms for time series gene expression data Sara C. Madeira 1,2,3, Arlindo L. Oliveira 1,2 1 Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Lisbon, Portugal

More information

Sequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it.

Sequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it. Sequence Alignments Overview Sequence alignment is an essential concept for bioinformatics, as most of our data analysis and interpretation techniques make use of it. Sequence alignment means arranging

More information

BLAST. Basic Local Alignment Search Tool. Used to quickly compare a protein or DNA sequence to a database.

BLAST. Basic Local Alignment Search Tool. Used to quickly compare a protein or DNA sequence to a database. BLAST Basic Local Alignment Search Tool Used to quickly compare a protein or DNA sequence to a database. There is no such thing as a free lunch BLAST is fast and highly sensitive compared to competitors.

More information

PLANAR: RNA Sequence Alignment using Non-Affine Gap Penalty and Secondary Structure

PLANAR: RNA Sequence Alignment using Non-Affine Gap Penalty and Secondary Structure PLANAR: RNA Sequence Alignment using Non-Affine Gap Penalty and Secondary Structure O. GILL Courant Inst., NYU; E-mail: gill@cs.nyu.edu. N. RAMAKRISHNAN Virginia Tech.; Email: naren@cs.vt.edu B. MISHRA

More information

Reconstructing long sequences from overlapping sequence fragment. Searching databases for related sequences and subsequences

Reconstructing long sequences from overlapping sequence fragment. Searching databases for related sequences and subsequences SEQUENCE ALIGNMENT ALGORITHMS 1 Why compare sequences? Reconstructing long sequences from overlapping sequence fragment Searching databases for related sequences and subsequences Storing, retrieving and

More information

V8 Molecular decomposition of graphs

V8 Molecular decomposition of graphs V8 Molecular decomposition of graphs - Most cellular processes result from a cascade of events mediated by proteins that act in a cooperative manner. - Protein complexes can share components: proteins

More information

The affix array data structure and its applications to RNA secondary structure analysis

The affix array data structure and its applications to RNA secondary structure analysis Theoretical Computer Science 389 (2007) 278 294 www.elsevier.com/locate/tcs The affix array data structure and its applications to RNA secondary structure analysis Dirk Strothmann Technische Fakultät,

More information

Special course in Computer Science: Advanced Text Algorithms

Special course in Computer Science: Advanced Text Algorithms Special course in Computer Science: Advanced Text Algorithms Lecture 8: Multiple alignments Elena Czeizler and Ion Petre Department of IT, Abo Akademi Computational Biomodelling Laboratory http://www.users.abo.fi/ipetre/textalg

More information

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering Sequence clustering Introduction Data clustering is one of the key tools used in various incarnations of data-mining - trying to make sense of large datasets. It is, thus, natural to ask whether clustering

More information

Pairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University

Pairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 - Spring 2015 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment The number of all possible pairwise alignments (if

More information

Acceleration of the Smith-Waterman algorithm for DNA sequence alignment using an FPGA platform

Acceleration of the Smith-Waterman algorithm for DNA sequence alignment using an FPGA platform Acceleration of the Smith-Waterman algorithm for DNA sequence alignment using an FPGA platform Barry Strengholt Matthijs Brobbel Delft University of Technology Faculty of Electrical Engineering, Mathematics

More information

Sequence Alignment. part 2

Sequence Alignment. part 2 Sequence Alignment part 2 Dynamic programming with more realistic scoring scheme Using the same initial sequences, we ll look at a dynamic programming example with a scoring scheme that selects for matches

More information

Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties

Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties From LCS to Alignment: Change the Scoring The Longest Common Subsequence (LCS) problem the simplest form of sequence

More information

Bioinformatics I, WS 09-10, D. Huson, February 10,

Bioinformatics I, WS 09-10, D. Huson, February 10, Bioinformatics I, WS 09-10, D. Huson, February 10, 2010 189 12 More on Suffix Trees This week we study the following material: WOTD-algorithm MUMs finding repeats using suffix trees 12.1 The WOTD Algorithm

More information

1. R. Durbin, S. Eddy, A. Krogh und G. Mitchison: Biological sequence analysis, Cambridge, 1998

1. R. Durbin, S. Eddy, A. Krogh und G. Mitchison: Biological sequence analysis, Cambridge, 1998 7 Multiple Sequence Alignment The exposition was prepared by Clemens GrÃP pl, based on earlier versions by Daniel Huson, Knut Reinert, and Gunnar Klau. It is based on the following sources, which are all

More information

Subject Index. Journal of Discrete Algorithms 5 (2007)

Subject Index. Journal of Discrete Algorithms 5 (2007) Journal of Discrete Algorithms 5 (2007) 751 755 www.elsevier.com/locate/jda Subject Index Ad hoc and wireless networks Ad hoc networks Admission control Algorithm ; ; A simple fast hybrid pattern-matching

More information

ROTS: Reproducibility Optimized Test Statistic

ROTS: Reproducibility Optimized Test Statistic ROTS: Reproducibility Optimized Test Statistic Fatemeh Seyednasrollah, Tomi Suomi, Laura L. Elo fatsey (at) utu.fi March 3, 2016 Contents 1 Introduction 2 2 Algorithm overview 3 3 Input data 3 4 Preprocessing

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Sequence pairwise alignment Score statistics: E-value and p-value Heuristic algorithms: BLAST and FASTA Database search: gene finding and annotations

More information

Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University

Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University 1 Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment 2 The number of all possible pairwise alignments (if gaps are allowed)

More information

Prediction of RNA secondary structure including kissing hairpin motifs

Prediction of RNA secondary structure including kissing hairpin motifs Prediction of RNA secondary structure including kissing hairpin motifs Corinna Theis, Stefan Janssen and Robert Giegerich Faculty of Technology & Center for Biotechnology Bielefeld University, Germany

More information

Transfer String Kernel for Cross-Context Sequence Specific DNA-Protein Binding Prediction. by Ritambhara Singh IIIT-Delhi June 10, 2016

Transfer String Kernel for Cross-Context Sequence Specific DNA-Protein Binding Prediction. by Ritambhara Singh IIIT-Delhi June 10, 2016 Transfer String Kernel for Cross-Context Sequence Specific DNA-Protein Binding Prediction by Ritambhara Singh IIIT-Delhi June 10, 2016 1 Biology in a Slide DNA RNA PROTEIN CELL ORGANISM 2 DNA and Diseases

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman, M.S. Ramanujan, and Saket Saurabh Presenting: Hen Sender 1 Introduction 2 Abstract. It is well known that in a bipartite (and more generally in a Konig)

More information

Explanation for Tree isomorphism talk

Explanation for Tree isomorphism talk Joint Advanced Student School Explanation for Tree isomorphism talk by Alexander Smal (avsmal@gmail.com) Saint-Petersburg, Russia 2008 Abstract In this talk we considered a problem of tree isomorphism.

More information

Parsimony-Based Approaches to Inferring Phylogenetic Trees

Parsimony-Based Approaches to Inferring Phylogenetic Trees Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 www.biostat.wisc.edu/bmi576.html Mark Craven craven@biostat.wisc.edu Fall 0 Phylogenetic tree approaches! three general types! distance:

More information

Single Pass, BLAST-like, Approximate String Matching on FPGAs*

Single Pass, BLAST-like, Approximate String Matching on FPGAs* Single Pass, BLAST-like, Approximate String Matching on FPGAs* Martin Herbordt Josh Model Yongfeng Gu Bharat Sukhwani Tom VanCourt Computer Architecture and Automated Design Laboratory Department of Electrical

More information

A more efficient algorithm for perfect sorting by reversals

A more efficient algorithm for perfect sorting by reversals A more efficient algorithm for perfect sorting by reversals Sèverine Bérard 1,2, Cedric Chauve 3,4, and Christophe Paul 5 1 Département de Mathématiques et d Informatique Appliquée, INRA, Toulouse, France.

More information

Locality-sensitive hashing and biological network alignment

Locality-sensitive hashing and biological network alignment Locality-sensitive hashing and biological network alignment Laura LeGault - University of Wisconsin, Madison 12 May 2008 Abstract Large biological networks contain much information about the functionality

More information