Dynamic Programming Course: A structure based flexible search method for motifs in RNA. By: Veksler, I., Ziv-Ukelson, M., Barash, D.
1 Dynamic Programming Course: A structure based flexible search method for motifs in RNA By: Veksler, I., Ziv-Ukelson, M., Barash, D., Kedem, K
2 Outline: Background; Motivation; RNA's structure representations; Trees comparison; Our Algorithm; Results
3 The Central Dogma of Molecular Biology: DNA → (transcription) → RNA → (translation) → Protein. Non-coding RNA: an RNA molecule that is not translated into a protein. Non-coding RNAs have been found to have roles in a great variety of processes.
4 Non Coding RNA Families: They are not conserved in sequence, but they are conserved in structure. They have a role in regulating gene expression. Examples: tRNA, rRNA, snoRNA, microRNA, siRNA, riboswitch.
5 Motivation: The discovery of non-coding RNA (ncRNA) motifs and their role in regulating gene expression has recently attracted considerable attention. The goal is to discover these motifs in a sequence database. Most RNA motif search methods start from the primary sequence and only then take into account secondary structure considerations. Since different motifs vary in structure rigidity and in local sequence constraints, there is a need for algorithms and tools that can be fine-tuned according to the searched RNA motif.
6 Our Goal: Discover ncRNA motifs in a sequence database. [Figure: a structural QUERY searched against a genome sequence of millions of nucleotides, e.g. ACGCUGACGUAGUCAGUAGACGAC AGACAGAUACGUCACCGCAGAUAC GCAUAGUAGCAGUAGCAGAUGACG] Are there any appearances of this structure in the genome?
7 The tool - STRMS (Structural RNA Motif Search). Input: the secondary structure of the query, including local sequence and structure constraints, and a target sequence database. Output: all occurrences of the query in the target, ranked by their similarity to the query (in an HTML file). The tool is flexible and takes into account a large number of sequence options. Our approach combines: pre-folding with MFOLD (Zuker, 2003); an RNA pattern matching algorithm, O(mn), based on subtree homeomorphism for ordered, rooted trees.
8 The method: Our method consists of two phases. Preprocessing phase: preparing the target database for a variety of future queries by (1) partitioning the target text into consecutive overlapping windows of a given size with a predefined overlap; (2) folding each window with mfold, keeping the optimal and a few sub-optimal structures; (3) converting each structure to its tree representation, which yields a tree database (TDB). Search phase: applying the tree alignment algorithm and filtering according to our pre-defined constraints. The division into two phases enables the user to run various queries and refine the constraints of each query search without reinvesting time in folding the target database.
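The window partitioning step of the preprocessing phase can be sketched as follows. This is a minimal illustration, not the tool's own code: the function name `overlapping_windows` and its parameters are hypothetical, and the handling of the final (possibly shorter) window is an assumption.

```python
def overlapping_windows(sequence, window_size, overlap):
    """Yield (start, window) pairs that cover `sequence`.

    Consecutive windows share `overlap` characters. The last window may be
    shorter than `window_size` (an assumption of this sketch).
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap
    start = 0
    while True:
        yield start, sequence[start:start + window_size]
        if start + window_size >= len(sequence):
            break  # this window already reaches the end of the sequence
        start += step


if __name__ == "__main__":
    for start, window in overlapping_windows("ACGUACGUAC", 4, 2):
        print(start, window)
```

Each window would then be folded independently, which is what lets the folded database be reused across queries.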
9 RNA's Secondary Structure [Figure labels: pseudoknot, single-stranded, stem, interior loop, bulge loop, hairpin loop, junction (multiloop). Image: Wuchty]
10 RNA's Secondary Structure (((((((..((((.)))).(((((.)))))..(((((.))))))))))))
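The bracket string on this slide is the usual dot-bracket notation: matching parentheses mark paired bases and dots mark unpaired bases. A minimal sketch of reading such a string into a table of base pairs, using a hypothetical helper `parse_dot_bracket` and a small made-up example rather than the string from the slide:

```python
def parse_dot_bracket(structure):
    """Return a dict mapping each paired position to its partner (0-based)."""
    pairs = {}
    stack = []  # positions of '(' that are still waiting for a partner
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


if __name__ == "__main__":
    print(parse_dot_bracket("((..))"))
```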
11 RNA's Secondary Structure Graph
12 Ordered rooted tree. Shapiro, 1988: The nodes correspond to elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop). The edges correspond to base-paired (stem) regions. Zhang, 1998: The nodes of the tree represent either unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases, respectively. There are two kinds of edges, connecting either consecutive stem base-pairs or a leaf base with the last base-pair in the corresponding stem.
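As an illustration of the fine-grained (Zhang-style) representation described above, the sketch below builds an ordered tree from a sequence and its dot-bracket structure: paired bases become internal nodes labeled with the two bases, and unpaired bases become leaves. The function name `zhang_tree`, the `(label, children)` node layout, and the virtual `"root"` node are assumptions made for this example.

```python
def zhang_tree(sequence, structure):
    """Build an ordered tree: paired bases are internal nodes, unpaired bases are leaves.

    Nodes are (label, children) tuples; a virtual "root" node holds the
    top-level elements (an assumption of this sketch).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    child_lists = [[]]  # child_lists[-1] collects children of the open pair
    open_positions = []
    for i, ch in enumerate(structure):
        if ch == ".":
            child_lists[-1].append((sequence[i], []))
        elif ch == "(":
            open_positions.append(i)
            child_lists.append([])
        elif ch == ")":
            j = open_positions.pop()       # position of the matching '('
            children = child_lists.pop()   # everything enclosed by the pair
            child_lists[-1].append((sequence[j] + sequence[i], children))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if open_positions:
        raise ValueError("unbalanced structure")
    return ("root", child_lists[0])


if __name__ == "__main__":
    print(zhang_tree("GGAAACC", "((...))"))
```

Children are appended in 5' to 3' order, so the left-to-right order of siblings follows the molecule.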
13 Our tree representation: Compressed as in [Shapiro, 1988], plus a node for every single-strand component in multiloops. Includes additional information on nodes and on edges for the purpose of sequence analysis. It is more informative than Shapiro's tree representation and more compact than Zhang's. This leads to a precise screening of the target text by first selecting candidates whose structural tree representation is similar to that of the query, and then further filtering these candidates by applying sequence considerations.
14 Our tree representation [Figure labels: origin of a single structure, interior loop, bulge loop, dangling ends, stem-edge, single-strand components of the multiloop, hairpin loop]. Single-strand components and stem-edges are annotated with length and sequence. A small circle node carries only topological information. The tree structure is generated from a ct-file (output from mfold). The tree construction is ordered by the 5' to 3' ordering of the molecule. The result is a compressed structure that also retains the sequence information.
15 Our tree representation
16 Comparison of ordered rooted trees: Trees are among the most common and well-studied combinatorial structures in computer science. In particular, the problem of comparing trees occurs in several diverse areas such as: computational biology, structured text databases, image analysis, automatic theorem proving, and compiler optimization.
17 Comparison of ordered rooted trees: The following operations are defined on ordered trees. relabel: Change the label of a node v in T. delete: Delete a non-root node v in T with parent v', making the children of v become the children of v'. The children are inserted in the place of v as a subsequence in the left-to-right order of the children of v'. insert: The complement of delete. Insert a node v as a child of v' in T, making v the parent of a consecutive subsequence of the children of v'.
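The delete and insert operations can be made concrete on a small ordered tree. In this sketch a node is a two-element list `[label, children]`; the helper names `delete_child` and `insert_child` are hypothetical and only illustrate the definitions above.

```python
def delete_child(parent, index):
    """Delete the child of `parent` at `index`.

    The deleted node's children take its place, keeping left-to-right order.
    Returns the label of the deleted node.
    """
    children = parent[1]
    removed = children[index]
    children[index:index + 1] = removed[1]
    return removed[0]


def insert_child(parent, label, start, end):
    """Insert a new node as a child of `parent`.

    The new node adopts the consecutive children parent[1][start:end],
    which makes this the complement of delete_child.
    """
    children = parent[1]
    new_node = [label, children[start:end]]
    children[start:end] = [new_node]
    return new_node


if __name__ == "__main__":
    tree = ["a", [["b", [["d", []], ["e", []]]], ["c", []]]]
    delete_child(tree, 0)            # remove b; d and e move up under a
    print(tree)
    insert_child(tree, "b", 0, 2)    # put b back above d and e
    print(tree)
```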
18 Edit distance: Assume that we are given a cost function defined on each edit operation. An edit script S between T1 and T2 is a sequence of edit operations turning T1 into T2. The cost of S is the sum of the costs of the operations in S. An optimal edit script between T1 and T2 is an edit script between T1 and T2 of minimum cost, and this cost is the tree edit distance, denoted by δ(T1, T2). The tree edit distance problem is to compute the edit distance and a corresponding edit script.
19-23 Edit distance [figures]
24 Tree Inclusion: T1 is included in T2 if there is a sequence of delete operations performed on T2 which makes T2 isomorphic to T1. The tree inclusion problem is to decide if T1 is included in T2. The tree inclusion problem is a special case of the tree edit distance problem: if insertions all have cost 0 and all other operations have cost 1, then T1 can be included in T2 if and only if δ(T1, T2) = 0.
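A small, unoptimized sketch of ordered tree inclusion may help make the definition concrete. It is a plain memoized recursion over forests, not one of the efficient published algorithms; trees are nested `(label, children)` tuples, and the names `forest_included` and `tree_included` are hypothetical. Because only non-root deletions are allowed, the two roots are required to correspond.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def forest_included(f1, f2):
    """True if ordered forest f1 can be obtained from f2 by deleting nodes."""
    if not f1:
        return True          # everything left in f2 can be deleted
    if not f2:
        return False         # nothing left to map the nodes of f1 onto
    (u_label, u_children) = f1[-1]
    (v_label, v_children) = f2[-1]
    # Option 1: delete the rightmost root v of f2; its children take its place.
    if forest_included(f1, f2[:-1] + v_children):
        return True
    # Option 2: map the rightmost root u of f1 onto v (labels must agree).
    return (u_label == v_label
            and forest_included(u_children, v_children)
            and forest_included(f1[:-1], f2[:-1]))


def tree_included(t1, t2):
    """True if t1 is included in t2 using non-root deletions only."""
    return t1[0] == t2[0] and forest_included(t1[1], t2[1])


if __name__ == "__main__":
    t2 = ("a", (("b", (("c", ()), ("d", ()))), ("e", ())))
    t1 = ("a", (("c", ()), ("e", ())))
    print(tree_included(t1, t2))
```

In the usage example, deleting `b` and then `d` from `t2` yields `t1`; swapping the order of `c` and `e` in `t1` would break the left-to-right order and the check would fail.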
25-26 Tree Inclusion [figures]
27 Polynomial time algorithms exist for these problems. They are all based on the classic technique of dynamic programming and most of them are simple combinatorial algorithms.
28 Comparison of ordered rooted trees Ordered tree comparison is generally computed by tree edit distance, which allows various forms of deletions and insertions in both query and target. The search for small non-coding RNAs naturally yields a more specific tree search formulation since we do not allow deletions in the query. In our method we apply a weighted pattern matching algorithm for finding the best homeomorphic mapping between two rooted ordered trees. Specific constraints on the searched motif can be defined in the input to the search: structural constraints (lengths), allowing or forbidding element deletion in the target, sequence constraints (existence of sibling pseudoknots, local conserved sequence segments).
29 The Algorithm The subtree isomorphism problem [Matula, 1968,1978]: Given a pattern tree P and a text tree T, find a subtree of T which is isomorphic to P, i.e. find if some subtree of T that is identical in structure to P can be obtained by removing entire subtrees of T, or decide that there is no such tree. The subtree homeomorphism problem [Chung, 1987, Reyner, 1977, Pinter et al., 2004]: Is a variant of the former problem, where degree-2 nodes can be deleted from the text tree. Homeomorphism Example
30 The Algorithm - Motivation: Point-mutation events could easily result in an extra bulge in an RNA structure. However, in some cases the functional homology to the original, non-mutated structure is still preserved. The suggested alignment should be flexible enough to allow the deletion of degree-2 nodes from the target tree. [Figure: a bulge riboswitch and its functional homologue]
31 The Algorithm - Motivation: In some cases subtrees may be deleted from the target tree but not from the query tree, as in the tRNA case. Both of the above application-specific properties are captured by subtree homeomorphism. Subtree homeomorphism on ordered rooted trees is more efficient (quadratic in input size) than tree edit distance (cubic in input size). Thus, by utilizing the biological properties that are typical to our application, we obtain a fast variant of weighted subtree homeomorphism on ordered rooted trees that captures our search criteria.
32 Subtree Homeomorphism Score: Let T1 and T2 be two ordered, rooted, homeomorphic trees. A mapping µ : T1 → T2 is a one-to-one map from the nodes of T1 to the nodes of T2 that preserves the ancestor relations of the nodes and their relative order. The subtree homeomorphism score of the mapping, denoted S(µ), combines four components: a user-defined node-to-node similarity score function; an edge-to-edge similarity score function, where e_u ∈ T1 and e_v ∈ T2 are corresponding edges; the penalty for deleting a degree-2 node from T2; and the penalty for deleting any other node.
33 Subtree Homeomorphism Score: Given two rooted ordered trees, P and T, the weighted subtree homeomorphism problem is to find a homeomorphism-preserving mapping µ : P → t from P to some subtree t of T, such that S(µ) is maximal.
34 Subtree Homeomorphism Score The cost function varies from one application to another, depending upon the amount of information supplied with the query. The simplest one just compares the topology of the structures. More complex functions include length differences of the structural elements, sequence conservation and pseudoknot matching. The node deletion score (i.e., gap penalty) reflects the tradeoff between a gap and a mismatch. As the gap penalty increases, the algorithm tends to match distant nodes to avoid gaps. As different values may suit different needs, our tool enables users to set this parameter for each run.
35 The Tree Alignment Algorithm: A bottom-up, two-level dynamic programming (DP) algorithm that computes an optimal alignment between P and any homeomorphic subtree t of T, maximizing the homeomorphism score between P and t. It runs in O(mn) time, where m and n are the numbers of vertices in P and T, respectively. The bottom-up computation requires computing scores for all subtrees of P and T.
36 The Tree Alignment Algorithm: We define score(u,v) to be the score of the optimal mapping between the subtree of P rooted in node u ∈ P and the subtree of T rooted in node v ∈ T.
37 The two-stage DP approach to the tree alignment. Large DP (L_DP): an m*n table over the compared trees [figure: the highlighted entry is score(a,1)]. Small DP (S_DP, second-level dynamic programming): activated during computation of each non-leaf entry (u,v) in the L_DP in order to compute the optimal mapping between the children of u and the children of v [figure: comparing the subtrees of f and 9].
38 The Computation of score(u,v): Done recursively in a postorder traversal of T and P. First, score(u,v) values are computed for all leaf nodes of T and P. Next, score(u,v) values are computed for each node pair in P and T, based on the values of the previously computed scores for all children of u and v: if c(u) ≤ c(v), S_DP is computed for the sequences <x1,...,x_c(u)> and <y1,...,y_c(v)>, where <x1,...,x_c(u)> is the ordered set (5'→3') of children of node u and <y1,...,y_c(v)> is the ordered set (5'→3') of children of node v.
39 The Small_DP: The cost of the diagonal edge in cell (x_i, y_j) is set to score(x_i, y_j). The costs of the vertical edges are set to -∞ to reflect the fact that no deletions are allowed from the query. All horizontal edges are assigned the cost of deleting a node from T (denoted by δ2). Let OptP be the highest scoring path in S_DP. Then score(u,v) is assigned to be: [formula not preserved in this transcript; it is computed from OptP, with one term annotated "Deleting v"].
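The edge costs described for the Small_DP can be sketched as a standard grid DP. This is an illustration under assumptions, not the tool's code: `small_dp` is a hypothetical name, `child_scores[i][j]` is assumed to hold the already computed score(x_i, y_j) with query children as rows and text children as columns, and `delta2` is the penalty paid on each horizontal edge.

```python
NEG_INF = float("-inf")


def small_dp(child_scores, delta2):
    """Return the weight of the highest scoring path through the small DP grid.

    Rows are the children of the query node u, columns the children of the
    text node v. Diagonal edges add score(x_i, y_j), horizontal edges cost
    delta2 (dropping a text child), vertical edges are forbidden (-inf).
    """
    rows = len(child_scores)
    cols = len(child_scores[0]) if rows else 0
    # Row 0: only horizontal moves, each deleting one text child.
    prev = [-j * delta2 for j in range(cols + 1)]
    for i in range(rows):
        cur = [NEG_INF] * (cols + 1)   # column 0 needs a vertical move
        for j in range(1, cols + 1):
            diagonal = prev[j - 1] + child_scores[i][j - 1]
            horizontal = cur[j - 1] - delta2
            cur[j] = max(diagonal, horizontal)
        prev = cur
    return prev[cols]


if __name__ == "__main__":
    scores = [[5, 1, 0],
              [0, 2, 4]]
    print(small_dp(scores, delta2=1))
```

In the usage example the best path maps x1 to y1, drops y2, and maps x2 to y3. When the query node has more children than the text node, every path needs a vertical edge and the result is -inf.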
40 The Tree Alignment Algorithm: The algorithm returns a vertex v* ∈ T that maximizes the score S(µ : P → t_v*) (found in the last row of L_DP).
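To show how the large and small DPs fit together, here is a simplified two-level sketch. The scoring is an assumption made purely for illustration and is not the scoring used by the tool: topology only (labels are ignored), a fixed reward `match` for each mapped node pair, a flat penalty `delta2` for dropping a child of a text node in the small DP, and the same penalty for deleting a text node that has exactly one child. The names `postorder` and `best_homeomorphic_score` are hypothetical.

```python
NEG_INF = float("-inf")


def postorder(tree):
    """Flatten a (label, children) tree into postorder; entry i lists the
    postorder indices of node i's children. The root is the last entry."""
    nodes = []

    def visit(node):
        kids = [visit(child) for child in node[1]]
        nodes.append(kids)
        return len(nodes) - 1

    visit(tree)
    return nodes


def best_homeomorphic_score(pattern, text, match=1.0, delta2=0.5):
    """Return (best score, postorder index v* of the text node it is rooted at)."""
    p_nodes, t_nodes = postorder(pattern), postorder(text)
    m, n = len(p_nodes), len(t_nodes)
    score = [[NEG_INF] * n for _ in range(m)]        # the large DP table
    for u in range(m):                               # postorder: children first
        for v in range(n):
            xs, ys = p_nodes[u], t_nodes[v]
            # Small DP over the ordered children of u (rows) and v (columns).
            prev = [-j * delta2 for j in range(len(ys) + 1)]
            for x in xs:
                cur = [NEG_INF] * (len(ys) + 1)      # vertical moves forbidden
                for j, y in enumerate(ys, start=1):
                    cur[j] = max(prev[j - 1] + score[x][y],  # map x onto y
                                 cur[j - 1] - delta2)        # drop text child y
                prev = cur
            best = match + prev[len(ys)]             # map u onto v
            if len(ys) == 1:                         # v has one child: delete v
                best = max(best, score[u][ys[0]] - delta2)
            score[u][v] = best
    root = m - 1                                     # last row of the large DP
    v_star = max(range(n), key=lambda v: score[root][v])
    return score[root][v_star], v_star


if __name__ == "__main__":
    pattern = ("r", [("a", []), ("b", [])])
    text = ("R", [("X", [("A", []), ("B", [])])])
    print(best_homeomorphic_score(pattern, text))
```

In the usage example the pattern (a root with two leaves) is best mapped onto the subtree rooted at X, which has postorder index 2 in the text tree.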
41 Time Complexity Analysis: W is the sliding window size; N is the size of the target sequence; m and n are the numbers of nodes in the tree representations of the query and of a folded window (of the target sequence). Complexities listed on the slide: O(W^3) and O(NW^3); O(W) and O(NW). Total: O(NW^3).
42 Time Complexity Analysis: The search stage. For each given query: iterating over all O(N) trees in the TDB and applying the subtree homeomorphism algorithm. The algorithm computes an O(mn) dynamic programming matrix, denoted L_DP. For each computed entry (u,v) in the L_DP matrix, the core work is that of computing the corresponding S_DP dynamic programming matrix in O(c(u)*c(v)).
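The per-tree O(mn) bound, including all the nested small DPs, follows from a short counting argument. The derivation below assumes, as in the score computation, that c(x) denotes the number of children of node x, and uses the fact that every non-root node is the child of exactly one node.

```latex
\sum_{u \in P} \sum_{v \in T} c(u)\, c(v)
  = \Bigl(\sum_{u \in P} c(u)\Bigr) \Bigl(\sum_{v \in T} c(v)\Bigr)
  = (m - 1)(n - 1)
  = O(mn)
```

So the total work of all S_DP computations for one tree pair is bounded by the size of the L_DP table itself.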
43 Dealing with Potential Pseudoknots: Extension of the subtree homeomorphism algorithm to handle the pseudoknot considerations posed by the riboswitches in our study. Indeed, [Mandal et al., 2003] predicted a potential pseudoknot between the two arms of the purine riboswitch aptamer. In order to extend our model to take such key information into consideration, we annotate the tree with this additional information by connecting node 2 and node 4 with a potential pseudoknot edge. [Figure labels: node 2: GGUAU; node 4: CCGUA]
44 Dealing with Potential Pseudoknots. Observations: These edges break down the tree-like representation of the RNA secondary structures. The potential pseudoknot is confined to the subtree rooted in node 8, i.e., node 2 and node 4 are sibling nodes sharing a common parent node. For all riboswitch aptamer queries in this study, only one potential pseudoknot is predicted and it is always formed between two sibling leaf nodes sharing a common parent node. The text subtrees could be annotated with any number of potential sibling pseudoknots*. [Figure label: sibling pseudoknot edge] (* Based on loop sequence complementarity analysis that is executed in the preprocessing stage.)
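The slides do not spell out how the loop sequence complementarity analysis is done. One plausible reading, shown here only as an assumption, is to look for the longest contiguous antiparallel stretch of Watson-Crick or G-U wobble pairs between two sibling loop sequences. The function name `longest_complementary_run` and the pairing rules are hypothetical choices for this sketch.

```python
# Base pairs accepted by this sketch: Watson-Crick plus G-U wobble (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_complementary_run(loop_a, loop_b):
    """Length of the longest contiguous antiparallel pairing between two loops."""
    rev_b = loop_b[::-1]                 # antiparallel: read loop_b 3' to 5'
    best = 0
    prev = [0] * (len(rev_b) + 1)        # run lengths ending in the previous row
    for a in loop_a:
        cur = [0] * (len(rev_b) + 1)
        for j, b in enumerate(rev_b, start=1):
            if (a, b) in PAIRS:
                cur[j] = prev[j - 1] + 1  # extend the run along the diagonal
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(longest_complementary_run("GGUAU", "AUACC"))
```

A pair of sibling loops could then be annotated with a potential pseudoknot edge when the run length passes a chosen threshold; the threshold itself would be another tunable assumption.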
45 Updating the S DP X : pseudoknot in the query Y and Z : candidate pseudoknots in the text. If arc X is to be matched to arc Y: the optimal DP path must enter block G2 through vertex (0, 2) and leave it through vertex (3, 6). In this case, the weight of the optimal path will be the sum of its three components: OptPath G1 [(0,0),(2,2)] + OptPath G2 [(0, 2),(3, 6)] + OptPath G3 [(1, 8),(0, 6)] The optimal pseudoknot matching corresponds to the highest scoring path among all the optional paths. When the number of optional paths is constant, the pseudoknot matching increases the time complexity of the main stage by a constant factor only. This is, in practice, the observed case for the riboswitch searches applied in this study.
46 Taking into account sequence considerations. Variety of sequence considerations: Single-stranded RNA-RNA or RNA-protein interactions (e.g. tRNA and riboswitches): apply a sequence alignment criterion to the single-strand regions like bulges and loops. Double-stranded interactions (e.g. miRNA): sequence alignment scoring is applied to the compared stems. Pipeline: target database → filtering by structure and pseudoknot constraints → relatively small number of structures* → applying sequence constraints → final pool of candidates. Because sequence comparisons are performed only on the small number of filtered candidates, the effect of their runtime on the overall search is negligible. (* We will see it in the results later.)
47 Experimental Results: Riboswitches; Purine Riboswitch; tRNA
48 Purine Riboswitch. Riboswitches: Part of an RNA molecule. They directly bind small target molecules with high affinity, and as a consequence they respond with conformational switching that affects the gene's activity. Purine riboswitch: binds guanine/adenine to regulate purine metabolism and transport.
49 Purine Riboswitch. The secondary structure: A three-stem junction with a multiloop connecting two hairpins and the 5'-3' end. Significant sequence conservation occurs within P1 and in the unpaired regions. Some base-pairing potential exists between the two stem-loop sequences, which might permit the formation of a pseudoknot.
50 Results, first dataset: FN = 0; Sensitivity (TP/(TP+FN)) = 1; PPV (TP/(TP+FP)) = 1, except for Clostridium perfringens.
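For reference, the two measures on this slide can be computed directly from the hit counts. The function names and the counts in the usage example are hypothetical and are not taken from the study.

```python
def sensitivity(tp, fn):
    """Fraction of true motif occurrences that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Positive predictive value: fraction of reported hits that are true, TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Made-up counts: 8 true hits, 0 missed, 2 false hits.
    print(sensitivity(tp=8, fn=0), ppv(tp=8, fp=2))
```

With FN = 0 the sensitivity is 1 regardless of the number of true hits, which matches the way the slide reports it.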
51 Results Second dataset The search was conducted in three stages: 1. Based only on topological similarity, as computed via subtree homeomorphism (S1). 2. Enhancing the structural comparison with edge and loop length criteria (S2). 3. Combining the sequence considerations into the search (S3). This reduced the number of false positives to zero or one. This shows the importance of additional constraints supported by our tool in false positives control.
52 Searching for Riboswitches in Newly Sequenced Data Lactobacillus family Lactobacillus acidophilus at c( ) Lactobacillus delbrueckii at c( ) Lactobacillus salivarius at c( ) Sequential conservation of nucleotides in the functionally critical positions. [Mandal et al., 2003]
53
Dynamic Programming (cont d) CS 466 Saurabh Sinha
Dynamic Programming (cont d) CS 466 Saurabh Sinha Spliced Alignment Begins by selecting either all putative exons between potential acceptor and donor sites or by finding all substrings similar to the
More informationTowards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison
Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison Jing Jin, Biplab K. Sarker, Virendra C. Bhavsar, Harold Boley 2, Lu Yang Faculty of Computer Science, University of New
More informationDynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014
Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014 Dynamic programming is a group of mathematical methods used to sequentially split a complicated problem into
More informationLong Read RNA-seq Mapper
UNIVERSITY OF ZAGREB FACULTY OF ELECTRICAL ENGENEERING AND COMPUTING MASTER THESIS no. 1005 Long Read RNA-seq Mapper Josip Marić Zagreb, February 2015. Table of Contents 1. Introduction... 1 2. RNA Sequencing...
More informationDeciphering the Information Encoded in RNA Viral Genomes
Deciphering the Information Encoded in RNA Viral Genomes Christine E. Heitsch Genome Center of Wisconsin and Mathematics Department University of Wisconsin Madison Detecting and Processing Regularities
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 04: Variations of sequence alignments http://www.pitt.edu/~mcs2/teaching/biocomp/tutorials/global.html Slides adapted from Dr. Shaojie Zhang (University
More informationComputational Molecular Biology
Computational Molecular Biology Erwin M. Bakker Lecture 3, mainly from material by R. Shamir [2] and H.J. Hoogeboom [4]. 1 Pairwise Sequence Alignment Biological Motivation Algorithmic Aspect Recursive
More informationA Fast Algorithm for Optimal Alignment between Similar Ordered Trees
Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221
More informationAlignment of Trees and Directed Acyclic Graphs
Alignment of Trees and Directed Acyclic Graphs Gabriel Valiente Algorithms, Bioinformatics, Complexity and Formal Methods Research Group Technical University of Catalonia Computational Biology and Bioinformatics
More informationA Multiple Graph Layers Model with Application to RNA Secondary Structures Comparison
Author manuscript, published in "String Processing and Information Retrieval 2005, Argentine (2005)" A Multiple Graph Layers Model with Application to RNA Secondary Structures Comparison Julien Allali
More informationDynamic Programming: Sequence alignment. CS 466 Saurabh Sinha
Dynamic Programming: Sequence alignment CS 466 Saurabh Sinha DNA Sequence Comparison: First Success Story Finding sequence similarities with genes of known function is a common approach to infer a newly
More informationA Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots Ryan Pham San
More informationBLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics
More informationEECS 4425: Introductory Computational Bioinformatics Fall Suprakash Datta
EECS 4425: Introductory Computational Bioinformatics Fall 2018 Suprakash Datta datta [at] cse.yorku.ca Office: CSEB 3043 Phone: 416-736-2100 ext 77875 Course page: http://www.cse.yorku.ca/course/4425 Many
More informationFinding local RNA motifs using covariance models
Finding local RNA motifs using covariance models Sohrab P. Shah and Anne Condon Department of Computer Science, University of British Columbia, Vancouver, BC, Canada sshah, condon@cs.ubc.ca Technical Report
More informationBiology 644: Bioinformatics
A statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states in the training data. First used in speech and handwriting recognition In
More informationLinear trees and RNA secondary. structuret's
ELSEVIER Discrete Applied Mathematics 51 (1994) 317-323 DISCRETE APPLIED MATHEMATICS Linear trees and RNA secondary. structuret's William R. Schmitt*.", Michael S. Watermanb "University of Memphis. Memphis,
More informationMath 8803/4803, Spring 2008: Discrete Mathematical Biology
Math 8803/4803, Spring 2008: Discrete Mathematical Biology Prof. Christine Heitsch School of Mathematics Georgia Institute of Technology Lecture 11 February 1, 2008 and give one secondary structure for
More informationSpecial course in Computer Science: Advanced Text Algorithms
Special course in Computer Science: Advanced Text Algorithms Lecture 6: Alignments Elena Czeizler and Ion Petre Department of IT, Abo Akademi Computational Biomodelling Laboratory http://www.users.abo.fi/ipetre/textalg
More informationGraph Algorithms Using Depth First Search
Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth
More informationUSING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT
IADIS International Conference Applied Computing 2006 USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT Divya R. Singh Software Engineer Microsoft Corporation, Redmond, WA 98052, USA Abdullah
More informationAs of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be
48 Bioinformatics I, WS 09-10, S. Henz (script by D. Huson) November 26, 2009 4 BLAST and BLAT Outline of the chapter: 1. Heuristics for the pairwise local alignment of two sequences 2. BLAST: search and
More informationManual for RNA-As-Graphs Topology (RAGTOP) software suite
Manual for RNA-As-Graphs Topology (RAGTOP) software suite Schlick lab Contents 1 General Information 1 1.1 Copyright statement....................................... 1 1.2 Citation requirements.......................................
More informationSept. 9, An Introduction to Bioinformatics. Special Topics BSC5936:
Special Topics BSC5936: An Introduction to Bioinformatics. Florida State University The Department of Biological Science www.bio.fsu.edu Sept. 9, 2003 The Dot Matrix Method Steven M. Thompson Florida State
More informationBLAST & Genome assembly
BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies May 15, 2014 1 BLAST What is BLAST? The algorithm 2 Genome assembly De novo assembly Mapping assembly 3
More informationPROTEIN MULTIPLE ALIGNMENT MOTIVATION: BACKGROUND: Marina Sirota
Marina Sirota MOTIVATION: PROTEIN MULTIPLE ALIGNMENT To study evolution on the genetic level across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein
More information7.36/7.91/20.390/20.490/6.802/6.874 PROBLEM SET 3. Gibbs Sampler, RNA secondary structure, Protein Structure with PyRosetta, Connections (25 Points)
7.36/7.91/20.390/20.490/6.802/6.874 PROBLEM SET 3. Gibbs Sampler, RNA secondary structure, Protein Structure with PyRosetta, Connections (25 Points) Due: Thursday, April 3 th at noon. Python Scripts All
More informationLecture 10. Sequence alignments
Lecture 10 Sequence alignments Alignment algorithms: Overview Given a scoring system, we need to have an algorithm for finding an optimal alignment for a pair of sequences. We want to maximize the score
More informationThe Dot Matrix Method
Special Topics BS5936: An Introduction to Bioinformatics. Florida State niversity The Department of Biological Science www.bio.fsu.edu Sept. 9, 2003 The Dot Matrix Method Steven M. Thompson Florida State
More information1. R. Durbin, S. Eddy, A. Krogh und G. Mitchison: Biological sequence analysis, Cambridge, 1998
7 Multiple Sequence Alignment The exposition was prepared by Clemens Gröpl, based on earlier versions by Daniel Huson, Knut Reinert, and Gunnar Klau. It is based on the following sources, which are all
More informationComputational Molecular Biology
Computational Molecular Biology Erwin M. Bakker Lecture 2 Materials used from R. Shamir [2] and H.J. Hoogeboom [4]. 1 Molecular Biology Sequences DNA A, T, C, G RNA A, U, C, G Protein A, R, D, N, C E,
More informationData Mining Technologies for Bioinformatics Sequences
Data Mining Technologies for Bioinformatics Sequences Deepak Garg Computer Science and Engineering Department Thapar Institute of Engineering & Tecnology, Patiala Abstract Main tool used for sequence alignment
More informationGSNAP: Fast and SNP-tolerant detection of complex variants and splicing in short reads by Thomas D. Wu and Serban Nacu
GSNAP: Fast and SNP-tolerant detection of complex variants and splicing in short reads by Thomas D. Wu and Serban Nacu Matt Huska Freie Universität Berlin Computational Methods for High-Throughput Omics
More informationBinary Trees
Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what
More informationA Partition Function Algorithm for Nucleic Acid Secondary Structure Including Pseudoknots
A Partition Function Algorithm for Nucleic Acid Secondary Structure Including Pseudoknots ROBERT M. DIRKS, 1 NILES A. PIERCE 2 1 Department of Chemistry, California Institute of Technology, Pasadena, California
More informationLecture 5: Multiple sequence alignment
Lecture 5: Multiple sequence alignment Introduction to Computational Biology Teresa Przytycka, PhD (with some additions by Martin Vingron) Why do we need multiple sequence alignment Pairwise sequence alignment
More informationToday s Lecture. Edit graph & alignment algorithms. Local vs global Computational complexity of pairwise alignment Multiple sequence alignment
Today s Lecture Edit graph & alignment algorithms Smith-Waterman algorithm Needleman-Wunsch algorithm Local vs global Computational complexity of pairwise alignment Multiple sequence alignment 1 Sequence
More informationWeighted Tree Kernels for Sequence Analysis
ESANN 2014 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence Weighted Tree Kernels for Sequence Analysis Christopher J. Bowles and James M. Hogan School of Electrical
More informationBiology 644: Bioinformatics
Find the best alignment between 2 sequences with lengths n and m, respectively Best alignment is very dependent upon the substitution matrix and gap penalties The Global Alignment Problem tries to find
More informationBLAST, Profile, and PSI-BLAST
BLAST, Profile, and PSI-BLAST Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 26 Free for academic use Copyright @ Jianlin Cheng & original sources
More informationPairwise alignment II
Pairwise alignment II Agenda - Previous Lesson: Minhala + Introduction - Review Dynamic Programming - Pariwise Alignment Biological Motivation Today: - Quick Review: Sequence Alignment (Global, Local,
More informationAlignments BLAST, BLAT
Alignments BLAST, BLAT Genome Genome Gene vs Built of DNA DNA Describes Organism Protein gene Stored as Circular/ linear Single molecule, or a few of them Both (depending on the species) Part of genome
More informationDynamic Programming Part I: Examples. Bioinfo I (Institut Pasteur de Montevideo) Dynamic Programming -class4- July 25th, / 77
Dynamic Programming Part I: Examples Bioinfo I (Institut Pasteur de Montevideo) Dynamic Programming -class4- July 25th, 2011 1 / 77 Dynamic Programming Recall: the Change Problem Other problems: Manhattan
More informationMismatch String Kernels for SVM Protein Classification
Mismatch String Kernels for SVM Protein Classification by C. Leslie, E. Eskin, J. Weston, W.S. Noble Athina Spiliopoulou Morfoula Fragopoulou Ioannis Konstas Outline Definitions & Background Proteins Remote
More informationTrees : Part 1. Section 4.1. Theory and Terminology. A Tree? A Tree? Theory and Terminology. Theory and Terminology
Trees : Part Section. () (2) Preorder, Postorder and Levelorder Traversals Definition: A tree is a connected graph with no cycles Consequences: Between any two vertices, there is exactly one unique path
More informationGene regulation. DNA is merely the blueprint Shared spatially (among all tissues) and temporally But cells manage to differentiate
Gene regulation DNA is merely the blueprint Shared spatially (among all tissues) and temporally But cells manage to differentiate Especially but not only during developmental stage And cells respond to