Computational Biology 6.095/6.895 Database Search Lecture 5 Prof. Piotr Indyk
|
|
- Dortha Davis
- 6 years ago
- Views:
Transcription
1 Computational Biology 6.095/6.895 Database Searh Leture 5 Prof. Piotr Indyk
2 Previous letures Leture -3: Global alignment in O(mn) Dynami programming Leture -2: Loal alignment, variants, in O(mn) Leture -1: Exat string mathing in O(n) Hashing: number mod q Quiz: What do these problems have in ommon? Answer: they enable omparison of two sequenes
3 The Big Piture Gene Finding DNA Sequene alignment Database lookup
4 Database searh Database searh: AIKWQPRSTW. Database IKMQRHIKW. HDLFWHLWH. Query: RGIKW Output: sequenes similar to query
5 What does similar mean? Simplest idea: just ount the number of ommon amino-aids E.g., RGRKW mathes RGIKW at 4 positions, or idper = 80% Not all mathes are reated equal - soring matrix In general,we should allow insertions and deletions as well
6 How to answer the query We ould just san the whole database But: Query must be very fast Most sequenes will be ompletely unrelated to query Individual alignment needs not be perfet. Can finetune Exploit nature of the problem If you re going to rejet any math with idper < 90%, then why bother even looking at sequenes whih don t have a fairly long streth of mathing a.a. in a row.
7 W-mer indexing IKW Preproessing: For every W-mer (e.g., IKZ W=3) store every loation in the database where it ours (an use hashing if W is large) Query: Generate W-mers and look them up in the database. Proess the results Running time benefit: RGIKW AIKWQPRSTW. IKMQRHIKW. HDLFWHLWH. For W=3, if the sequenes are IKW random, then roughly one W-mer in 23 3 will math, i.e., one in a ten... thousand We hit only a small fration of all sequenes
8 BLAST Speifi (and very effiient) implementation of the W-mer indexing idea How to generate W-mers from the query How to proess the mathes Basi loal alignment searh tool SF Altshul, W Gish, W Miller, EW Myers, DJ Lipman - J. Mol. Biol, inaoep.mx... Basi Loal Alignment Searh Tool Stephen F. Altshul', Warren Gish', Webb Miller2 Eugene W. Myers3 and David J. Lipmanl... Page S. F. Altshul et al.... Cited by 14181
9 THE BLAST SEARCH ALGORITHM Query word (W =3) Query: GSVEDTTGSQSLAALLNKCKTPQGQRLVNQWIKQPLMDKNRIEERLNLVEAFVEDAELRQTLQEDL Neighborhood words PQG PEG PRG PKG PNG PDG PHG PMG PSG PQA PQN Neighborhood sore threshold (T=13) et... X Query: 325 SLAALLNKCKTPQGQRLVNQWIKQPLMDKNRIEERLNLVEA 365 +LA++L+ TP G R++ +W+ P+ D + ER + A Sbjt: 290 TLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGA 330 High-soring Segment Pair (HSP) Figure by MIT OCW.
10 Blast Algorithm Overview Reeive query Split query into overlapping words of length W Find neighborhood words for eah word until threshold T Look into the table where these neighbor words our: seeds Extend seeds until sore drops off under X Evaluate statistial signifiane of sore Report sores and alignments PMG W-mer Database
11 Extending the seeds Query: Sbjt: 325 SLAALLNKCKTPQGQRLVNQWIKQPLMDKNRIEERLNLVEA 365 +LA++L+ TP G R++ +W+ P+ D + ER + A 290 TLASVLDCTVTPMGSRMLKRWLHMPVRDTRVLLERQQTIGA 330 High-soring Segment Pair ( HSP) Cumulative Sore Figure by MIT OCW. Break into two HSPs Extend until the umulative sore drops Extension
12 Length and Perent Identity
13 Why this works. I.e., what do we miss? In the worst ase W-mer: W=3 No neighborhoods Query: RKIWGDPRS Datab.: RKIVGDRRS 7 idential a.a
14 Pigeonhole priniple Pigeonhole priniple If you have 2 pigeons and 3 holes, there must be at least one hole with no pigeon
15 Pigeonhole and W-mers RKI WGD PRS RKI VGD RRS Pigeonholing mis-mathes Two sequenes, eah 9 amino-aids, with 7 identities There is a streth of 3 amino-aids perfetly onserved In general: Sequene length: n Identities: t Can use W-mers for W= [n/(n-t+1)]
16 True alignments: Looking for K-mers Personal experiment run in Kb region of human, and mouse 450Kb ortholog. Blasted every piee of mouse against human (6,50) Identify slope of best fit line.two sets of blast alignments. 320 olinear / 770 alignments Can ask the question: What makes a blast hit on the line look good. What makes a blast hit off the diagonal look bad. Count K-mers How many k-mers do we find: n How long are they: k Counted their distribution inside and outside the sequene.
17 True alignments: Looking for K-mers number of k-mers that happen for eah length of k-mer. Red islands ome from olinear alignments Blue islands ome from off-diagonal alignments Note: more than one data point per alignment. Linear plot Log Log plot
18 Extensions Ideas beyond W-mer indexing? Faster Better sensitivity (less false negatives)
19 Extensions: Filtering Low omplexity regions an ause spurious hits Filter out low omplexity in your query Filter most over-represented items in your database
20 Extensions: Two-hit blast Improves sensitivity for any speed Two smaller W-mers are more likely than one longer one Therefore it s a more sensitive searhing method to look for two hits instead of one, with the same speed. Improves speed for any sensitivity No need to extend a lot of the W-mers, when isolated
21 Extensions: beyond W-mers W-mers (without neighborhoods): RGIKW RGI, GIK, IKW No reason to use only onseutive symbols Instead, we ould use ombs, e.g., RGIKW R*IK*, RG**W, Indexing same as for W-mers: For eah omb, store the list of positions in the database where it ours Perform lookups to answer the query How to hoose the ombs? Randomized projetion: Califano-Rigoutsos 93, Buhler 01, Indyk-Motwani 98 Choose the positions of * at random Analyze false positives and false negatives
22 Combs and Random Projetions Assume we selet k positions, whih do not ontain *, at random with replaement What is the probability of a false negative? At most: 1-idper k In our ase: 1-(7/9) 4 = What is we repeat the proess l times, independently? Miss prob. = 0.63 l For l=5, it is less than 10% Query: RKIWGDPRS Datab.: RKIVGDRRS k=4 Query: *KI*G***S Datab.: *KI*G***S
23 Suffix trees Great tool for text proessing E.g., searhing for exat ourrene of a pattern Suffix tree for: xabxa a x b a x b x a a b x a
24 Suffix tree definition x a b x a a b x a b x a x a a 1 a x x b a a x 6 b b a x a 2 Definition: Suffix tree ST for text T[1..n] Rooted, direted tree T, n leaves, numbered 1..n Text labels on the edges Path to leaf i spells out the suffix S[i..], by onatenating edge labels Common prefixes share ommon paths, diverge to form internal nodes
25 Properties of suffix trees a x x b a a x 6 b b a x a 1 2 How muh spae do we need to represent a suffix tree of T[1..n]? Only O(n) At most O(n) edges Eah edge label an be represented as T[i j]
26 Exat string mathing with suffix trees Given the suffix tree for text T Searh for pattern P[1 m] For every harater in P, traverse T: xabxa the appropriate path of the tree, P: abx reading one harater eah time If P is not found in a path, P does not our in T x If P is found in its entirety, then all a ourrenes of P in T are exatly b the hildren of that node x Every hild orresponds to a x exatly one ourrene a Simply list eah of the leaf indies Time: O(m) 1 2 b a 4 5 b x 6 a 3
27 Suffix Tree Constrution x a b x a a b x a b x a x a a x a b x a a b x a b x a Running time: O(n 2 ) Can be improved to O(n)
28 Today Searh among many database sequenes W-mer indexing BLAST Combs and random projetions Suffix trees
6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationData Structures in Java
Data Strutures in Java Leture 8: Trees and Tree Traversals. 10/5/2015 Daniel Bauer 1 Trees in Computer Siene A lot of data omes in a hierarhial/nested struture. Mathematial expressions. Program struture.
More informationDynamic Algorithms Multiple Choice Test
3226 Dynami Algorithms Multiple Choie Test Sample test: only 8 questions 32 minutes (Real test has 30 questions 120 minutes) Årskort Name Eah of the following 8 questions has 4 possible answers of whih
More informationBioinformatics for Biologists
Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Director Bioinformatics & Research Computing Whitehead Institute Topics to Cover
More informationChapter 4: Blast. Chaochun Wei Fall 2014
Course organization Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms for Sequence Analysis (Week 3-11)
More informationXML Data Streams. XML Stream Processing. XML Stream Processing. Yanlei Diao. University of Massachusetts Amherst
XML Stream Proessing Yanlei Diao University of Massahusetts Amherst XML Data Streams XML is the wire format for data exhanged online. Purhase orders http://www.oasis-open.org/ommittees/t_home.php?wg_abbrev=ubl
More information1 The Knuth-Morris-Pratt Algorithm
5-45/65: Design & Analysis of Algorithms September 26, 26 Leture #9: String Mathing last hanged: September 26, 27 There s an entire field dediated to solving problems on strings. The book Algorithms on
More informationAn Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST
An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST Alexander Chan 5075504 Biochemistry 218 Final Project An Analysis of Pairwise
More informationAlgorithms for External Memory Lecture 6 Graph Algorithms - Weighted List Ranking
Algorithms for External Memory Leture 6 Graph Algorithms - Weighted List Ranking Leturer: Nodari Sithinava Sribe: Andi Hellmund, Simon Ohsenreither 1 Introdution & Motivation After talking about I/O-effiient
More informationAs of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be
48 Bioinformatics I, WS 09-10, S. Henz (script by D. Huson) November 26, 2009 4 BLAST and BLAT Outline of the chapter: 1. Heuristics for the pairwise local alignment of two sequences 2. BLAST: search and
More informationIncremental Mining of Partial Periodic Patterns in Time-series Databases
CERIAS Teh Report 2000-03 Inremental Mining of Partial Periodi Patterns in Time-series Dataases Mohamed G. Elfeky Center for Eduation and Researh in Information Assurane and Seurity Purdue University,
More information24 Grundlagen der Bioinformatik, SS 10, D. Huson, April 26, This lecture is based on the following papers, which are all recommended reading:
24 Grundlagen der Bioinformatik, SS 10, D. Huson, April 26, 2010 3 BLAST and FASTA This lecture is based on the following papers, which are all recommended reading: D.J. Lipman and W.R. Pearson, Rapid
More informationHEXA: Compact Data Structures for Faster Packet Processing
Washington University in St. Louis Washington University Open Sholarship All Computer Siene and Engineering Researh Computer Siene and Engineering Report Number: 27-26 27 HEXA: Compat Data Strutures for
More informationCS 284A: Algorithms for Computational Biology Notes on Lecture: BLAST. The statistics of alignment scores.
CS 284A: Algorithms for Computational Biology Notes on Lecture: BLAST. The statistics of alignment scores. prepared by Oleksii Kuchaiev, based on presentation by Xiaohui Xie on February 20th. 1 Introduction
More informationComputational Molecular Biology
Computational Molecular Biology Erwin M. Bakker Lecture 3, mainly from material by R. Shamir [2] and H.J. Hoogeboom [4]. 1 Pairwise Sequence Alignment Biological Motivation Algorithmic Aspect Recursive
More informationLearning Convention Propagation in BeerAdvocate Reviews from a etwork Perspective. Abstract
CS 9 Projet Final Report: Learning Convention Propagation in BeerAdvoate Reviews from a etwork Perspetive Abstrat We look at the way onventions propagate between reviews on the BeerAdvoate dataset, and
More informationL11 Balanced Trees. Alice E. Fischer. Fall Alice E. Fischer L11 Balanced Trees... 1/34 Fall / 34
L11 Balaned Trees Alie E. Fisher Fall 2018 Alie E. Fisher L11 Balaned Trees... 1/34 Fall 2018 1 / 34 Outline 1 AVL Trees 2 Red-Blak Trees Insertion Insertion 3 B-Trees Alie E. Fisher L11 Balaned Trees...
More informationBioinformatics explained: BLAST. March 8, 2007
Bioinformatics Explained Bioinformatics explained: BLAST March 8, 2007 CLC bio Gustav Wieds Vej 10 8000 Aarhus C Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com info@clcbio.com Bioinformatics
More informationQuery Evaluation Overview. Query Optimization: Chap. 15. Evaluation Example. Cost Estimation. Query Blocks. Query Blocks
Query Evaluation Overview Query Optimization: Chap. 15 CS634 Leture 12 SQL query first translated to relational algebra (RA) Atually, some additional operators needed for SQL Tree of RA operators, with
More informationExtracting Partition Statistics from Semistructured Data
Extrating Partition Statistis from Semistrutured Data John N. Wilson Rihard Gourlay Robert Japp Mathias Neumüller Department of Computer and Information Sienes University of Strathlyde, Glasgow, UK {jnw,rsg,rpj,mathias}@is.strath.a.uk
More informationDynamic Programming. Lecture #8 of Algorithms, Data structures and Complexity. Joost-Pieter Katoen Formal Methods and Tools Group
Dynami Programming Leture #8 of Algorithms, Data strutures and Complexity Joost-Pieter Katoen Formal Methods and Tools Group E-mail: katoen@s.utwente.nl Otober 29, 2002 JPK #8: Dynami Programming ADC (214020)
More informationType of document: Usebility Checklist
Projet: JEGraph Type of doument: Usebility Cheklist Author: Max Bryan Version: 1.30 2011 Envidate GmbH Type of Doumet Developer guidelines User guidelines Dutybook Speifiation Programming and testing Test
More informationInternational Journal of Advancements in Research & Technology, Volume 3, Issue 3, March-2014 ISSN
International Journal of Advanements in Researh & Tehnology, Volume 3, Issue 3, Marh-204 ISSN 2278-773 47 Phrase Based Doument Retrieving y Comining Suffix Tree index data struture and Boyer- Moore faster
More informationComparative Analysis of Protein Alignment Algorithms in Parallel environment using CUDA
Comparative Analysis of Protein Alignment Algorithms in Parallel environment using BLAST versus Smith-Waterman Shadman Fahim shadmanbracu09@gmail.com Shehabul Hossain rudrozzal@gmail.com Gulshan Jubaed
More informationHeuristic methods for pairwise alignment:
Bi03c_1 Unit 03c: Heuristic methods for pairwise alignment: k-tuple-methods k-tuple-methods for alignment of pairs of sequences Bi03c_2 dynamic programming is too slow for large databases Use heuristic
More informationRecursion examples: Problem 2. (More) Recursion and Lists. Tail recursion. Recursion examples: Problem 2. Recursion examples: Problem 3
Reursion eamples: Problem 2 (More) Reursion and s Reursive funtion to reverse a string publi String revstring(string str) { if(str.equals( )) return str; return revstring(str.substring(1, str.length()))
More informationON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS
ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS Ivan Vogel Doctoral Degree Programme (1), FIT BUT E-mail: xvogel01@stud.fit.vutbr.cz Supervised by: Jaroslav Zendulka E-mail: zendulka@fit.vutbr.cz
More informationDatabase Searching Using BLAST
Mahidol University Objectives SCMI512 Molecular Sequence Analysis Database Searching Using BLAST Lecture 2B After class, students should be able to: explain the FASTA algorithm for database searching explain
More informationBioinformatics. Sequence alignment BLAST Significance. Next time Protein Structure
Bioinformatics Sequence alignment BLAST Significance Next time Protein Structure 1 Experimental origins of sequence data The Sanger dideoxynucleotide method F Each color is one lane of an electrophoresis
More informationOutline. CS38 Introduction to Algorithms. Administrative Stuff. Administrative Stuff. Motivation/Overview. Administrative Stuff
Outline CS38 Introdution to Algorithms Leture 1 April 1, 2014 administrative stuff motivation and overview of the ourse stale mathings example graphs, representing graphs graph traversals (BFS, DFS) onnetivity,
More informationA Novel Bit Level Time Series Representation with Implication of Similarity Search and Clustering
A Novel Bit Level Time Series Representation with Impliation of Similarity Searh and lustering hotirat Ratanamahatana, Eamonn Keogh, Anthony J. Bagnall 2, and Stefano Lonardi Dept. of omputer Siene & Engineering,
More informationParallelizing Frequent Web Access Pattern Mining with Partial Enumeration for High Speedup
Parallelizing Frequent Web Aess Pattern Mining with Partial Enumeration for High Peiyi Tang Markus P. Turkia Department of Computer Siene Department of Computer Siene University of Arkansas at Little Rok
More informationBLAST & Genome assembly
BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies May 15, 2014 1 BLAST What is BLAST? The algorithm 2 Genome assembly De novo assembly Mapping assembly 3
More informationCombinatorial Pattern Matching
Combinatorial Pattern Matching Outline Exact Pattern Matching Keyword Trees Suffix Trees Approximate String Matching Local alignment is to slow Quadratic local alignment is too slow while looking for similarities
More informationFrom Smith-Waterman to BLAST
From Smith-Waterman to BLAST Jeremy Buhler July 23, 2015 Smith-Waterman is the fundamental tool that we use to decide how similar two sequences are. Isn t that all that BLAST does? In principle, it is
More informationGray Codes for Reflectable Languages
Gray Codes for Refletable Languages Yue Li Joe Sawada Marh 8, 2008 Abstrat We lassify a type of language alled a refletable language. We then develop a generi algorithm that an be used to list all strings
More informationScoring and heuristic methods for sequence alignment CG 17
Scoring and heuristic methods for sequence alignment CG 17 Amino Acid Substitution Matrices Used to score alignments. Reflect evolution of sequences. Unitary Matrix: M ij = 1 i=j { 0 o/w Genetic Code Matrix:
More informationBLAST, Profile, and PSI-BLAST
BLAST, Profile, and PSI-BLAST Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 26 Free for academic use Copyright @ Jianlin Cheng & original sources
More informationRelevance for Computer Vision
The Geometry of ROC Spae: Understanding Mahine Learning Metris through ROC Isometris, by Peter A. Flah International Conferene on Mahine Learning (ICML-23) http://www.s.bris.a.uk/publiations/papers/74.pdf
More informationBLAST. Basic Local Alignment Search Tool. Used to quickly compare a protein or DNA sequence to a database.
BLAST Basic Local Alignment Search Tool Used to quickly compare a protein or DNA sequence to a database. There is no such thing as a free lunch BLAST is fast and highly sensitive compared to competitors.
More informationChapter 2: Introduction to Maple V
Chapter 2: Introdution to Maple V 2-1 Working with Maple Worksheets Try It! (p. 15) Start a Maple session with an empty worksheet. The name of the worksheet should be Untitled (1). Use one of the standard
More informationAlgorithms in Bioinformatics: A Practical Introduction. Database Search
Algorithms in Bioinformatics: A Practical Introduction Database Search Biological databases Biological data is double in size every 15 or 16 months Increasing in number of queries: 40,000 queries per day
More informationJyoti Lakhani 1, Ajay Khunteta 2, Dharmesh Harwani *3 1 Poornima University, Jaipur & Maharaja Ganga Singh University, Bikaner, Rajasthan, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 Improvisation of Global Pairwise Sequence Alignment
More informationCOS 551: Introduction to Computational Molecular Biology Lecture: Oct 17, 2000 Lecturer: Mona Singh Scribe: Jacob Brenner 1. Database Searching
COS 551: Introduction to Computational Molecular Biology Lecture: Oct 17, 2000 Lecturer: Mona Singh Scribe: Jacob Brenner 1 Database Searching In database search, we typically have a large sequence database
More informationComputational Biology
Compuaional Bioloy A.P. Gulyaev (Saha) Appliaion oriened view appliaion H.J. Hooeboom (Hendrik Jan) Theory oriened view www.lias.nl/home/hooeboo/mb/ hemes problem model (e. raph) known alorihms haraerizaion
More informationPathRings. Manual. Version 1.0. Yongnan Zhu December 16,
PathRings Version 1.0 Manual Yongnan Zhu E-mail: yongnan@umb.edu Deember 16, 2014-1 - PathRings This tutorial introdues PathRings, the user interfae and the interation. For better to learn, you will need
More informationOutline: Software Design
Outline: Software Design. Goals History of software design ideas Design priniples Design methods Life belt or leg iron? (Budgen) Copyright Nany Leveson, Sept. 1999 A Little History... At first, struggling
More informationL4: Blast: Alignment Scores etc.
L4: Blast: Alignment Scores etc. Why is Blast Fast? Silly Question Prove or Disprove: There are two people in New York City with exactly the same number of hairs. Large database search Database (n) Query
More informationGradient based progressive probabilistic Hough transform
Gradient based progressive probabilisti Hough transform C.Galambos, J.Kittler and J.Matas Abstrat: The authors look at the benefits of exploiting gradient information to enhane the progressive probabilisti
More informationDefinitions Homework. Quine McCluskey Optimal solutions are possible for some large functions Espresso heuristic. Definitions Homework
EECS 33 There be Dragons here http://ziyang.ees.northwestern.edu/ees33/ Teaher: Offie: Email: Phone: L477 Teh dikrp@northwestern.edu 847 467 2298 Today s material might at first appear diffiult Perhaps
More informationAn Event Display for ATLAS H8 Pixel Test Beam Data
An Event Display for ATLAS H8 Pixel Test Beam Data George Gollin Centre de Physique des Partiules de Marseille and University of Illinois April 17, 1999 g-gollin@uiu.edu An event display program is now
More informationCleanUp: Improving Quadrilateral Finite Element Meshes
CleanUp: Improving Quadrilateral Finite Element Meshes Paul Kinney MD-10 ECC P.O. Box 203 Ford Motor Company Dearborn, MI. 8121 (313) 28-1228 pkinney@ford.om Abstrat: Unless an all quadrilateral (quad)
More informationA Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks
A Partial Sorting Algorithm in Multi-Hop Wireless Sensor Networks Abouberine Ould Cheikhna Department of Computer Siene University of Piardie Jules Verne 80039 Amiens Frane Ould.heikhna.abouberine @u-piardie.fr
More informationAlignments BLAST, BLAT
Alignments BLAST, BLAT Genome Genome Gene vs Built of DNA DNA Describes Organism Protein gene Stored as Circular/ linear Single molecule, or a few of them Both (depending on the species) Part of genome
More informationLecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:
Lecture Overview Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating
More informationarxiv: v1 [cs.db] 13 Sep 2017
An effiient lustering algorithm from the measure of loal Gaussian distribution Yuan-Yen Tai (Dated: May 27, 2018) In this paper, I will introdue a fast and novel lustering algorithm based on Gaussian distribution
More informationOn - Line Path Delay Fault Testing of Omega MINs M. Bellos 1, E. Kalligeros 1, D. Nikolos 1,2 & H. T. Vergos 1,2
On - Line Path Delay Fault Testing of Omega MINs M. Bellos, E. Kalligeros, D. Nikolos,2 & H. T. Vergos,2 Dept. of Computer Engineering and Informatis 2 Computer Tehnology Institute University of Patras,
More informationSequence Alignment & Search
Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating the first version
More informationBLAST & Genome assembly
BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies November 17, 2012 1 Introduction Introduction 2 BLAST What is BLAST? The algorithm 3 Genome assembly De
More information1 Disjoint-set data structure.
CS 124 Setion #4 Union-Fin, Greey Algorithms 2/20/17 1 Disjoint-set ata struture. 1.1 Operations Disjoint-set ata struture enale us to effiiently perform operations suh as plaing elements into sets, querying
More information1. Introduction. 2. The Probable Stope Algorithm
1. Introdution Optimization in underground mine design has reeived less attention than that in open pit mines. This is mostly due to the diversity o underground mining methods and omplexity o underground
More informationOvidSP Quick Reference Card
OvidSP Quik Referene Card Searh in any of several dynami modes, ombine results, apply limits, use improved researh tools, develop strategies, save searhes, set automati alerts and RSS feeds, share results...
More informationSelf-Adaptive Parent to Mean-Centric Recombination for Real-Parameter Optimization
Self-Adaptive Parent to Mean-Centri Reombination for Real-Parameter Optimization Kalyanmoy Deb and Himanshu Jain Department of Mehanial Engineering Indian Institute of Tehnology Kanpur Kanpur, PIN 86 {deb,hjain}@iitk.a.in
More informationShort Read Alignment. Mapping Reads to a Reference
Short Read Alignment Mapping Reads to a Reference Brandi Cantarel, Ph.D. & Daehwan Kim, Ph.D. BICF 05/2018 Introduction to Mapping Short Read Aligners DNA vs RNA Alignment Quality Pitfalls and Improvements
More informationSimple strategy Computes shortest paths from the start note
Leture Notes for I500 Fall 2009 Fundamental Computer Conepts of Informatis by Xiaoqian Zhang (Instrutor: Predrag Radivoja) Last time: dynami programming Today: Elementary graph algorithm (textbook Chapter
More informationCalculation of typical running time of a branch-and-bound algorithm for the vertex-cover problem
Calulation of typial running time of a branh-and-bound algorithm for the vertex-over problem Joni Pajarinen, Joni.Pajarinen@iki.fi Otober 21, 2007 1 Introdution The vertex-over problem is one of a olletion
More informationLecture 14. Recursion
Leture 14 Reursion Announements for Today Prelim 1 Tonight at 5:15 OR 7:30 A D (5:15, Uris G01) E-K (5:15, Statler) L P (7:30, Uris G01) Q-Z (7:30, Statler) Graded y noon on Sun Sores will e in CMS In
More informationSequence Alignment Heuristics
Sequence Alignment Heuristics Some slides from: Iosif Vaisman, GMU mason.gmu.edu/~mmasso/binf630alignment.ppt Serafim Batzoglu, Stanford http://ai.stanford.edu/~serafim/ Geoffrey J. Barton, Oxford Protein
More informationMatLab Basics: Data type, Matrices, Graphics
MatLa Basis: Data type, Matries, Graphis 1 Plotting Data 0.8 0.6 0.4 os(t/10) 0.2 0-0.2-0.4-0.6 X: 78 Y: 0.05396-0.8-1 0 10 20 30 40 50 60 70 80 90 100 t Figure y MIT OCW. MatLa logial har NUMERIC ell
More informationMultiple-Criteria Decision Analysis: A Novel Rank Aggregation Method
3537 Multiple-Criteria Deision Analysis: A Novel Rank Aggregation Method Derya Yiltas-Kaplan Department of Computer Engineering, Istanbul University, 34320, Avilar, Istanbul, Turkey Email: dyiltas@ istanbul.edu.tr
More informationCS313 Exercise 4 Cover Page Fall 2017
CS313 Exercise 4 Cover Page Fall 2017 Due by the start of class on Thursday, October 12, 2017. Name(s): In the TIME column, please estimate the time you spent on the parts of this exercise. Please try
More informationChapter 4. Sequence Comparison
1896 1920 1987 2006 Chapter 4. Sequence Comparison 1 Contents 1. Sequence comparison 2. Sequence alignment 3. Sequence mapping Reading materials Required 1. A general method applicable to the search for
More informationCISC 636 Computational Biology & Bioinformatics (Fall 2016)
CISC 636 Computational Biology & Bioinformatics (Fall 2016) Sequence pairwise alignment Score statistics: E-value and p-value Heuristic algorithms: BLAST and FASTA Database search: gene finding and annotations
More informationOPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT
OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT Asif Ali Khan*, Laiq Hassan*, Salim Ullah* ABSTRACT: In bioinformatics, sequence alignment is a common and insistent task. Biologists align
More informationAlgorithms, Mechanisms and Procedures for the Computer-aided Project Generation System
Algorithms, Mehanisms and Proedures for the Computer-aided Projet Generation System Anton O. Butko 1*, Aleksandr P. Briukhovetskii 2, Dmitry E. Grigoriev 2# and Konstantin S. Kalashnikov 3 1 Department
More informationA Dictionary based Efficient Text Compression Technique using Replacement Strategy
A based Effiient Text Compression Tehnique using Replaement Strategy Debashis Chakraborty Assistant Professor, Department of CSE, St. Thomas College of Engineering and Tehnology, Kolkata, 700023, India
More informationTriangles. Learning Objectives. Pre-Activity
Setion 3.2 Pre-tivity Preparation Triangles Geena needs to make sure that the dek she is building is perfetly square to the brae holding the dek in plae. How an she use geometry to ensure that the boards
More informationParadigm 5. Data Structure. Suffix trees. What is a suffix tree? Suffix tree. Simple applications. Simple applications. Algorithms
Prdigm. Dt Struture Known exmples: link tble, hep, Our leture: suffix tree Will involve mortize method tht will be stressed shortly in this ourse Suffix trees Wht is suffix tree? Simple pplitions History
More informationTackling IPv6 Address Scalability from the Root
Takling IPv6 Address Salability from the Root Mei Wang Ashish Goel Balaji Prabhakar Stanford University {wmei, ashishg, balaji}@stanford.edu ABSTRACT Internet address alloation shemes have a huge impat
More informationImplementing Load-Balanced Switches With Fat-Tree Networks
Implementing Load-Balaned Swithes With Fat-Tree Networks Hung-Shih Chueh, Ching-Min Lien, Cheng-Shang Chang, Jay Cheng, and Duan-Shin Lee Department of Eletrial Engineering & Institute of Communiations
More informationThe Minimum Redundancy Maximum Relevance Approach to Building Sparse Support Vector Machines
The Minimum Redundany Maximum Relevane Approah to Building Sparse Support Vetor Mahines Xiaoxing Yang, Ke Tang, and Xin Yao, Nature Inspired Computation and Appliations Laboratory (NICAL), Shool of Computer
More informationA DNA Index Structure Using Frequency and Position Information of Genetic Alphabet
A DNA Index Structure Using Frequency and Position Information of Genetic Alphabet Woo-Cheol Kim 1, Sanghyun Park 1, Jung-Im Won 1, Sang-Wook Kim 2, and Jee-Hee Yoon 3 1 Department of Computer Science,
More informationOptimizing Correlated Path Queries in XML Languages. Technical Report CS November 2002
Optimizing Correlated Path Queries in XML Languages Ning Zhang and M. Tamer Özsu Tehnial Report CS-2002-36 November 2002 Shool Of Computer Siene, University of Waterloo, {nzhang,tozsu}@uwaterloo.a 1 Abstrat
More informationApproximate logic synthesis for error tolerant applications
Approximate logi synthesis for error tolerant appliations Doohul Shin and Sandeep K. Gupta Eletrial Engineering Department, University of Southern California, Los Angeles, CA 989 {doohuls, sandeep}@us.edu
More informationAustralian Journal of Basic and Applied Sciences. A new Divide and Shuffle Based algorithm of Encryption for Text Message
ISSN:1991-8178 Australian Journal of Basi and Applied Sienes Journal home page: www.ajbasweb.om A new Divide and Shuffle Based algorithm of Enryption for Text Message Dr. S. Muthusundari R.M.D. Engineering
More informationAutomatic Physical Design Tuning: Workload as a Sequence Sanjay Agrawal Microsoft Research One Microsoft Way Redmond, WA, USA +1-(425)
Automati Physial Design Tuning: Workload as a Sequene Sanjay Agrawal Mirosoft Researh One Mirosoft Way Redmond, WA, USA +1-(425) 75-357 sagrawal@mirosoft.om Eri Chu * Computer Sienes Department University
More informationAn I/O device driver for bioinformatics tools: the case for BLAST
An I/O device driver for bioinformatics tools 563 An I/O device driver for bioinformatics tools: the case for BLAST Renato Campos Mauro and Sérgio Lifschitz Departamento de Informática PUC-RIO, Pontifícia
More informationFastA and the chaining problem, Gunnar Klau, December 1, 2005, 10:
FastA and the chaining problem, Gunnar Klau, December 1, 2005, 10:56 4001 4 FastA and the chaining problem We will discuss: Heuristics used by the FastA program for sequence alignment Chaining problem
More informationLAB 4: Operations on binary images Histograms and color tables
LAB 4: Operations on binary images Histograms an olor tables Computer Vision Laboratory Linköping University, Sween Preparations an start of the lab system You will fin a ouple of home exerises (marke
More informationMachine Vision. Laboratory Exercise Name: Student ID: S
Mahine Vision 521466S Laoratory Eerise 2011 Name: Student D: General nformation To pass these laoratory works, you should answer all questions (Q.y) with an understandale handwriting either in English
More informationPath Sharing and Predicate Evaluation for High-Performance XML Filtering*
Path Sharing and Prediate Evaluation for High-Performane XML Filtering Yanlei Diao, Mihael J. Franklin, Hao Zhang, Peter Fisher EECS, University of California, Berkeley {diaoyl, franklin, nhz, fisherp}@s.erkeley.edu
More informationThis fact makes it difficult to evaluate the cost function to be minimized
RSOURC LLOCTION N SSINMNT In the resoure alloation step the amount of resoures required to exeute the different types of proesses is determined. We will refer to the time interval during whih a proess
More informationCompilation Lecture 11a. Register Allocation Noam Rinetzky. Text book: Modern compiler implementation in C Andrew A.
Compilation 0368-3133 Leture 11a Text book: Modern ompiler implementation in C Andrew A. Appel Register Alloation Noam Rinetzky 1 Registers Dediated memory loations that an be aessed quikly, an have omputations
More informationLocality-Sensitive Hashing Scheme Based on p-stable Distributions
Loality-Sensitive Hashing Sheme Based on p-stable Distributions Mayur Datar Department of Computer Siene, Stanford University datar@s.stanford.edu Piotr Indyk Laboratory for Computer Siene, MIT indyk@theory.ls.mit.edu
More informationThe Happy Ending Problem
The Happy Ending Problem Neeldhara Misra STATUTORY WARNING This doument is a draft version 1 Introdution The Happy Ending problem first manifested itself on a typial wintery evening in 1933 These evenings
More informationtimestamp, if silhouette(x, y) 0 0 if silhouette(x, y) = 0, mhi(x, y) = and mhi(x, y) < timestamp - duration mhi(x, y), else
3rd International Conferene on Multimedia Tehnolog(ICMT 013) An Effiient Moving Target Traking Strateg Based on OpenCV and CAMShift Theor Dongu Li 1 Abstrat Image movement involved bakground movement and
More informationAcceleration of Algorithm of Smith-Waterman Using Recursive Variable Expansion.
www.ijarcet.org 54 Acceleration of Algorithm of Smith-Waterman Using Recursive Variable Expansion. Hassan Kehinde Bello and Kazeem Alagbe Gbolagade Abstract Biological sequence alignment is becoming popular
More informationPerformance of Histogram-Based Skin Colour Segmentation for Arms Detection in Human Motion Analysis Application
World Aademy of Siene, Engineering and Tehnology 8 009 Performane of Histogram-Based Skin Colour Segmentation for Arms Detetion in Human Motion Analysis Appliation Rosalyn R. Porle, Ali Chekima, Farrah
More informationBLAST - Basic Local Alignment Search Tool
Lecture for ic Bioinformatics (DD2450) April 11, 2013 Searching 1. Input: Query Sequence 2. Database of sequences 3. Subject Sequence(s) 4. Output: High Segment Pairs (HSPs) Sequence Similarity Measures:
More informationGrade 6. Mathematics. Student Booklet SPRING 2009 RELEASED ASSESSMENT QUESTIONS. Assessment of Reading, Writing and Mathematics, Junior Division
Grade 6 Assessment of Reading, Writing and Mathematis, Junior Division Student Booklet Mathematis SPRING 2009 RELEASED ASSESSMENT QUESTIONS Please note: The format of these booklets is slightly different
More information