Darwin-WGA. A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup
|
|
- Eleanor Riley
- 5 years ago
- Views:
Transcription
1 Darwin-WGA A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup Yatish Turakhia*, Sneha D. Goenka*, Prof. Gill Bejerano, Prof. William J. Dally * Equal contribution
2 What are whole genome alignments (WGA)? 2
3 WGA is correspondence between genomes For each segment of the target genome, corresponding segments are found in the query genome Match Deletion Insertion Mismatch Indel = Insertion + Deletion Query Chimpanzee genome Chr5 Target Human genome Chr5 Cabanettes et al.,
4 Why are whole genome alignments important? 4
5 WGA help predict functional elements Control logic for gene expression Enhancer gene Protein recipes that encode function Regions with sequence conservation (Mayor et al., 2000) 5
6 Classical alignment algorithm 6
7 Smith-Waterman Algorithm Inputs Target sequence (r) GTGTCACTA (L r = 9) Query sequence (q) GGCCAACTA (L q = 9) Scoring parameters Smith-waterman equations V i, j = max V i 1, j 1 + W(r i,q j ) V i 1, j + gap V i, j 1 + gap 0 Alignment GTGTC-A-CTA G-G-CCAACTA W = Gap penalty = 1 7
8 Smith Waterman algorithm intractable on whole genomes Smith Waterman algorithm time and space complexity ~ O(L r. L q ) Mammalian genomes ~ base-pairs Use heuristics based approaches seed-filter-extend 8
9 Seed-Filter-Extend algorithm (LASTZ) 9
10 Seeding finds small matching local patterns Query GTAGCGGGCGTCTGAC Target GTAGCAGTC GTAGCAGTA GAGCTGCAA Seed hits Seeding finds local matching patterns of fixed length s Substrings of query of length s are compared to the target Substrings start from position 0 Seed 10
11 Seeding finds small matching local patterns Query GTAGCGGGCGTCTGAC Target GTAGCAGTC GTAGCAGTA GAGCTGCAA Seed hits Seeding finds local matching patterns of fixed length s Substrings of query of length s are compared to the target Substrings start from position 0 Seed 11
12 Filtering aligns ~100bp around seed hits Target Calculate scores along the seed hit diagonal (match or mismatch) Query Track maximum score Seed hit Filter (Ungapped) Stop as soon as score falls below (max_score - x) Does not consider indels 12
13 Filtering aligns ~100bp around seed hits Target Calculate scores along the seed hit diagonal (match or mismatch) Query Track maximum score Seed hit Filter (Ungapped) Stop as soon as score falls below (max_score - x) Does not consider indels 13
14 Filtering aligns ~100bp around seed hits Target Calculate scores along the seed hit diagonal (match or mismatch) Query Track maximum score Seed hit Anchor Filter (Ungapped) Stop as soon as score falls below (max_score - x) Does not consider indels 14
15 Extension uses Y-drop algorithm Anchor Query Target Start computing score along a row when score > (max_score - y) Stop computing score along a row when score < (max_score - y) score < (max_score y) Extend 15
16 Extension provides the final alignments Target Query Final alignment = right extension + left extension Anchor Extend 16
17 Why is LASTZ less sensitive? 17
18 Increasing indel frequency => increasing need for gapped filtering -88 Mil yrs -6.4 Mil yrs indel per 625 bp 1 indel per 30 bp LASTZ s filter threshold Replacing ungapped filtering by gapped filtering slows down the software by 200x 18
19 Darwin-WGA algorithm overview 19
20 Seeding D-SOFT Target bin and Query chunk determine diagonal band Each seed hit falls in a single diagonal band At most 1 seed hit per diagonal band is extended Query Reference Target Chunk 1 Chunk 2 Chunk 3 20
21 Gapped Filtering Banded Smith Waterman Seed hit extended using banded Smith-Waterman Pre-determined band with no traceback If (max_score > threshold), the maximum score position (x max ) is the anchor or the starting point for alignment extension Gaps considered => better alignments for species further apart Query Target Seed hit! max 21
22 Extension - GACT-X algorithm Tiled (tile size T, overlap O) implementation inspired by GACT in Darwin* Origin of the next tile lies at the intersection of the current traceback path with the overlap Anchor Staring point for next tile T1 Drop these pointers Outside tile overlap Inside tile overlap On tile overlap border * Turakia et al., ASPLOS 18 22
23 ... Extension - GACT-X algorithm Extension along a direction continues until a tile is encountered with a non-positive maximum score Anchor T1 Right extension T2 Outside tile overlap Inside tile overlap On tile overlap border 23
24 ... Extension - GACT-X algorithm Final alignment combines left and right extension Staring point for next tile... T1 Anchor Left extension Right extension T1 T2 Outside tile overlap Inside tile overlap On tile overlap border 24
25 Extension - GACT-X algorithm Y-drop implementation within each tile Target Adaptive band with traceback Reduces on-chip memory requirement compared to computing whole tile Reduces compute time Anchor T1 Query Outside tile overlap Inside tile overlap On tile overlap border * Turakhia et al., ASPLOS 18 25
26 Workloads in LASTZ v/s Darwin-WGA LASTZ Seeding 13B seed hits Ungapped filtering 300k anchors Extend 150k alignments Darwin-WGA Seeding (D-SOFT) 13B seed hits Gapped filtering 1.2M anchors Extend 700k alignments 26
27 Whole genome alignment v/s Read alignment 27
28 1. WGA requires aligns less similar sequences Genomes may diverge considerably over evolutionary timescales and have low sequence similarity Read alignment deals with highly similar sequences (wellcharacterized sequencing error model) % alignments % similarity in read alignments v/s whole genome alignments % similarity Read alignments Whole genome alignments 28
29 2. WGA has longer alignments with large indels Whole genome alignments can span millions of basepairs with large indels Read alignments span not more than tens of thousands of base-pairs with much shorter indels Previous hardware accelerators would require high on-chip memory percentile alignment lengths in read alignments v/s whole genome alignments 1,000 10, ,000 1,000,000 alignment lengths WGA alignment Read alignment
30 Darwin-WGA Framework Seeding done in software Banded Smith-Waterman and GACT-X accelerated in hardware as bounded Dynamic Programming with Systolic Arrays 30
31 Hardware Acceleration 31
32 Accelerating bounded Dynamic Programming with Systolic Arrays 32
33 Accelerating bounded Dynamic Programming with Systolic Arrays 33
34 Accelerating bounded Dynamic Programming with Systolic Arrays 34
35 Accelerating bounded Dynamic Programming with Systolic Arrays 35
36 Accelerating bounded Dynamic Programming with Systolic Arrays Banded Smith-Waterman - preset band and no traceback GACT-X - adaptive band with traceback 36
37 Evaluation Framework 37
38 Experimental Setup CPU (baseline) AWS c4.8xlarge instance 36 vcpus (18 physical cores) LASTZ as software baseline Parasail to estimate isosensitive runtime $1.59/hour FPGA (Darwin-WGA) AWS f1.2xlarge instance 1 Xilinx Virtex Ultrascale+ FPGA (50 BSW and 2 GACT- X arrays with 32PEs) 8 vcpus $1.65/hour 38
39 ASIC TSMC 40nm DC synthesis (not a chip prototype) Configuration Area (mm 2 ) Power (W) BSW Logic 64 x (64PE array) GACT-X Logic 12 x (64PE array) Traceback SRAM 12 x (64PE x 16KB/PE) DRAM DDR4-2400R 4 x 32GB TOTAL Power (in W) CPU FPGA ASIC 39
40 Species and Genome Assembly Target Species C. elegans (ce11) D. melanogaster (dm6) Size (Mbp) Query Species 100 C. briggsae (cb4) 137 D. simulans (drosim1) D. Yakuba (droyak2) D. pseudoobscura (dp4) Size (Mbp) C. elegans 0.51 C. Briggsae Roundworm family D. melanogaster 0.06 D. simulans 0.05 D. Yakuba 0.11 D. pseudoobscura Fruit fly family dm6-drosim1 molecular distance comparable to human-monkey dm6-dp4 molecular distance comparable to human-chicken 40
41 Results 41
42 Darwin-WGA finds genes that LASTZ does not Seed hit 1 Seed hit 2 Seed hit 3 Indels (shown by arrows) around each seed hit dropped by ungapped filtering (LASTZ) but retained by gapped filtering (Darwin-WGA) 42
43 Darwin-WGA Sensitivity Improvement Versus LASTZ Species pair Top-10 Alignment Chain Scores Matching Base-pairs within Alignments Number of Aligning Exons (protein-coding genes) dm6-drosim % 1.25x +0.20% dm6-droyak % 1.41x +0.09% dm6-dp % 1.42x +0.41% ce11-cb % 3.12x +2.70% Represent orthologous sequences (derived from speciation ) Represent paralagous sequences (derived from duplication ) False positive rate (2-mer shuffled genome): % Represent functionally relevant orthologous sequences, under some selective pressure (at least in the target species) 43
44 Runtime and Cost Comparison Species pair LASTZ runtime (sec) Isosensitive s/w runtime (sec) Darwin-WGA runtime (sec) FPGA ASIC FPGA (Perf/$) Darwin-WGA Improvement ASIC (Perf/W) ce11-cb ,960 3, x 1,478x dm6- drosim1 dm6- droyak ,627 5, x 1,547x ,454 6, x 1,539x dm6-dp ,700 4, x 1,553x 200x slowdown 22x speedup 306x speedup 44
45 GACT-X uses 3x less space and time as compared to GACT 45
46 Summary Darwin-WGA replaces ungapped filtering in LASTZ by Banded Smith- Waterman algorithm for higher sensitivity up to 3x matching base-pairs up to 5.7% more orthologs up to 2.1% more exons Darwin-WGA outperforms iso-sensitive software FPGA: 24x performance/$ improvement ASIC: 1,500x performance/watt improvement GACT-X provides 3x improvement in speed and storage efficiency compared to GACT 46
47 Thank You! 47
Darwin: A Genomic Co-processor gives up to 15,000X speedup on long read assembly (To appear in ASPLOS 2018)
Darwin: A Genomic Co-processor gives up to 15,000X speedup on long read assembly (To appear in ASPLOS 2018) Yatish Turakhia EE PhD candidate Stanford University Prof. Bill Dally (Electrical Engineering
More informationDarwin: A Hardware-acceleration Framework for Genomic Sequence Alignment
Darwin: A Hardware-acceleration Framework for Genomic Sequence Alignment Yatish Turakhia EE PhD candidate Stanford University Prof. Bill Dally (Electrical Engineering and Computer Science) Prof. Gill Bejerano
More informationBiology 644: Bioinformatics
Find the best alignment between 2 sequences with lengths n and m, respectively Best alignment is very dependent upon the substitution matrix and gap penalties The Global Alignment Problem tries to find
More informationLAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA
LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA Michael Brudno, Chuong B. Do, Gregory M. Cooper, et al. Presented by Xuebei Yang About Alignments Pairwise Alignments
More informationAs of August 15, 2008, GenBank contained bases from reported sequences. The search procedure should be
48 Bioinformatics I, WS 09-10, S. Henz (script by D. Huson) November 26, 2009 4 BLAST and BLAT Outline of the chapter: 1. Heuristics for the pairwise local alignment of two sequences 2. BLAST: search and
More informationOPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT
OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT Asif Ali Khan*, Laiq Hassan*, Salim Ullah* ABSTRACT: In bioinformatics, sequence alignment is a common and insistent task. Biologists align
More informationPacific Symposium on Biocomputing 13: (2008) PASH 2.0: SCALEABLE SEQUENCE ANCHORING FOR NEXT-GENERATION SEQUENCING TECHNOLOGIES
PASH 2.0: SCALEABLE SEQUENCE ANCHORING FOR NEXT-GENERATION SEQUENCING TECHNOLOGIES CRISTIAN COARFA Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine,
More information24 Grundlagen der Bioinformatik, SS 10, D. Huson, April 26, This lecture is based on the following papers, which are all recommended reading:
24 Grundlagen der Bioinformatik, SS 10, D. Huson, April 26, 2010 3 BLAST and FASTA This lecture is based on the following papers, which are all recommended reading: D.J. Lipman and W.R. Pearson, Rapid
More informationFrom Smith-Waterman to BLAST
From Smith-Waterman to BLAST Jeremy Buhler July 23, 2015 Smith-Waterman is the fundamental tool that we use to decide how similar two sequences are. Isn t that all that BLAST does? In principle, it is
More informationBrief review from last class
Sequence Alignment Brief review from last class DNA is has direction, we will use only one (5 -> 3 ) and generate the opposite strand as needed. DNA is a 3D object (see lecture 1) but we will model it
More informationLecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:
Lecture Overview Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 04: Variations of sequence alignments http://www.pitt.edu/~mcs2/teaching/biocomp/tutorials/global.html Slides adapted from Dr. Shaojie Zhang (University
More informationSequence Alignment & Search
Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating the first version
More informationWilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment
An Introduction to NCBI BLAST Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment Resources: The BLAST web server is available at https://blast.ncbi.nlm.nih.gov/blast.cgi
More informationSequence comparison: Local alignment
Sequence comparison: Local alignment Genome 559: Introuction to Statistical an Computational Genomics Prof. James H. Thomas http://faculty.washington.eu/jht/gs559_217/ Review global alignment en traceback
More informationWilson Leung 05/27/2008 A Simple Introduction to NCBI BLAST
A Simple Introduction to NCBI BLAST Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment Resources: The BLAST web server is available at http://www.ncbi.nih.gov/blast/
More informationDynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014
Dynamic Programming User Manual v1.0 Anton E. Weisstein, Truman State University Aug. 19, 2014 Dynamic programming is a group of mathematical methods used to sequentially split a complicated problem into
More informationBurrows-Wheeler Short Read Aligner on AWS EC2 F1 Instances
University of Virginia High-Performance Low-Power Lab Prof. Dr. Mircea Stan Burrows-Wheeler Short Read Aligner on AWS EC2 F1 Instances Smith-Waterman Extension on FPGA(s) Sergiu Mosanu, Kevin Skadron and
More informationTCCAGGTG-GAT TGCAAGTGCG-T. Local Sequence Alignment & Heuristic Local Aligners. Review: Probabilistic Interpretation. Chance or true homology?
Local Sequence Alignment & Heuristic Local Aligners Lectures 18 Nov 28, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall
More informationPerformance Comparison between Linear RVE and Linear Systolic Array Implementations of the Smith-Waterman Algorithm
Performance Comparison between Linear RVE and Linear Systolic Array Implementations of the Smith-Waterman Algorithm Laiq Hasan Zaid Al-Ars Delft University of Technology Computer Engineering Laboratory
More informationBLAST & Genome assembly
BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies November 17, 2012 1 Introduction Introduction 2 BLAST What is BLAST? The algorithm 3 Genome assembly De
More informationA BANDED SMITH-WATERMAN FPGA ACCELERATOR FOR MERCURY BLASTP
A BANDED SITH-WATERAN FPGA ACCELERATOR FOR ERCURY BLASTP Brandon Harris*, Arpith C. Jacob*, Joseph. Lancaster*, Jeremy Buhler*, Roger D. Chamberlain* *Dept. of Computer Science and Engineering, Washington
More informationSequence analysis Pairwise sequence alignment
UMF11 Introduction to bioinformatics, 25 Sequence analysis Pairwise sequence alignment 1. Sequence alignment Lecturer: Marina lexandersson 12 September, 25 here are two types of sequence alignments, global
More informationSingle Pass, BLAST-like, Approximate String Matching on FPGAs*
Single Pass, BLAST-like, Approximate String Matching on FPGAs* Martin Herbordt Josh Model Yongfeng Gu Bharat Sukhwani Tom VanCourt Computer Architecture and Automated Design Laboratory Department of Electrical
More informationBLAST, Profile, and PSI-BLAST
BLAST, Profile, and PSI-BLAST Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 26 Free for academic use Copyright @ Jianlin Cheng & original sources
More informationGSNAP: Fast and SNP-tolerant detection of complex variants and splicing in short reads by Thomas D. Wu and Serban Nacu
GSNAP: Fast and SNP-tolerant detection of complex variants and splicing in short reads by Thomas D. Wu and Serban Nacu Matt Huska Freie Universität Berlin Computational Methods for High-Throughput Omics
More informationAcceleration of Ungapped Extension in Mercury BLAST. Joseph Lancaster Jeremy Buhler Roger Chamberlain
Acceleration of Ungapped Extension in Mercury BLAST Joseph Lancaster Jeremy Buhler Roger Chamberlain Joseph Lancaster, Jeremy Buhler, and Roger Chamberlain, Acceleration of Ungapped Extension in Mercury
More informationUnder the Hood of Alignment Algorithms for NGS Researchers
Under the Hood of Alignment Algorithms for NGS Researchers April 16, 2014 Gabe Rudy VP of Product Development Golden Helix Questions during the presentation Use the Questions pane in your GoToWebinar window
More informationDesign and Evaluation of a BLAST Ungapped Extension Accelerator, Master's Thesis
Washington University in St. Louis Washington University Open Scholarship All Computer Science and Engineering Research Computer Science and Engineering Report Number: WUCSE-2006-21 2006-01-01 Design and
More informationAlignment of Long Sequences
Alignment of Long Sequences BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2009 Mark Craven craven@biostat.wisc.edu Pairwise Whole Genome Alignment: Task Definition Given a pair of genomes (or other large-scale
More informationPPI Network Alignment Advanced Topics in Computa8onal Genomics
PPI Network Alignment 02-715 Advanced Topics in Computa8onal Genomics PPI Network Alignment Compara8ve analysis of PPI networks across different species by aligning the PPI networks Find func8onal orthologs
More informationSequence Comparison: Dynamic Programming. Genome 373 Genomic Informatics Elhanan Borenstein
Sequence omparison: Dynamic Programming Genome 373 Genomic Informatics Elhanan Borenstein quick review: hallenges Find the best global alignment of two sequences Find the best global alignment of multiple
More informationL4: Blast: Alignment Scores etc.
L4: Blast: Alignment Scores etc. Why is Blast Fast? Silly Question Prove or Disprove: There are two people in New York City with exactly the same number of hairs. Large database search Database (n) Query
More informationAccelerating the Prediction of Protein Interactions
Accelerating the Prediction of Protein Interactions Alex Rodionov, Jonathan Rose, Elisabeth R.M. Tillier, Alexandr Bezginov October 21 21 Motivation The human genome is sequenced, but we don't know what
More informationAn Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST
An Analysis of Pairwise Sequence Alignment Algorithm Complexities: Needleman-Wunsch, Smith-Waterman, FASTA, BLAST and Gapped BLAST Alexander Chan 5075504 Biochemistry 218 Final Project An Analysis of Pairwise
More informationCUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters
216 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising
More informationBLAST & Genome assembly
BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies May 15, 2014 1 BLAST What is BLAST? The algorithm 2 Genome assembly De novo assembly Mapping assembly 3
More informationAlgorithms and Tools for Bioinformatics on GPUs. Bertil SCHMIDT
Algorithms and Tools for Bioinformatics on GPUs Bertil SCHMIDT Contents Motivation Pairwise Sequence Alignment Multiple Sequence Alignment Short Read Error Correction using CUDA Some other CUDA-enabled
More informationBLAST. Basic Local Alignment Search Tool. Used to quickly compare a protein or DNA sequence to a database.
BLAST Basic Local Alignment Search Tool Used to quickly compare a protein or DNA sequence to a database. There is no such thing as a free lunch BLAST is fast and highly sensitive compared to competitors.
More informationSequence Alignment. part 2
Sequence Alignment part 2 Dynamic programming with more realistic scoring scheme Using the same initial sequences, we ll look at a dynamic programming example with a scoring scheme that selects for matches
More informationAccurate Long-Read Alignment using Similarity Based Multiple Pattern Alignment and Prefix Tree Indexing
Proposal for diploma thesis Accurate Long-Read Alignment using Similarity Based Multiple Pattern Alignment and Prefix Tree Indexing Astrid Rheinländer 01-09-2010 Supervisor: Prof. Dr. Ulf Leser Motivation
More informationAccelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture
Accelerating Genomic Sequence Alignment Workload with Scalable Vector Architecture Dong-hyeon Park, Jon Beaumont, Trevor Mudge University of Michigan, Ann Arbor Genomics Past Weeks ~$3 billion Human Genome
More informationAligners. J Fass 21 June 2017
Aligners J Fass 21 June 2017 Definitions Assembly: I ve found the shredded remains of an important document; put it back together! UC Davis Genome Center Bioinformatics Core J Fass Aligners 2017-06-21
More informationDynamic Programming (cont d) CS 466 Saurabh Sinha
Dynamic Programming (cont d) CS 466 Saurabh Sinha Spliced Alignment Begins by selecting either all putative exons between potential acceptor and donor sites or by finding all substrings similar to the
More informationSequence Alignment (chapter 6) p The biological problem p Global alignment p Local alignment p Multiple alignment
Sequence lignment (chapter 6) p The biological problem p lobal alignment p Local alignment p Multiple alignment Local alignment: rationale p Otherwise dissimilar proteins may have local regions of similarity
More informationToday s Lecture. Edit graph & alignment algorithms. Local vs global Computational complexity of pairwise alignment Multiple sequence alignment
Today s Lecture Edit graph & alignment algorithms Smith-Waterman algorithm Needleman-Wunsch algorithm Local vs global Computational complexity of pairwise alignment Multiple sequence alignment 1 Sequence
More informationHeterogeneous Hardware/Software Acceleration of the BWA-MEM DNA Alignment Algorithm
Heterogeneous Hardware/Software Acceleration of the BWA-MEM DNA Alignment Algorithm Nauman Ahmed, Vlad-Mihai Sima, Ernst Houtgast, Koen Bertels and Zaid Al-Ars Computer Engineering Lab, Delft University
More informationAlignment ABC. Most slides are modified from Serafim s lectures
Alignment ABC Most slides are modified from Serafim s lectures Complete genomes Evolution Evolution at the DNA level C ACGGTGCAGTCACCA ACGTTGCAGTCCACCA SEQUENCE EDITS REARRANGEMENTS Sequence conservation
More informationAccelerating Smith Waterman (SW) Algorithm on Altera Cyclone II Field Programmable Gate Array
Accelerating Smith Waterman (SW) Algorithm on Altera yclone II Field Programmable Gate Array NUR DALILAH AHMAD SABRI, NUR FARAH AIN SALIMAN, SYED ABDUL MUALIB AL JUNID, ABDUL KARIMI HALIM Faculty Electrical
More informationMapping Reads to Reference Genome
Mapping Reads to Reference Genome DNA carries genetic information DNA is a double helix of two complementary strands formed by four nucleotides (bases): Adenine, Cytosine, Guanine and Thymine 2 of 31 Gene
More informationComparative Analysis of Protein Alignment Algorithms in Parallel environment using CUDA
Comparative Analysis of Protein Alignment Algorithms in Parallel environment using BLAST versus Smith-Waterman Shadman Fahim shadmanbracu09@gmail.com Shehabul Hossain rudrozzal@gmail.com Gulshan Jubaed
More informationIntroduction to Computational Molecular Biology
18.417 Introduction to Computational Molecular Biology Lecture 13: October 21, 2004 Scribe: Eitan Reich Lecturer: Ross Lippert Editor: Peter Lee 13.1 Introduction We have been looking at algorithms to
More informationUSING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT
IADIS International Conference Applied Computing 2006 USING AN EXTENDED SUFFIX TREE TO SPEED-UP SEQUENCE ALIGNMENT Divya R. Singh Software Engineer Microsoft Corporation, Redmond, WA 98052, USA Abdullah
More informationPairwise alignment II
Pairwise alignment II Agenda - Previous Lesson: Minhala + Introduction - Review Dynamic Programming - Pariwise Alignment Biological Motivation Today: - Quick Review: Sequence Alignment (Global, Local,
More informationBGGN 213 Foundations of Bioinformatics Barry Grant
BGGN 213 Foundations of Bioinformatics Barry Grant http://thegrantlab.org/bggn213 Recap From Last Time: 25 Responses: https://tinyurl.com/bggn213-02-f17 Why ALIGNMENT FOUNDATIONS Why compare biological
More informationMouse, Human, Chimpanzee
More Alignments 1 Mouse, Human, Chimpanzee Mouse to Human Chimpanzee to Human 2 Mouse v.s. Human Chromosome X of Mouse to Human 3 Local Alignment Given: two sequences S and T Find: substrings of S and
More informationScalable Accelerator Architecture for Local Alignment of DNA Sequences
Scalable Accelerator Architecture for Local Alignment of DNA Sequences Nuno Sebastião, Nuno Roma, Paulo Flores INESC-ID / IST-TU Lisbon Rua Alves Redol, 9, Lisboa PORTUGAL {Nuno.Sebastiao, Nuno.Roma, Paulo.Flores}
More informationShort Read Alignment. Mapping Reads to a Reference
Short Read Alignment Mapping Reads to a Reference Brandi Cantarel, Ph.D. & Daehwan Kim, Ph.D. BICF 05/2018 Introduction to Mapping Short Read Aligners DNA vs RNA Alignment Quality Pitfalls and Improvements
More informationBioinformatics explained: Smith-Waterman
Bioinformatics Explained Bioinformatics explained: Smith-Waterman May 1, 2007 CLC bio Gustav Wieds Vej 10 8000 Aarhus C Denmark Telephone: +45 70 22 55 09 Fax: +45 70 22 55 19 www.clcbio.com info@clcbio.com
More informationScoring and heuristic methods for sequence alignment CG 17
Scoring and heuristic methods for sequence alignment CG 17 Amino Acid Substitution Matrices Used to score alignments. Reflect evolution of sequences. Unitary Matrix: M ij = 1 i=j { 0 o/w Genetic Code Matrix:
More informationWITH the advent of the Next-Generation Sequencing
1262 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 7, JULY 2012 Integrated Hardware Architecture for Efficient Computation of the n-best Bio-Sequence Local Alignments in
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationFASTA. Besides that, FASTA package provides SSEARCH, an implementation of the optimal Smith- Waterman algorithm.
FASTA INTRODUCTION Definition (by David J. Lipman and William R. Pearson in 1985) - Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence
More informationCompares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA.
Compares a sequence of protein to another sequence or database of a protein, or a sequence of DNA to another sequence or library of DNA. Fasta is used to compare a protein or DNA sequence to all of the
More informationComputational Molecular Biology
Computational Molecular Biology Erwin M. Bakker Lecture 3, mainly from material by R. Shamir [2] and H.J. Hoogeboom [4]. 1 Pairwise Sequence Alignment Biological Motivation Algorithmic Aspect Recursive
More informationConfigurable and scalable class of high performance hardware accelerators for simultaneous DNA sequence alignment
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. 2013; 25:1319 1339 Published online 12 October 2012 in Wiley Online Library (wileyonlinelibrary.com)..2934 SPECIAL
More informationLong Read RNA-seq Mapper
UNIVERSITY OF ZAGREB FACULTY OF ELECTRICAL ENGENEERING AND COMPUTING MASTER THESIS no. 1005 Long Read RNA-seq Mapper Josip Marić Zagreb, February 2015. Table of Contents 1. Introduction... 1 2. RNA Sequencing...
More informationGenome 373: Mapping Short Sequence Reads I. Doug Fowler
Genome 373: Mapping Short Sequence Reads I Doug Fowler Two different strategies for parallel amplification BRIDGE PCR EMULSION PCR Two different strategies for parallel amplification BRIDGE PCR EMULSION
More informationBIOL591: Introduction to Bioinformatics Alignment of pairs of sequences
BIOL591: Introduction to Bioinformatics Alignment of pairs of sequences Reading in text (Mount Bioinformatics): I must confess that the treatment in Mount of sequence alignment does not seem to me a model
More informationBLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics
More informationall M 2M_gt_15 2M_8_15 2M_1_7 gt_2m TopHat2
Pairs processed per second 6, 4, 2, 6, 4, 2, 6, 4, 2, 6, 4, 2, 6, 4, 2, 6, 4, 2, 72,318 418 1,666 49,495 21,123 69,984 35,694 1,9 71,538 3,5 17,381 61,223 69,39 55 19,579 44,79 65,126 96 5,115 33,6 61,787
More informationHIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS
HIGH LEVEL SYNTHESIS OF SMITH-WATERMAN DATAFLOW IMPLEMENTATIONS S. Casale-Brunet 1, E. Bezati 1, M. Mattavelli 2 1 Swiss Institute of Bioinformatics, Lausanne, Switzerland 2 École Polytechnique Fédérale
More informationBioinformatics for Biologists
Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Director Bioinformatics & Research Computing Whitehead Institute Topics to Cover
More informationAccelerating Next Generation Genome Reassembly in FPGAs: Alignment Using Dynamic Programming Algorithms
Accelerating Next Generation Genome Reassembly in FPGAs: Alignment Using Dynamic Programming Algorithms Maria Kim A thesis submitted in partial fulfillment of the requirements for the degree of Master
More informationSEQUENCE alignment is one of the most widely used operations
A parallel FPGA design of the Smith-Waterman traceback Zubair Nawaz #1, Muhammad Nadeem #2, Hans van Someren 3, Koen Bertels #4 # Computer Engineering Lab, Delft University of Technology The Netherlands
More informationSimilarity Searches on Sequence Databases
Similarity Searches on Sequence Databases Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Zürich, October 2004 Swiss Institute of Bioinformatics Swiss EMBnet node Outline Importance of
More informationGPUBwa -Parallelization of Burrows Wheeler Aligner using Graphical Processing Units
GPUBwa -Parallelization of Burrows Wheeler Aligner using Graphical Processing Units Abstract A very popular discipline in bioinformatics is Next-Generation Sequencing (NGS) or DNA sequencing. It specifies
More informationCOS 551: Introduction to Computational Molecular Biology Lecture: Oct 17, 2000 Lecturer: Mona Singh Scribe: Jacob Brenner 1. Database Searching
COS 551: Introduction to Computational Molecular Biology Lecture: Oct 17, 2000 Lecturer: Mona Singh Scribe: Jacob Brenner 1 Database Searching In database search, we typically have a large sequence database
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the
More informationAlgorithms in Bioinformatics: A Practical Introduction. Database Search
Algorithms in Bioinformatics: A Practical Introduction Database Search Biological databases Biological data is double in size every 15 or 16 months Increasing in number of queries: 40,000 queries per day
More informationCISC 636 Computational Biology & Bioinformatics (Fall 2016)
CISC 636 Computational Biology & Bioinformatics (Fall 2016) Sequence pairwise alignment Score statistics: E-value and p-value Heuristic algorithms: BLAST and FASTA Database search: gene finding and annotations
More informationDatabase Searching Using BLAST
Mahidol University Objectives SCMI512 Molecular Sequence Analysis Database Searching Using BLAST Lecture 2B After class, students should be able to: explain the FASTA algorithm for database searching explain
More informationCBMF W4761 Final Project Report
CBMF W4761 Final Project Report Christopher Fenton CBMF W4761 8/16/09 Introduction Performing large scale database searches of genomic data is one of the largest problems in computational genomics. When
More informationAn FPGA-Based Systolic Array to Accelerate the BWA-MEM Genomic Mapping Algorithm
An FPGA-Based Systolic Array to Accelerate the BWA-MEM Genomic Mapping Algorithm Ernst Joachim Houtgast, Vlad-Mihai Sima, Koen Bertels and Zaid Al-Ars Faculty of EEMCS, Delft University of Technology,
More informationWhen we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame
1 When we search a nucleic acid databases, there is no need for you to carry out your own six frame translation. Mascot always performs a 6 frame translation on the fly. That is, 3 reading frames from
More informationMasher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs
Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs Anas Abu-Doleh 1,2, Erik Saule 1, Kamer Kaya 1 and Ümit V. Çatalyürek 1,2 1 Department of Biomedical Informatics 2 Department of Electrical
More informationSequence Alignment. Ulf Leser
Sequence Alignment Ulf Leser his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Ulf Leser: Bioinformatics, Summer Semester 2016 2 ene Function
More informationAligners. J Fass 23 August 2017
Aligners J Fass 23 August 2017 Definitions Assembly: I ve found the shredded remains of an important document; put it back together! UC Davis Genome Center Bioinformatics Core J Fass Aligners 2017-08-23
More informationDRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric
DRAF: A Low-Power DRAM-based Reconfigurable Acceleration Fabric Mingyu Gao, Christina Delimitrou, Dimin Niu, Krishna Malladi, Hongzhong Zheng, Bob Brennan, Christos Kozyrakis ISCA June 22, 2016 FPGA-Based
More informationPairwise Sequence Alignment: Dynamic Programming Algorithms. COMP Spring 2015 Luay Nakhleh, Rice University
Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 - Spring 2015 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment The number of all possible pairwise alignments (if
More informationFast Sequence Alignment Method Using CUDA-enabled GPU
Fast Sequence Alignment Method Using CUDA-enabled GPU Yeim-Kuan Chang Department of Computer Science and Information Engineering National Cheng Kung University Tainan, Taiwan ykchang@mail.ncku.edu.tw De-Yu
More informationBioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data
BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data Roman Kaplan, Leonid Yavits and Ran Ginosar Abstract Genome sequences contain hundreds of millions of DNA base
More informationInternational Journal of Engineering and Scientific Research
Research Article ISSN: XXXX XXXX International Journal of Engineering and Scientific Research Journal home page: www.ijesr.info AN EFFICIENT RETOUCHED BLOOM FILTER BASED WORD-MATCHING STAGE OF BLASTN R.
More informationSlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching
SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching Ilya Y. Zhbannikov 1, Samuel S. Hunter 1,2, Matthew L. Settles 1,2, and James
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationPairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University
1 Pairwise Sequence Alignment: Dynamic Programming Algorithms COMP 571 Luay Nakhleh, Rice University DP Algorithms for Pairwise Alignment 2 The number of all possible pairwise alignments (if gaps are allowed)
More informationScalable RNA Sequencing on Clusters of Multicore Processors
JOAQUÍN DOPAZO JOAQUÍN TARRAGA SERGIO BARRACHINA MARÍA ISABEL CASTILLO HÉCTOR MARTÍNEZ ENRIQUE S. QUINTANA ORTÍ IGNACIO MEDINA INTRODUCTION DNA Exon 0 Exon 1 Exon 2 Intron 0 Intron 1 Reads Sequencing RNA
More informationA Scalable Coprocessor for Bioinformatic Sequence Alignments
A Scalable Coprocessor for Bioinformatic Sequence Alignments Scott F. Smith Department of Electrical and Computer Engineering Boise State University Boise, ID, U.S.A. Abstract A hardware coprocessor for
More informationFastA & the chaining problem
FastA & the chaining problem We will discuss: Heuristics used by the FastA program for sequence alignment Chaining problem 1 Sources for this lecture: Lectures by Volker Heun, Daniel Huson and Knut Reinert,
More informationOmega: an Overlap-graph de novo Assembler for Metagenomics
Omega: an Overlap-graph de novo Assembler for Metagenomics B a h l e l H a i d e r, Ta e - H y u k A h n, B r i a n B u s h n e l l, J u a n j u a n C h a i, A l e x C o p e l a n d, C h o n g l e Pa n
More information