Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs
1 Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs
Anas Abu-Doleh [1,2], Erik Saule [1], Kamer Kaya [1], and Ümit V. Çatalyürek [1,2]
[1] Department of Biomedical Informatics, [2] Department of Electrical and Computer Engineering, The Ohio State University
2 Outline
I. Introduction: Motivation, Contribution, Related Work
II. Masher Workflow: Index Construction, Mapping
III. Experiments and Results
IV. Conclusion and Future Work
A. Abu-Doleh, "Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs"
3 Motivation
The read length of next-generation sequencing (NGS) devices is continuously increasing, so there is wide interest in efficient and accurate mapping of long(er) reads.
The goal is to use the powerful capabilities of GPUs to improve the mapping of NGS reads.
4 Related Work and Contributions
Contribution: a novel hash-based indexing technique with which:
For large genomes, the memory footprint is small enough to be stored in a restricted-memory device such as a GPU.
The index data structure is more suitable for GPU parallelization.
Related Work:
Burrows-Wheeler Transform (BWT)
o Bowtie2
o CUSHAW2
o SOAP3-dp
Hash Indexing
o SeqAlto
o BFAST
5 Masher workflow
6 Index Construction
Processing the genome file: convert base pairs to a 2-bit format and replace each N with A.
7 Index Construction
Processing the genome file: convert base pairs to a 2-bit format and replace each N with A.
Indexing: seed length $L_S$, indexing step size $G$.
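The preprocessing and seed hashing described above can be sketched as follows. This is an illustration only, not Masher's code: the function names and the specific code assignment (A=0, C=1, G=2, T=3) are assumptions, and treating every unknown symbol (not only N) as A is a simplification.

```python
# Illustrative sketch; encode_base, seed_hash and indexed_seeds are hypothetical names.
# The 2-bit code assignment below is an assumption, not taken from the slides.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_base(base):
    """Map a base to its 2-bit code; N (or any unknown symbol) is treated as A."""
    return CODE.get(base.upper(), 0)


def seed_hash(seq, start, seed_len):
    """Pack seed_len bases starting at `start` into one integer, 2 bits per base."""
    value = 0
    for base in seq[start:start + seed_len]:
        value = (value << 2) | encode_base(base)
    return value


def indexed_seeds(genome, seed_len, step):
    """Yield (position, hash) for seeds of length seed_len taken every `step` bases."""
    for pos in range(0, len(genome) - seed_len + 1, step):
        yield pos, seed_hash(genome, pos, seed_len)
```

With this packing, a seed of length $L_S$ maps to an integer in $[0, 4^{L_S})$, which is why the per-seed arrays on the following slides have $4^{L_S}$ entries.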
8 Index Construction: Index arrays, Locations array
Genome length: $N$.
Stores the indexed locations, in order, for each seed.
Locations array size $= \log_2(N) \times N/G$ bits.
Size = 2.9 GB for hg19 with $G = 4$.
9 Index Construction: Index arrays, Count array
Stores the number of occurrences for each seed.
Size $= 4^{L_S} \times \log_2(N/G)$ bits.
Store at most 255 locations per seed, so each count fits in 8 bits. For seeds that appear more than 255 times, do uniform selection.
Size = 1 GB for $L_S = 15$.
10 Index Construction: Index arrays, Ptrs array
Stores the starting index in the locs array for a group of seeds.
Seed group size: $\delta$. Group id $= \lfloor \text{seed}/\delta \rfloor$.
Size $= (4^{L_S}/\delta) \times \log_2(N/G)$ bits.
Size = 0.5 GB for $\delta = 8$, $G = 4$.
11 Index Construction: Index arrays
$L_S = 15$, $G = 4$, $\delta = 8$, hg19.
Total index array size = 2.9 + 1 + 0.5 = 4.4 GB.
Space-time tradeoff.
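The array sizes quoted on these slides can be checked with a little arithmetic. The sketch below is a hedged reconstruction: the genome length of about 3.1 billion bases for hg19, the rounding of locations and pointers up to 32-bit words, and the reading of GB as $2^{30}$ bytes are all assumptions chosen because they reproduce the stated figures.

```python
import math


def index_sizes_gib(genome_len, seed_len, step, group,
                    bits_loc=32, bits_count=8, bits_ptr=32):
    """Return (locs, count, ptrs) sizes in GiB for the three index arrays.

    bits_loc and bits_ptr default to 32 (assumed word size); bits_count is 8
    because at most 255 locations are stored per seed.
    """
    gib = 2 ** 30
    n_locs = genome_len // step          # one stored location every `step` bases
    n_seeds = 4 ** seed_len              # number of distinct seeds of this length
    locs = n_locs * bits_loc / 8 / gib
    count = n_seeds * bits_count / 8 / gib
    ptrs = (n_seeds // group) * bits_ptr / 8 / gib
    return locs, count, ptrs


HG19_LEN = 3_100_000_000                 # approximate length, an assumption
bits_needed = math.ceil(math.log2(HG19_LEN))   # bits to address any genome position
locs, count, ptrs = index_sizes_gib(HG19_LEN, seed_len=15, step=4, group=8)
print(bits_needed, round(locs, 1), count, ptrs, round(locs + count + ptrs, 1))
```

Under these assumptions the three arrays come out at roughly 2.9, 1 and 0.5 GB, which fits within the 4.8 GB of global memory of the GPU used in the experiments.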
12 Index Construction: Accessing the Index
Count array: assume seed $= i + 4$. It belongs to seed group $(i, i + \delta - 1)$, with $\delta = 8$ and $i \bmod \delta = 0$. Seed index in group: $k = (i + 4) \bmod \delta$. $C_{k=4} = \text{count}[i + 4]$.
Ptrs array: $j = \text{seed}/\delta$. Locs group index: $\text{Lgi} = \text{ptrs}[j]$. Locs seed index: $\text{Lsi} = \text{Lgi} + \sum_{n=0}^{k-1} C_n$.
Locs array: extract locations from $(\text{Lsi}, \text{Lsi} + C_k - 1)$.
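The access scheme above can be illustrated with a toy index. This is a sketch under assumptions: `build_index` and `lookup` are hypothetical names, the build is a naive CPU version, and the 255-location cap with uniform selection is omitted for brevity.

```python
def build_index(seed_positions, num_seeds, group):
    """Build toy count, ptrs and locs arrays from (position, seed) pairs."""
    buckets = [[] for _ in range(num_seeds)]
    for pos, seed in seed_positions:
        buckets[seed].append(pos)
    count = [len(b) for b in buckets]             # occurrences of each seed
    locs = [p for b in buckets for p in b]        # locations grouped by seed
    ptrs, running = [], 0
    for seed in range(num_seeds):
        if seed % group == 0:                     # first seed of a group
            ptrs.append(running)                  # start of the group in locs
        running += count[seed]
    return count, ptrs, locs


def lookup(seed, count, ptrs, locs, group):
    """Return the stored locations of `seed` using the count/ptrs/locs scheme."""
    j = seed // group                             # group id
    k = seed % group                              # seed index within the group
    lgi = ptrs[j]                                 # Locs group index
    first = j * group                             # first seed of the group (i)
    lsi = lgi + sum(count[first:first + k])       # Locs seed index
    return locs[lsi:lsi + count[seed]]


pairs = [(0, 4), (8, 4), (4, 2), (12, 9), (16, 12)]
count, ptrs, locs = build_index(pairs, num_seeds=16, group=8)
print(lookup(4, count, ptrs, locs, group=8))
```

Storing one pointer per group of $\delta$ seeds, and recovering the exact offset by summing at most $\delta - 1$ small counts, is what trades a little lookup time for the smaller ptrs array.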
13 Index Construction: Seeds count
[Figure: cumulative distribution of seed counts, Pr(count <= x).]
14 Mapping: Seed & hash
Read step size: $R$. Read length: $L_R$.
$N_{seeds} = G \times (L_R - L_S)/R$
Locate candidate alignment locations (CALs): each thread is assigned to a specific seed.
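As a small worked example of the seed count formula, the sketch below evaluates $N_{seeds}$ and converts seed hits into candidate alignment locations. Both function names are hypothetical. The integer division and the rule "CAL = genome hit minus seed offset in the read" are assumptions (the latter is a common seed-and-extend convention, not something the slides state), and the value of $R$ is made up for illustration.

```python
def num_seeds(read_len, seed_len, read_step, index_step):
    """N_seeds = G * (L_R - L_S) / R, rounded down (rounding is an assumption)."""
    return index_step * (read_len - seed_len) // read_step


def candidate_locations(seed_offset, hits):
    """Turn genome hits of one seed into candidate read start positions.

    Assumes a CAL is the genome hit shifted back by the seed's offset in the read.
    """
    return [hit - seed_offset for hit in hits]


# Hypothetical parameters: L_R = 100, L_S = 15, R = 10, G = 4.
print(num_seeds(read_len=100, seed_len=15, read_step=10, index_step=4))
print(candidate_locations(seed_offset=20, hits=[1020, 5020]))
```

On the GPU, each call to something like `candidate_locations` would correspond to the work of one thread, since each thread is assigned to a specific seed.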
15 Mapping: Merge CALs and weights
When merging CALs, if two CALs are within a threshold distance of each other, the weight of the second is added to the weight of the first.
For efficiency, Masher consists of two main loops.
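The merging rule can be sketched as a single pass over CALs sorted by location. This is a sequential illustration, not the GPU implementation; `merge_cals` is a hypothetical name, and measuring the distance from the first CAL of each cluster is an assumption about how "within a threshold distance" is applied.

```python
def merge_cals(cals, threshold):
    """Merge (location, weight) pairs whose locations lie within `threshold`.

    The weight of a later CAL is added to the first CAL of its cluster,
    and only that first location is kept.
    """
    merged = []
    for loc, weight in sorted(cals):
        if merged and loc - merged[-1][0] <= threshold:
            merged[-1][1] += weight               # fold into the earlier CAL
        else:
            merged.append([loc, weight])          # start a new cluster
    return [tuple(m) for m in merged]


print(merge_cals([(1000, 1), (1003, 1), (5000, 1), (1008, 2)], threshold=5))
```

After merging, the accumulated weight of a CAL reflects how many seeds support it, which is what the later sorting and filtering steps rely on.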
16 Mapping: Sorting and Batching CALs
The CALs are sorted and placed in batches according to their weights. At this stage, a filter for CALs with low weight can be applied.
17 Mapping
Sorting and Batching CALs: the CALs are sorted and placed in batches according to their weights. At this stage, a filter for CALs with low weight can be applied.
Bounded Local Alignment: a parameterized variant of the Smith-Waterman (SW) algorithm supporting affine gap scoring. In the bounded alignment, only the matrix cells $(i, j)$ where $|i - j| \le w$ are visited and scored. Masher does two passes and sets $w$ to 4 and 16, respectively. Each GPU block performs multiple SW alignments in parallel.
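A bounded (banded) Smith-Waterman with affine gaps can be sketched as below. This is a plain sequential version for illustration, not Masher's GPU kernel: the function name, the scoring values, and the convention that a gap of length 1 costs `gap_open` and each additional base costs `gap_extend` are assumptions. Full matrices are allocated for readability even though only the band is filled.

```python
NEG = float("-inf")


def banded_sw(read, ref, w, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best local alignment score, scoring only cells (i, j) with |i - j| <= w."""
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]     # best score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]   # alignment ends with a gap along the row
    F = [[NEG] * (m + 1) for _ in range(n + 1)]   # alignment ends with a gap along the column
    best = 0
    for i in range(1, n + 1):
        # Restrict j to the band around the main diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


ref = "TTTTTTTTACGT"
print(banded_sw("ACGT", ref, w=4), banded_sw("ACGT", ref, w=16))
```

The example shows why a second pass with a wider band can help: with $w = 4$ the true match lies outside the band and only a single base aligns, while $w = 16$ recovers the full match at the cost of scoring more cells.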
18 Experiments and Results
Platform: Intel Core i7-960 CPU clocked at 3.2 GHz, with 4 Hyper-Threading cores and 24 GB of DDR3 memory. Tesla K20c GPU with 4.8 GB of global memory. CUDA 5.0 and GCC.
Human genome and simulated reads: human genome hg19. Wgsim simulator, 100K reads of length 100, 300, 500, and 1000 with error rates of 2%, 4%, 6%, and 8%.
19 Experiments and Results
Metrics for comparison:
Sensitivity is the percentage of reads that are aligned.
Accuracy is the percentage of reads correctly aligned to their simulator locations among all aligned reads.
Execution time: only the alignment time was measured.
The lower bound for a valid alignment score is set to score_LB = $L_R$ x ( x Error Rate).
Two modes of Masher: normal mode, $R = 0.7 L_R$; fast mode, $R = L_R$.
Comparison with: Bowtie2 (sensitive and fast modes, 8 threads), SOAP3-dp, and CUSHAW2-GPU.
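The two quality metrics defined above reduce to simple ratios. The sketch below uses a hypothetical function name and made-up counts purely to show the computation; the numbers are not results from the experiments.

```python
def sensitivity_and_accuracy(total_reads, aligned, correct):
    """Return (sensitivity %, accuracy %) as defined on the slide.

    sensitivity = aligned reads / all reads
    accuracy    = correctly aligned reads / aligned reads
    """
    sensitivity = 100.0 * aligned / total_reads
    accuracy = 100.0 * correct / aligned if aligned else 0.0
    return sensitivity, accuracy


# Hypothetical counts for a 100K-read data set.
print(sensitivity_and_accuracy(total_reads=100_000, aligned=98_000, correct=97_020))
```

Note that accuracy is measured relative to the aligned reads only, so a mapper can trade sensitivity for accuracy by reporting fewer, more confident alignments.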
20 Experiments and Results: $L_R$ = 100 bp
[Figure: sensitivity (%) and accuracy (%) versus error rate (2%, 4%, 6%, 8%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, SOAP3-dp, and CUSHAW2-GPU.]
21 Experiments and Results: $L_R$ = 500 bp
[Figure: sensitivity (%) and accuracy (%) versus error rate (2%, 4%, 6%, 8%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, and SOAP3-dp.]
22 Experiments and Results: $L_R$ = 1000 bp
[Figure: sensitivity (%) and accuracy (%) versus error rate (2%, 4%, 6%, 8%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, and SOAP3-dp.]
23 Experiments and Results: $L_R$ = 100 bp
[Figure: execution time (sec., log scale) versus error rate (2%, 4%, 6%, 8%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, SOAP3-dp, and CUSHAW2-GPU.]
24 Experiments and Results: $L_R$ = 500 bp
[Figure: execution time (sec., log scale) versus error rate (2%, 4%, 6%, 8%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, and SOAP3-dp.]
25 Experiments and Results: $L_R$ = 1000 bp
[Figure: execution time (sec., log scale) versus error rate (2%, 4%, 6%, 8%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, and SOAP3-dp.]
26 Experiments and Results: $L_R$ = 1000 bp, error rate 2%
[Figure: execution time (sec., log scale), sensitivity (%), and accuracy (%) for Masher, Masher-fast, Bowtie2, Bowtie2-fast, and SOAP3-dp.]
27 Conclusion and Future Work
Conclusion:
Masher is a fast and accurate short/long read mapper that uses a memory-efficient indexing scheme to reduce the size of a human genome index so that it fits in the memory of a GPU.
The results show that Masher produces accurate alignments. Its speed is competitive with the tested state-of-the-art tools for reads shorter than 500 bp, and it is an order of magnitude faster when the reads are longer than 500 bp.
Future work:
Making the software publicly available.
Improving Masher's performance further with GPU-specific optimizations and better CPU/GPU pipelining.
Adding new features such as support for paired-end sequences and the fastq format.
28 Thanks
For more information Visit
Acknowledgement of Support
Find Very Low Frequency Variants With QIAGEN GeneRead Panels November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com
More informationOptimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink
Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Pidad D Souza IBM Systems pidsouza@in.ibm.com 1 Outline
More informationHigh Performance Technique for Database Applications Using a Hybrid GPU/CPU Platform
High Performance Technique for Database Applications Using a Hybrid GPU/CPU Platform M. Affan Zidan, Talal Bonny, and Khaled N. Salama Electrical Engineering Program King Abdullah University of Science
More informationWelcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page.
Welcome to MAPHiTS (Mapping Analysis Pipeline for High-Throughput Sequences) tutorial page. In this page you will learn to use the tools of the MAPHiTS suite. A little advice before starting : rename your
More informationA Design of a Hybrid System for DNA Sequence Alignment
IMECS 2008, 9-2 March, 2008, Hong Kong A Design of a Hybrid System for DNA Sequence Alignment Heba Khaled, Hossam M. Faheem, Tayseer Hasan, Saeed Ghoneimy Abstract This paper describes a parallel algorithm
More informationSparkBurst: An Efficient and Faster Sequence Mapping Tool on Apache Spark Platform
SparkBurst: An Efficient and Faster Sequence Mapping Tool on Apache Spark Platform MTP End-Sem Report submitted to Indian Institute of Technology, Mandi for partial fulfillment of the degree of B. Tech.
More informationPerformance analysis of parallel de novo genome assembly in shared memory system
IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Performance analysis of parallel de novo genome assembly in shared memory system To cite this article: Syam Budi Iryanto et al 2018
More informationReads Alignment and Variant Calling
Reads Alignment and Variant Calling CB2-201 Computational Biology and Bioinformatics February 22, 2016 Emidio Capriotti http://biofold.org/ Institute for Mathematical Modeling of Biological Systems Department
More informationFPGA Acceleration of Short Read Alignment
FPGA Acceleration of Short Read Alignment 0 NATHANIEL McVICAR, AKINA HOSHINO, ANNA LA TORRE, THOMAS A. REH, WALTER L. RUZZO and SCOTT HAUCK, University of Washington Aligning millions of short DNA or RNA
More informationSuper-Fast Genome BWA-Bam-Sort on GLAD
1 Hututa Technologies Limited Super-Fast Genome BWA-Bam-Sort on GLAD Zhiqiang Ma, Wangjun Lv and Lin Gu May 2016 1 2 Executive Summary Aligning the sequenced reads in FASTQ files and converting the resulted
More informationA GPU Algorithm for Comparing Nucleotide Histograms
A GPU Algorithm for Comparing Nucleotide Histograms Adrienne Breland Harpreet Singh Omid Tutakhil Mike Needham Dickson Luong Grant Hennig Roger Hoang Torborn Loken Sergiu M. Dascalu Frederick C. Harris,
More informationA Faster Parallel Algorithm for Analyzing Drug-Drug Interaction from MEDLINE Database
A Faster Parallel Algorithm for Analyzing Drug-Drug Interaction from MEDLINE Database Sulav Malla, Kartik Anil Reddy, Song Yang Department of Computer Science and Engineering University of South Florida
More informationRamethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment
Ramethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment James Arram Department of Computing Imperial College jma11@imperial.ac.uk Wayne Luk Department of Computing Imperial College wl@imperial.ac.uk
More information