Achieving High Throughput Sequencing with Graphics Processing Units

Su Chen 1, Chaochao Zhang 1, Feng Shen 1, Ling Bai 1, Hai Jiang 1, and Damir Herman 2
1 Department of Computer Science, Arkansas State University, Jonesboro, AR 72467, USA
2 Department of Internal Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA

Abstract

High throughput sequencing has become a powerful technique for genome analysis since the concept was introduced in recent years. Currently, there is a huge demand from patients with genetic diseases that cannot be satisfied because of limited computational power. Although several software packages have been developed using the most efficient algorithms currently available for various types of sequencing problems, CPUs are too expensive for processing ever-growing data economically, because they are not designed for data-parallel problems. The latest Fermi architecture released by NVIDIA provides a considerable number of streaming processors, a larger register file and 1 MB of cache, which makes it very competitive for data-parallel processing. This paper implements a simple sequence alignment method on a GPU and compares the real-world performance of the CPU and GPU versions. Experiments show that GPUs have good potential for similar problems.

Keywords: High Throughput Sequencing, Graphics Processing Unit

1. Introduction

People are paying more and more attention to health care, and advanced devices are being designed to analyze samples from patients. At the molecular level, the amount of data becomes extremely large and requires more computational power to process. The emerging High Throughput Sequencing (HTS) technology [6], [7] gives bioinformaticians a better way to deal with this problem, and many multithreaded programs, such as Bowtie [3], BWA [4] and SOAP2 [7], have been released for practical use.
However, sequence alignment is a computationally simple task, so running it on sophisticated general-purpose chips such as CPUs is unnecessarily expensive. Since NVIDIA released its new Fermi architecture, which provides 512 cores on one chip and gigabytes of memory, the GPU seems to have great potential for taking over this job and doing it faster and more economically. In this paper, a simple method is proposed for exact matching between massive numbers of DNA target fragments and mRNA reference sequences, and the performance of its CPU and GPU versions is compared.

The paper is organized as follows: Section 2 gives our method for sequencing, including the indexing and searching phases. Section 3 discusses the detailed designs with respect to the architectures. Section 4 discusses the experimental results. Section 5 covers related work, and conclusions are drawn in Section 6.

2. Algorithm Design

The idea behind the algorithm used in this paper comes from the Burrows-Wheeler Transformation (BWT), which was first proposed for data compression and was later developed into an efficient index for sequence alignment. Fig. 1 illustrates how the original transformation works.

Rotations    Sorted rotations (last column set apart)
acaacg$      $acaac g
caacg$a      aacg$a c
aacg$ac      acaacg $
acg$aca      acg$ac a
cg$acaa      caacg$ a
g$acaac      cg$aca a
$acaacg      g$acaa c

Last column: gc$aaac

Fig. 1: Burrows-Wheeler Transformation

The concept of BWT is to build an index of the reference sequence by rearranging the elements of the sequence into a special order, which benefits the later searching phase and reduces the search time complexity from the O(n lg(n)) of the brute-force method to O(lg(n)). A concrete implementation of BWT can be described as follows:
1) Append a $ to the end of the reference sequence.
2) Copy the current sequence, cyclically shift the copy by one position and place it below the previous one; repeat this n times, where n is the length of the original sequence.
3) Sort the rows of the resulting block lexicographically, using the order $, a, c, g, t.
4) Take the last column of the sorted matrix.
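The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function name `bwt` is hypothetical, and the sketch builds every rotation explicitly, which is only practical for short strings. It relies on the fact that `$` sorts before the lowercase letters in ASCII, which gives the order $, a, c, g, t.

```python
def bwt(reference):
    """Return the last column of the sorted rotation matrix of reference + '$'."""
    text = reference + "$"                      # step 1: append the terminator
    # Step 2: build all cyclic rotations of the terminated sequence.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # Step 3: sort the rows lexicographically ('$' < 'a' < 'c' < 'g' < 't').
    rotations.sort()
    # Step 4: read off the last column.
    return "".join(row[-1] for row in rotations)


print(bwt("acaacg"))  # gc$aaac, the last column shown in Fig. 1
```

Production aligners avoid materializing the full matrix and derive the same column from a suffix array instead.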
2.1 A New Indexing Method for Test

Inspired by BWT, we designed another way to build the index. The procedure of the new method is shown in Fig. 2 and is explained in more detail below.
1) We still append a $ to the end of the reference sequence.
2) Generate the same block of shifted sequences as BWT does. This time, we assign order numbers to the a, c, g and t entries of the first column, separately for each letter.
3) In this approach, we sort only the first column of the matrix and make sure that, for each of a, c, g and t, rows with smaller order numbers are placed above rows with larger ones.
4) We take the last column as the new index.

[Fig. 2: New indexing method]

2.2 Searching Algorithm

[Fig. 3: Searching with the new index (example query: aac)]

The newly proposed method is brute-force in nature, but several improvements can be achieved by using the index well. The searching procedure is shown in Fig. 3 and is very straightforward.

2.3 Making A Secondary Index

We now describe how to improve the performance of our searching algorithm. We can build a secondary index on top of the first-level index generated by the method described above. For the first column, since we refer to the beginning and end of the a, c, g and t blocks many times, we can record just these two position numbers for each of the four letters. This saves not only searching time but also a lot of space in the index file. For the last column, since the a, c, g and t entries are not clustered, we can create one array for each of the four letters and record the positions at which that letter occurs in the last column. This prevents the searching algorithm from visiting positions that hold the wrong letter: for example, if we want a, we go only to positions 2, 4 and 5 in the last column and skip all other letters.

[Fig. 4: Secondary index generation. First-column ranges: a: [1, 3], c: [4, 5], g: [6, 6]. Last-column positions: a: {2, 4, 5}, c: {1, 6}, g: {}, t: {}]

Generally, although this indexing method does not fundamentally reduce the time complexity of the searching algorithm, it avoids much unnecessary work, and the simple index can be generated in O(n) time. The CPU and GPU performance discussed in the experimental part is based on this algorithm.
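The indexing idea of Sections 2.1 to 2.3 can be sketched as follows. This is one possible reading of the description, not the authors' code. The names `build_index` and `find_exact` are hypothetical, rows are numbered from 0 with the `$` row first, and the search shown is one plausible brute-force use of the first-column ranges, so its row numbers need not agree with every entry in the figures.

```python
from collections import defaultdict


def build_index(reference):
    """Sort rotation start positions by first letter only (stable sort)."""
    text = reference + "$"
    n = len(text)
    # sorted() is stable, so rows with equal first letters keep their
    # original order: smaller order numbers stay above larger ones.
    order = sorted(range(n), key=lambda i: text[i])
    # The last letter of the rotation starting at i is text[i - 1].
    last_column = "".join(text[(i - 1) % n] for i in order)
    # Secondary index, first column: one (begin, end) row range per letter.
    first_ranges = {}
    for row, start in enumerate(order):
        begin, _ = first_ranges.get(text[start], (row, row))
        first_ranges[text[start]] = (begin, row)
    # Secondary index, last column: letters are scattered, so list the rows.
    last_positions = defaultdict(list)
    for row, letter in enumerate(last_column):
        last_positions[letter].append(row)
    return text, order, first_ranges, dict(last_positions)


def find_exact(query, text, order, first_ranges):
    """Brute-force exact match restricted to rows starting with query[0]."""
    if not query or query[0] not in first_ranges:
        return []
    begin, end = first_ranges[query[0]]
    return sorted(order[row] for row in range(begin, end + 1)
                  if text.startswith(query, order[row]))


text, order, first_ranges, last_positions = build_index("acaacg")
print(first_ranges["a"], first_ranges["c"], first_ranges["g"])
print(find_exact("aac", text, order, first_ranges))
```

Because only the first letter is sorted, building the index needs a single linear pass per column, and the search skips every row whose first letter cannot match.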
3. I/O Involved Program Design

3.1 Single-threaded Code Design for CPU

Since the indexing phase of our algorithm costs less time than the searching phase, in which an unpredictable number of target sequences arrive as input, we add the indexing time to the total searching time in this paper. Another important advantage of this is that we avoid the I/O time of loading indices from the hard disk, which costs much more time than the indexing phase when the reference sequence file is very large. When we do everything in memory and never go to the hard disk, searching usually becomes faster. Fig. 5 illustrates how the data of our program flows between memory and the hard disk.
1) Load the reference sequence file from the hard disk.
2) Generate the index for the reference sequence in memory.
3) Remove the original sequence file from memory, leaving only the index there.
4) Load the next target sequence file from the hard disk into memory.
5) Search for the current batch of target sequences and save the results.
6) Remove the processed batch of target sequences from memory.
7) Repeat 4) to 6) for all target files.

[Fig. 5: Data locality control for CPU implementation]

3.2 CUDA C Code Design for Single GPU

Fig. 6 shows the procedure for a machine with a CPU and a single GPU dealing with our problem. The steps of execution and data transfer for both the indexing and searching phases are explained in more detail below.

[Fig. 6: Work and data scheduling for GPU implementation]

1) Load the reference sequence file from the hard disk into CPU memory.
2) Copy the reference sequences from CPU memory to GPU memory.
3) Remove the reference sequences from CPU memory.
4) Generate the index for the reference sequence using the GPU.
5) Remove the original sequence file from GPU memory, leaving only the index there.
6) Load a target sequence file from the hard disk into CPU memory.
7) Copy the current batch of target sequences from CPU memory to GPU memory.
8) Remove the current target sequences from CPU memory and load the next batch of target sequences.
9) The GPU performs the search and saves the result in its memory.
10) Remove the current batch of target sequences from GPU memory.
11) Repeat 6) to 10) for all target files.
12) Copy the result back to CPU memory and save it to disk.
13) Remove the results from GPU and CPU memory.

3.3 Noteworthy Differences between CPU and GPU Implementations

1) The GPU version has an initialization time for the first start-up of the device, usually taking up to 2-3 seconds, whereas the CPU version has none. For small cases that run very fast on CPUs, GPUs therefore have no advantage.
2) Data transfer time between host and device memory should be considered, since the amount of data in our case is usually very large.
3) GPUs can do simple calculations very fast if programs are designed well, so both the indexing and searching phases can be considered for the GPU if the data transfer time can be ignored. If indexing takes only a little time, there is not much need to do it on the GPU. The searching phase is usually well suited to GPUs, since the number of target sequences is always very large. A speedup of dozens to hundreds of times can be expected for the searching phase if GPUs are adopted.
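The batch-oriented control flow of Section 3.1 can be sketched as below. This is a simplified illustration under stated assumptions: `run_batches`, `toy_index` and `toy_search` are hypothetical names, batches are passed as an iterable of in-memory lists instead of files on disk, and the toy index is simply the reference string itself. In the GPU flow of Section 3.2, host-to-device and device-to-host copies would sit at the points marked in the comments.

```python
def run_batches(reference, target_batches, build_index, search):
    """Index the reference once, then stream target batches through the search."""
    index = build_index(reference)  # build the index in memory (GPU: after a host-to-device copy)
    del reference                   # drop this function's handle on the raw sequence
    results = []
    for batch in target_batches:    # one batch is resident at a time
        # GPU flow: copy the batch to device memory before searching.
        results.append([search(index, read) for read in batch])
        # The processed batch is released when the loop moves on.
    return results                  # GPU flow: copy results back before saving


def toy_index(reference):
    """Stand-in for a real index: keep the reference string itself."""
    return reference


def toy_search(index, read):
    """Position of the first exact match, or -1 if the read does not occur."""
    return index.find(read)


# A generator keeps only one batch in memory at a time.
batches = (batch for batch in [["aac", "cg"], ["tt"]])
print(run_batches("acaacg", batches, toy_index, toy_search))
```

Passing the batches as a generator mirrors the load, search, release cycle of the numbered steps: the next batch is produced only after the previous one has been searched.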
4. Experimental Results

Sequential code was written in C and tested on a machine with two Intel Xeon E5504 Quad-Core CPUs (2.0 GHz, 4 MB cache), while GPU code was written in CUDA C and tested on the same machine with two NVIDIA Tesla 20-Series C2050 GPUs. In the following, the performance of the two is compared and the speedup of the GPU is calculated. The proportion of time spent in each part of the algorithm on CPUs and GPUs is also illustrated and discussed.

4.1 CPU vs. GPU Searching Time

Block sorting is the most time-consuming part of building the index for the reference strings. Fig. 7 plots time cost against the number of combined reference strings (one reference string length = 3, ). From Fig. 7 we can see that, for the algorithm proposed in this paper, searching takes a large portion of the total execution time on the CPU side, while on the GPU side it takes a relatively smaller portion. This is because the GPU runs the searching part much faster than the CPU. Given that the I/O and data transfer time grows proportionally with the number of target sequences, the GPU saves more absolute time as the problem scale becomes larger.
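The scaling argument above can be made concrete with a small cost model. The sketch below is an illustration only: the function `total_times` and every number passed to it are hypothetical and are not measurements from this paper. It assumes per-sequence costs in microseconds that grow linearly with the number of target sequences, plus a fixed GPU start-up cost.

```python
def total_times(num_reads, io_us, cpu_search_us, gpu_search_us,
                transfer_us, gpu_init_us):
    """Return (cpu_total, gpu_total) in microseconds under a linear cost model."""
    cpu_total = num_reads * (io_us + cpu_search_us)
    gpu_total = gpu_init_us + num_reads * (io_us + transfer_us + gpu_search_us)
    return cpu_total, gpu_total


# Hypothetical per-sequence costs: 2 us I/O, 10 us CPU search,
# 1 us GPU search, 1 us host-device transfer, 2 s GPU start-up.
for num_reads in (100_000, 1_000_000, 10_000_000):
    cpu_total, gpu_total = total_times(num_reads, 2, 10, 1, 1, 2_000_000)
    print(num_reads, cpu_total - gpu_total)  # absolute time saved by the GPU
```

Under these assumed costs the CPU wins for the smallest input because of the start-up cost, and the absolute time saved by the GPU grows linearly once the input is large enough.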

[Fig. 7: CPU & GPU timing with and without I/O. Execution time (seconds) against number of target sequences (length = 87); curves: GPU with I/O, CPU with I/O, GPU search, CPU search]

4.2 Speedup on GPU

Fig. 8 and Fig. 9 show two speedup curves: one for the pure searching time and one for the searching time including I/O and data transfer. For the pure searching algorithm, the GPU version is up to 14 times faster than the CPU version, while a speedup of about 5 times is achieved when I/O and data transfer are taken into account. Since the algorithm has not been fully optimized, there should still be room for GPUs to speed up this problem further.

[Fig. 8: GPU speedup rate without I/O. Speedup for searching against number of target sequences (length = 87)]

[Fig. 9: GPU speedup rate with I/O. Speedup with I/O against number of target sequences (length = 87)]

4.3 Overhead Breakdown with CPU & GPU Approaches

1) I/O from hard disk
For both the CPU and GPU implementations, this part takes the same time and is unavoidable. The bandwidth from hard disk to memory has always been a bottleneck for similar problems. However, if we use InfiniBand to load data from a remote database in parallel instead of using the local hard disk, the performance of both the CPU and GPU versions can be improved. The GPU version might benefit more, because it processes data much faster than the CPU version and needs more data per unit of time to match its greater computational power.
2) Data transfer between host and device memory
Currently, NVIDIA GPUs use the PCIe bus to transfer data back and forth between host and device memory, with a capacity of up to 4 GB/s for one-way transmission and 8 GB/s for two-way. This speed can usually keep up with the GPU's computational power and is not a bottleneck for now.
A noteworthy point is that the asynchronous memory copy technique should be used when the target sequences are too large to be loaded onto the GPU at once. Asynchronous copies between host and device memory can overlap with GPU computation, so either the copy time or the computing time can be hidden by this overlap. Which of the two is hidden depends on their relative costs.
3) Time for indexing
For the algorithm presented in this paper, indexing time can almost be ignored, since I/O and searching time dominate. However, in real applications, such as those based on BWT, indices are usually designed to be more efficient to use, but building them also takes more time and that overhead cannot be ignored. In that case, indexing time should also be considered an important portion of the whole system.
4) Time for searching
This portion of time depends on many factors, including indexing efficiency, I/O speed, choice of device and task partitioning design. Basically, more efficient indexing reduces searching time, and higher I/O speed improves performance. As for the choice of device, the GPU is better than the CPU from an economic point of view, since it provides more computational power for searching. However, whether a partitioning design is good or not is hard to tell by looking only at the surface of a specific problem. Careful calculations are needed to find the optimal choice.
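The effect of overlapping asynchronous copies with computation, mentioned under item 2) above, can be illustrated with a simple timing model. This is not CUDA code: the function `pipeline_time` and the batch times used are hypothetical, and the model assumes a dedicated copy engine and enough device buffers that a copy never has to wait for a free buffer.

```python
def pipeline_time(copy_times, compute_times, overlap):
    """Total time to copy and process all batches, with or without overlap."""
    if not overlap:
        # Synchronous copies: every copy and every kernel runs back to back.
        return sum(copy_times) + sum(compute_times)
    copy_done = 0      # time at which the copy of the current batch finishes
    compute_done = 0   # time at which the GPU finishes the current batch
    for copy, compute in zip(copy_times, compute_times):
        copy_done += copy
        # A batch starts computing once it has arrived and the GPU is free.
        compute_done = max(compute_done, copy_done) + compute
    return compute_done


# Compute-bound case: all but the first copy are hidden behind computation.
print(pipeline_time([2, 2, 2], [3, 3, 3], overlap=False))  # 15
print(pipeline_time([2, 2, 2], [3, 3, 3], overlap=True))   # 11
# Copy-bound case: all but the last computation is hidden behind the copies.
print(pipeline_time([3, 3, 3], [1, 1, 1], overlap=True))   # 10
```

In both cases the cheaper of the two stages is the one that disappears from the total, which matches the statement that the hidden portion depends on the relative time costs.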

5. Related Work

RNA sequencing was one of the earliest forms of nucleotide sequencing. The major landmarks of RNA sequencing are the sequence of the first complete gene and the complete genome of Bacteriophage MS2, identified and published by Walter Fiers et al. in 1972 [8] and 1976 [2]. In the late 2000s, high-throughput sequencing (HTS) emerged. Li R (2008, 2009) published several papers on BWT applications in short read alignment [6], [7]. Li H (2008, 2009) [5], [4] and Langmead (2009) [3] also published several works on memory-efficient alignment. In recent years, several alignment programs such as Bowtie [3], BWA [4] and SOAP2 [7] were released. In 2009, Sinnott-Armstrong et al. presented a paper on accelerating epistasis analysis in human genetics with an Nvidia GeForce GTX 280 and the PyCUDA programming tool [9]. Davis et al. (2011) made a real-world performance comparison of SNPrank across programming platforms such as Python, Java and Matlab, and across hardware environments: single-threaded, multithreaded and GPU, where the GPU languages are restricted to Matlab and Python [1] and the GPU is an Nvidia Tesla-M16. They reported that for small cases the CPU always performs better because of the data transfer to and from the device.

6. Conclusions and Future Work

This paper proposes a way to implement fast sequence alignment on the latest generation of NVIDIA GPUs. From the experimental results, we can see that the GPU speeds up the searching phase considerably compared with the CPU but incurs a constant delay for its necessary data transfer phase. This shows that the GPU has good potential for high throughput sequencing. If the bandwidth bottleneck of loading data from the hard disk can be relieved, the performance still has great potential to keep growing, whereas the computational power of a single-threaded CPU may not allow that.
In the future, we will try to parallelize the most advanced sequence alignment algorithms on the GPU and keep investigating the GPU's capability in more applications that are of urgent concern to the medical and biological fields.

References

[1] Nicholas A. Davis, Ahwan Pandey, and B. A. McKinney. Real-world comparison of CPU and GPU implementations of SNPrank: a network analysis tool for GWAS. Bioinformatics, 27, 2011.
[2] W. Fiers, R. Contreras, and F. Duerinck. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature, 260:500-507, 1976.
[3] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10(3), 2009.
[4] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 2009.
[5] H. Li, J. Ruan, and R. Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 18, 2008.
[6] R. Li. SOAP: short oligonucleotide alignment program. Bioinformatics, 24(5), 2008.
[7] R. Li. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 25(15), 2009.
[8] W. Min Jou, G. Haegeman, M. Ysebaert, and W. Fiers. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature, 237(369654):82, 1972.
[9] Nicholas A. Sinnott-Armstrong, Casey S. Greene, Fabio Cancare, and Jason H. Moore. Accelerating epistasis analysis in human genetics with consumer graphics hardware. Technical report, Dartmouth Medical School, NH, USA; Politecnico di Milano, Milano, Italia, 2009.


CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures

Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures Charlotte Herzeel 1,4, Thomas J. Ashby 1,4 Pascal Costanza 3,4, and Wolfgang De Meuter 2 1 imec, Kapeldreef 75, B-3001 Leuven, Belgium,

More information

Scalable RNA Sequencing on Clusters of Multicore Processors

Scalable RNA Sequencing on Clusters of Multicore Processors JOAQUÍN DOPAZO JOAQUÍN TARRAGA SERGIO BARRACHINA MARÍA ISABEL CASTILLO HÉCTOR MARTÍNEZ ENRIQUE S. QUINTANA ORTÍ IGNACIO MEDINA INTRODUCTION DNA Exon 0 Exon 1 Exon 2 Intron 0 Intron 1 Reads Sequencing RNA

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory

More information

Parallel Mapping Approaches for GNUMAP

Parallel Mapping Approaches for GNUMAP 2011 IEEE International Parallel & Distributed Processing Symposium Parallel Mapping Approaches for GNUMAP Nathan L. Clement, Mark J. Clement, Quinn Snell and W. Evan Johnson Department of Computer Science

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

Approaches to Parallel Computing

Approaches to Parallel Computing Approaches to Parallel Computing K. Cooper 1 1 Department of Mathematics Washington State University 2019 Paradigms Concept Many hands make light work... Set several processors to work on separate aspects

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

Heterogeneous Hardware/Software Acceleration of the BWA-MEM DNA Alignment Algorithm

Heterogeneous Hardware/Software Acceleration of the BWA-MEM DNA Alignment Algorithm Heterogeneous Hardware/Software Acceleration of the BWA-MEM DNA Alignment Algorithm Nauman Ahmed, Vlad-Mihai Sima, Ernst Houtgast, Koen Bertels and Zaid Al-Ars Computer Engineering Lab, Delft University

More information

Accelrys Pipeline Pilot and HP ProLiant servers

Accelrys Pipeline Pilot and HP ProLiant servers Accelrys Pipeline Pilot and HP ProLiant servers A performance overview Technical white paper Table of contents Introduction... 2 Accelrys Pipeline Pilot benchmarks on HP ProLiant servers... 2 NGS Collection

More information

BLAST & Genome assembly

BLAST & Genome assembly BLAST & Genome assembly Solon P. Pissis Tomáš Flouri Heidelberg Institute for Theoretical Studies November 17, 2012 1 Introduction Introduction 2 BLAST What is BLAST? The algorithm 3 Genome assembly De

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27 1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution

More information

Accelerated Machine Learning Algorithms in Python

Accelerated Machine Learning Algorithms in Python Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals

More information

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing

More information

Technology for a better society. hetcomp.com

Technology for a better society. hetcomp.com Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction

More information

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA

More information

SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching

SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching Ilya Y. Zhbannikov 1, Samuel S. Hunter 1,2, Matthew L. Settles 1,2, and James

More information

Howdah. a flexible pipeline framework and applications to analyzing genomic data. Steven Lewis PhD

Howdah. a flexible pipeline framework and applications to analyzing genomic data. Steven Lewis PhD Howdah a flexible pipeline framework and applications to analyzing genomic data Steven Lewis PhD slewis@systemsbiology.org What is a Howdah? A howdah is a carrier for an elephant The idea is that multiple

More information

Efficient Computation of Radial Distribution Function on GPUs

Efficient Computation of Radial Distribution Function on GPUs Efficient Computation of Radial Distribution Function on GPUs Yi-Cheng Tu * and Anand Kumar Department of Computer Science and Engineering University of South Florida, Tampa, Florida 2 Overview Introduction

More information

Lecture 1: Introduction and Computational Thinking

Lecture 1: Introduction and Computational Thinking PASI Summer School Advanced Algorithmic Techniques for GPUs Lecture 1: Introduction and Computational Thinking 1 Course Objective To master the most commonly used algorithm techniques and computational

More information

The Optimal CPU and Interconnect for an HPC Cluster

The Optimal CPU and Interconnect for an HPC Cluster 5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance

More information

Parallelism. Parallel Hardware. Introduction to Computer Systems

Parallelism. Parallel Hardware. Introduction to Computer Systems Parallelism We have been discussing the abstractions and implementations that make up an individual computer system in considerable detail up to this point. Our model has been a largely sequential one,

More information

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea

More information

BlueDBM: An Appliance for Big Data Analytics*

BlueDBM: An Appliance for Big Data Analytics* BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

Building NVLink for Developers

Building NVLink for Developers Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized

More information

INTRODUCING NVBIO: HIGH PERFORMANCE PRIMITIVES FOR COMPUTATIONAL GENOMICS. Jonathan Cohen, NVIDIA Nuno Subtil, NVIDIA Jacopo Pantaleoni, NVIDIA

INTRODUCING NVBIO: HIGH PERFORMANCE PRIMITIVES FOR COMPUTATIONAL GENOMICS. Jonathan Cohen, NVIDIA Nuno Subtil, NVIDIA Jacopo Pantaleoni, NVIDIA INTRODUCING NVBIO: HIGH PERFORMANCE PRIMITIVES FOR COMPUTATIONAL GENOMICS Jonathan Cohen, NVIDIA Nuno Subtil, NVIDIA Jacopo Pantaleoni, NVIDIA SEQUENCING AND MOORE S LAW Slide courtesy Illumina DRAM I/F

More information

Accelerating InDel Detection on Modern Multi-Core SIMD CPU Architecture

Accelerating InDel Detection on Modern Multi-Core SIMD CPU Architecture Accelerating InDel Detection on Modern Multi-Core SIMD CPU Architecture Da Zhang Collaborators: Hao Wang, Kaixi Hou, Jing Zhang Advisor: Wu-chun Feng Evolution of Genome Sequencing1 In 20032: 1 human genome

More information

Mathematical computations with GPUs

Mathematical computations with GPUs Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing

More information

GPU for HPC. October 2010

GPU for HPC. October 2010 GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,

More information

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE WHITEPAPER DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE A Detailed Review ABSTRACT While tape has been the dominant storage medium for data protection for decades because of its low cost, it is steadily

More information

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters 1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk

More information

Lam, TW; Li, R; Tam, A; Wong, S; Wu, E; Yiu, SM.

Lam, TW; Li, R; Tam, A; Wong, S; Wu, E; Yiu, SM. Title High throughput short read alignment via bi-directional BWT Author(s) Lam, TW; Li, R; Tam, A; Wong, S; Wu, E; Yiu, SM Citation The IEEE International Conference on Bioinformatics and Biomedicine

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors

Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors Michael Boyer, David Tarjan, Scott T. Acton, and Kevin Skadron University of Virginia IPDPS 2009 Outline Leukocyte

More information

Accelerating the Prediction of Protein Interactions

Accelerating the Prediction of Protein Interactions Accelerating the Prediction of Protein Interactions Alex Rodionov, Jonathan Rose, Elisabeth R.M. Tillier, Alexandr Bezginov October 21 21 Motivation The human genome is sequenced, but we don't know what

More information

Architectures for Scalable Media Object Search

Architectures for Scalable Media Object Search Architectures for Scalable Media Object Search Dennis Sng Deputy Director & Principal Scientist NVIDIA GPU Technology Workshop 10 July 2014 ROSE LAB OVERVIEW 2 Large Database of Media Objects Next- Generation

More information

Introduction to Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 15 September 2015

Introduction to Read Alignment. UCD Genome Center Bioinformatics Core Tuesday 15 September 2015 Introduction to Read Alignment UCD Genome Center Bioinformatics Core Tuesday 15 September 2015 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

School of Computer and Information Science

School of Computer and Information Science School of Computer and Information Science CIS Research Placement Report Multiple threads in floating-point sort operations Name: Quang Do Date: 8/6/2012 Supervisor: Grant Wigley Abstract Despite the vast

More information

Paralization on GPU using CUDA An Introduction

Paralization on GPU using CUDA An Introduction Paralization on GPU using CUDA An Introduction Ehsan Nedaaee Oskoee 1 1 Department of Physics IASBS IPM Grid and HPC workshop IV, 2011 Outline 1 Introduction to GPU 2 Introduction to CUDA Graphics Processing

More information

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT

OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT OPEN MP-BASED PARALLEL AND SCALABLE GENETIC SEQUENCE ALIGNMENT Asif Ali Khan*, Laiq Hassan*, Salim Ullah* ABSTRACT: In bioinformatics, sequence alignment is a common and insistent task. Biologists align

More information

A Comprehensive Study on the Performance of Implicit LS-DYNA

A Comprehensive Study on the Performance of Implicit LS-DYNA 12 th International LS-DYNA Users Conference Computing Technologies(4) A Comprehensive Study on the Performance of Implicit LS-DYNA Yih-Yih Lin Hewlett-Packard Company Abstract This work addresses four

More information

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo

N-Body Simulation using CUDA. CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo N-Body Simulation using CUDA CSE 633 Fall 2010 Project by Suraj Alungal Balchand Advisor: Dr. Russ Miller State University of New York at Buffalo Project plan Develop a program to simulate gravitational

More information

Processing Technology of Massive Human Health Data Based on Hadoop

Processing Technology of Massive Human Health Data Based on Hadoop 6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC 2016) Processing Technology of Massive Human Health Data Based on Hadoop Miao Liu1, a, Junsheng Yu1,

More information

Halvade: scalable sequence analysis with MapReduce

Halvade: scalable sequence analysis with MapReduce Bioinformatics Advance Access published March 26, 2015 Halvade: scalable sequence analysis with MapReduce Dries Decap 1,5, Joke Reumers 2,5, Charlotte Herzeel 3,5, Pascal Costanza, 4,5 and Jan Fostier

More information

Better Security Tool Designs: Brainpower, Massive Threading, and Languages

Better Security Tool Designs: Brainpower, Massive Threading, and Languages Better Security Tool Designs: Brainpower, Massive Threading, and Languages Golden G. Richard III Professor and University Research Professor Department of Computer Science University of New Orleans Founder

More information

Short Read Alignment Algorithms

Short Read Alignment Algorithms Short Read Alignment Algorithms Raluca Gordân Department of Biostatistics and Bioinformatics Department of Computer Science Department of Molecular Genetics and Microbiology Center for Genomic and Computational

More information

Comparative Analysis of Protein Alignment Algorithms in Parallel environment using CUDA

Comparative Analysis of Protein Alignment Algorithms in Parallel environment using CUDA Comparative Analysis of Protein Alignment Algorithms in Parallel environment using BLAST versus Smith-Waterman Shadman Fahim shadmanbracu09@gmail.com Shehabul Hossain rudrozzal@gmail.com Gulshan Jubaed

More information

GPGPU introduction and network applications. PacketShaders, SSLShader

GPGPU introduction and network applications. PacketShaders, SSLShader GPGPU introduction and network applications PacketShaders, SSLShader Agenda GPGPU Introduction Computer graphics background GPGPUs past, present and future PacketShader A GPU-Accelerated Software Router

More information

GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)

GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING Larson Hogstrom, Mukarram Tahir, Andres Hasfura Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 18.337/6.338

More information