Accelerating Nanopore Sequencing Using AI and Volta

1 Accelerating Nanopore Sequencing Using AI and Volta
Chuck Seberino, Director of Accelerated Computing, Roche Sequencing Solutions, Santa Clara
S8947, GPU Technology Conference 2018

2 Disclaimer
This presentation contains information on products which may be in development and not yet cleared or approved by the FDA, or available in your country. Products discussed in the presentation may be:
- For Life Science Research Use Only. Not for diagnostic procedures.
- For Research Use Only. Not for use in diagnostic procedures.
- A Laboratory Developed Test (LDT) offered by a laboratory certified under the Clinical Laboratory Improvement Amendments (CLIA). As with other LDTs, this test service has not been cleared or approved by the US FDA.
GPU Technology Conference S8947 March 29, 2018 page 1 Roche

3 What Are We Here For?
- Nanopore Sequencing Primer
- Sequencing Requirements
- Neural Network Approach
- Challenges

4 Nanopore Sequencing Primer; Sequencing Requirements; Neural Network Approach; Challenges

5 Roche's Next Generation Sequencing (NGS)
A powerful combination of electronics and molecular biology
- Nanopore Integrated Circuit (IC)
- Single Molecule Sequencing
- Short and Long Read capabilities
- Scalable, Electrical Detection
- Low Cost components

6 Integrated Circuits: A Scalable Solution
Enables broad applications and throughput on a single platform
[Figure: images of the integrated circuit at 2000x, 500,000x, 1,250,000x, and 1,700,000x (1.7 million x) magnification]

7 Single Molecule Sequencing
Enables short and long reads
- Polymerase: engineered enzyme for synthesizing DNA
- Nanopore: very precise opening for sensing the nucleotide tag
- Nucleotide Tags: serve as an extension of the DNA base for detection in the nanopore

8 Data Representation
Sequencing becomes a large signal processing problem with two required capabilities:
1. Ability to determine 4 separate nucleotide levels
2. Ability to distinguish nucleotide progression
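The two requirements on this slide can be illustrated with a toy example. The sketch below is not Roche's method (the talk uses a neural network); it only shows what "four levels" and "progression" mean for a sampled signal. The level values, the baseline level, and the names `LEVELS`, `BASELINE`, `classify`, and `call_bases` are all hypothetical.

```python
# Toy illustration only: the level values below are invented, not taken from the slides.
LEVELS = {"A": 0.2, "C": 0.4, "G": 0.6, "T": 0.8}  # hypothetical tag levels (arbitrary units)
BASELINE = 1.0  # hypothetical level when no tag is in the pore


def classify(sample):
    """Requirement 1: map a sample to the nearest of the four levels ('-' for baseline)."""
    candidates = dict(LEVELS)
    candidates["-"] = BASELINE
    return min(candidates, key=lambda label: abs(candidates[label] - sample))


def call_bases(samples):
    """Requirement 2: emit a base only when the label changes, so a long dwell
    on one level counts once and a return to baseline separates repeats."""
    bases = []
    previous = "-"
    for sample in samples:
        label = classify(sample)
        if label != "-" and label != previous:
            bases.append(label)
        previous = label
    return "".join(bases)
```

For example, `call_bases([1.0, 0.21, 0.19, 1.0, 0.22, 0.98, 0.61, 0.6, 0.79])` treats the two consecutive samples near 0.2 as one base and the later sample near 0.2 as a second one, because the signal returned to baseline in between.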

9 Nanopore Sequencing Primer; Sequencing Requirements; Neural Network Approach; Challenges

10 GPU Analysis Pipeline
Data Flow: FPGA, GPU, CPU
Operations: DMA to GPU, Copy to Staging Buffer, Pipeline Processing, Transpose & Pack, Copy to CPU, Write to Disk
Threads: Input & Staging, Primary Analysis, Packing, Output Writer
Memory Locations: GPU DMA Buffers, Staging Buffers, Computed Results, Transposed Results, Host Results
Output formats: RAW, HDF5, PNG, BAM
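The four threads named on this slide suggest a staged producer/consumer design. The following is a minimal CPU-only sketch of that pattern, assuming one worker thread per stage connected by bounded queues; the stage bodies are placeholders, and `run_pipeline`, `stage`, and the queue depth are hypothetical names and values, not details from the talk.

```python
import queue
import threading

SENTINEL = None  # marks end of stream


def stage(work, inbox, outbox):
    """Apply `work` to each item from `inbox` and forward the result to `outbox`."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        outbox.put(work(item))


def run_pipeline(blocks, depth=4):
    # Placeholder stage bodies standing in for the real operations.
    def input_and_staging(block):
        return list(block)  # stands in for "copy to staging buffer"

    def primary_analysis(block):
        return [x * 2 for x in block]  # stands in for the GPU pipeline

    def packing(block):
        return tuple(block)  # stands in for "transpose & pack"

    works = [input_and_staging, primary_analysis, packing]
    queues = [queue.Queue(maxsize=depth) for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]

    results = []

    def output_writer():
        # Stands in for "copy to CPU, write to disk".
        while True:
            item = queues[-1].get()
            if item is SENTINEL:
                return
            results.append(item)

    threads.append(threading.Thread(target=output_writer))
    for t in threads:
        t.start()
    for block in blocks:
        queues[0].put(block)
    queues[0].put(SENTINEL)
    for t in threads:
        t.join()
    return results
```

Bounded queues give back-pressure: if the writer falls behind, upstream stages block instead of accumulating unbounded buffers. With a single thread per stage, block order is preserved.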

11 Nanopore Sequencing Primer; Sequencing Requirements; Neural Network Approach; Challenges

12 Primary Analysis Pipeline
- Data is stored directly from the FPGA into GPU memory using GPUDirect.
- Incoming data is copied into blocks to properly feed the CNN.
- Run CNN
- Run RNN
- Run CTC Decoder
- Finalize, copy data to CPU, and write to disk.
Data Flow: DMA to GPU, Batch Data, CNN, RNN, Decoder, Copy to CPU
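The slide does not say which CTC decoding strategy is used. As an illustration of the decoder step only, here is a minimal sketch of greedy (best-path) CTC decoding, the simplest variant: take the most likely class per time step, collapse consecutive repeats, then drop blanks. The blank index and the `ALPHABET` mapping are assumptions for the example.

```python
BLANK = 0           # assumed index of the CTC blank class
ALPHABET = "-ACGT"  # hypothetical class-to-base mapping; index 0 is the blank


def ctc_greedy_decode(probs):
    """Greedy CTC decoding.

    `probs` is a list of per-time-step score lists, one score per class.
    """
    # Most likely class at each time step.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]

    decoded = []
    previous = BLANK
    for index in best_path:
        # Emit only when the class changes and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(ALPHABET[index])
        previous = index
    return "".join(decoded)
```

A blank between two identical classes is what lets the decoder emit a repeated base: the path A, A, blank, A decodes to "AA", while A, A, A decodes to "A".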

13 Advances in GPU Generations (normalized to M6000)
[Figure: bar chart, "Hardware Speedup on FP…" (label truncated in source), for M6000, P6000, GP100, and V100; series: NCHW Speedup, NHWC Speedup]
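NCHW and NHWC are two memory layouts for the same 4-D tensor (batch N, channels C, height H, width W); they differ only in which dimension varies fastest in memory. The sketch below shows the offset arithmetic for each layout and a straightforward conversion. The function names are illustrative and are not library calls.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when width varies fastest."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when channel varies fastest."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(flat, N, C, H, W):
    """Reorder a flat NCHW buffer into NHWC order."""
    out = [None] * len(flat)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = nchw_offset(n, c, h, w, C, H, W)
                    dst = nhwc_offset(n, c, h, w, C, H, W)
                    out[dst] = flat[src]
    return out
```

With N=1, C=2, H=1, W=3, the NCHW buffer stores all of channel 0 and then all of channel 1, while the NHWC buffer interleaves the two channels sample by sample. Which layout runs faster depends on the kernel and the hardware, which is why the chart reports both.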

14 FP32 vs FP16 and TensorCore Acceleration (normalized to V100 FP32)
[Figure: bar chart, "Hardware Speedup, FP32 vs FP16"; configurations: V100 FP32, GP100 FP32, GP100 FP16, V100 FP16 without Tensor Cores (TRUE HALF), V100 FP16 without Tensor Cores (PSEUDO HALF), V100 FP16 with Tensor Cores; series: NCHW Speedup, NHWC Speedup. Surviving data labels: 1.47, 1.00, 1.00, 0.77; their assignment to bars is not recoverable.]
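The TRUE HALF and PSEUDO HALF labels appear to correspond to cuDNN's data type configurations: in the true-half configuration both storage and arithmetic are FP16, while in the pseudo-half configuration data is stored as FP16 but arithmetic is done in FP32. The sketch below illustrates the numerical difference on a dot product, using Python's `struct` half-precision format to emulate FP16 rounding. A Python float (a double) stands in for the FP32 accumulator, which is an approximation, and the function names are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot_true_half(a, b):
    """FP16 storage and FP16 arithmetic: round after every multiply and add."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_half(to_half(x) * to_half(y))
        acc = to_half(acc + product)
    return acc


def dot_pseudo_half(a, b):
    """FP16 storage, wider accumulator: round only the inputs and the final result."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_half(x) * to_half(y)
    return to_half(acc)
```

Summing 4096 products of 1.0 shows the effect: binary16 has an 11-bit significand, so once the FP16 running sum reaches 2048 the spacing between representable values is 2 and adding 1.0 no longer changes it, whereas the wider accumulator reaches 4096. This is one reason a speed comparison between the two configurations may also need an accuracy check.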

15 Nanopore Sequencing Primer; Sequencing Requirements; Neural Network Approach; Challenges

16 Challenges
- Must be able to complete all necessary processing within the time budget.
- CUDA currently doesn't support true hardware partitioning, making hard real-time more challenging.
- Our cuDNN batch size is too large, which requires it to be broken into smaller blocks of work.
- Test against different cuDNN data formats to ensure maximum performance!
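Breaking an oversized batch into smaller blocks of work can be sketched as follows. This is a generic illustration, not the speaker's implementation: `split_batch`, `process_in_blocks`, and the `run_block` callback (which stands in for one cuDNN invocation) are hypothetical names.

```python
def split_batch(batch, max_block):
    """Split `batch` into consecutive blocks of at most `max_block` items."""
    if max_block <= 0:
        raise ValueError("max_block must be positive")
    return [batch[i:i + max_block] for i in range(0, len(batch), max_block)]


def process_in_blocks(batch, max_block, run_block):
    """Run `run_block` on each block and concatenate the results in order."""
    results = []
    for block in split_batch(batch, max_block):
        results.extend(run_block(block))
    return results
```

For example, `split_batch(list(range(10)), 4)` yields two blocks of four items and a final block of two. The last block may be shorter than the others, so a real kernel launch would need to handle a partial block or pad it.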

17 Doing now what patients need next
For Roche Internal Use Only. Do Not Distribute. 14 March 2017
