Accelerating Nanopore Sequencing Using AI and Volta
Transcription
1 Accelerating Nanopore Sequencing Using AI and Volta. Chuck Seberino, Director of Accelerated Computing, Roche Sequencing Solutions, Santa Clara. GPU Technology Conference 2018
2 Disclaimer. This presentation contains information on products which may be in development and not yet cleared or approved by the FDA, or available in your country. Products discussed in the presentation may be: For Life Science Research Use Only. Not for diagnostic procedures. For Research Use Only. Not for use in diagnostic procedures. A Laboratory Developed Test (LDT) offered by a laboratory certified under the Clinical Laboratory Improvement Amendments (CLIA). As with other LDTs, this test service has not been cleared or approved by the US FDA. GPU Technology Conference S8947 March 29, 2018 page 1 Roche
3 What Are We Here For? Nanopore Sequencing Primer; Sequencing Requirements; Neural Network Approach; Challenges
5 Roche's Next Generation Sequencing (NGS). A powerful combination of electronics and molecular biology: Nanopore Integrated Circuit (IC); Single Molecule Sequencing; Short and Long Read capabilities; Scalable, Electrical Detection; Low Cost components
6 Integrated Circuits: A Scalable Solution. Enables broad applications and throughput on a single platform. [Figure: images at increasing magnification: 2000x, 500,000x, 1,250,000x, and 1,700,000x (1.7 million x)]
7 Single Molecule Sequencing. Enables short and long reads. Polymerase: engineered enzyme for synthesizing DNA. Nanopore: very precise opening for sensing the nucleotide tag. Nucleotide tags: serve as an extension of the DNA base for detection in the nanopore.
8 Data Representation. Sequencing turns into a large signal processing problem with two required components: 1. Ability to determine 4 separate nucleotide levels. 2. Ability to distinguish nucleotide progression.
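The two requirements on this slide can be illustrated with a toy sketch. This is not the method used in the talk (which relies on a neural network); it is a simple nearest-level classifier chosen only to make the two requirements concrete. The level values, the open-channel baseline, and the function names are all hypothetical.

```python
# Toy illustration only: the level values and the open-channel baseline
# are hypothetical, not measurements from any instrument.
LEVELS = {"A": 0.2, "C": 0.4, "G": 0.6, "T": 0.8}
BASELINE = 1.0  # hypothetical signal when no tag sits in the pore


def nearest_label(sample):
    """Requirement 1: map a sample to the closest of the 4 levels.

    Returns '-' when the sample is closest to the open-channel baseline.
    """
    candidates = dict(LEVELS)
    candidates["-"] = BASELINE
    return min(candidates, key=lambda name: abs(candidates[name] - sample))


def call_bases(signal):
    """Requirement 2: emit one base per contiguous run of the same label.

    A repeated base is only called twice if the signal leaves that level
    (for example, returns to baseline) between the two runs.
    """
    calls = []
    previous = "-"
    for sample in signal:
        label = nearest_label(sample)
        if label != "-" and label != previous:
            calls.append(label)
        previous = label
    return "".join(calls)


signal = [1.0, 0.21, 0.19, 1.0, 0.22, 0.98, 0.61, 0.59, 0.79, 1.0]
print(call_bases(signal))  # prints "AAGT"
```

The second "A" is only recovered because the signal returns to baseline between the two runs, which is why distinguishing progression is listed as a separate requirement from distinguishing levels.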
10 GPU Analysis Pipeline
Data flow: FPGA -> GPU -> CPU
Operations: DMA to GPU; Copy to Staging Buffer; Pipeline Processing; Transpose & Pack; Copy to CPU; Write to Disk
Threads: Input & Staging; Primary Analysis; Packing; Output Writer
Memory locations: GPU DMA Buffers; Staging Buffers; Computed Results; Transposed Results; Host Results
Output formats: RAW, HDF5, PNG, BAM
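The thread layout on this slide (input and staging, primary analysis, packing, output writer) resembles a staged producer/consumer pipeline. The following is a minimal CPU-only sketch of that pattern using Python threads and bounded queues. The stage bodies are placeholder computations, and every name here is an assumption made for illustration; nothing below reflects the actual Roche implementation or any GPU API.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream; frames must not be None


def stage(work, inbox, outbox):
    """Apply `work` to each item from `inbox` and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def feed(frames, outbox):
    """Input thread: push raw frames into the first queue."""
    for frame in frames:
        outbox.put(frame)
    outbox.put(SENTINEL)


def run_pipeline(frames):
    # Placeholder stage functions standing in for the slide's stages.
    works = [
        lambda frame: list(frame),             # staging: copy into a buffer
        lambda block: [2 * x for x in block],  # primary analysis (dummy math)
        lambda block: tuple(block),            # packing
    ]
    # Bounded queues give back-pressure between stages.
    queues = [queue.Queue(maxsize=4) for _ in range(len(works) + 1)]
    threads = [threading.Thread(target=feed, args=(frames, queues[0]))]
    threads += [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(works)
    ]
    for thread in threads:
        thread.start()

    results = []  # the calling thread plays the role of the output writer
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


print(run_pipeline([[1, 2], [3]]))  # prints [(2, 4), (6,)]
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order. The bounded queues mean a slow stage stalls the stages before it rather than letting buffers grow without limit.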
12 Primary Analysis Pipeline. Data is stored directly from the FPGA into GPU memory using GPUDirect. Incoming data is copied into blocks so it can be fed properly to the CNN. Run CNN. Run RNN. Run CTC decoder. Finalize, copy data to the CPU, and write to disk. Data flow: DMA to GPU -> Batch Data (four blocks) -> CNN -> RNN -> Decoder -> Copy to CPU
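The slide names a CTC decoder as the last network stage but does not say which decoding strategy is used. As one common possibility, the sketch below shows greedy (best-path) CTC decoding: take the highest-scoring label at each time step, collapse consecutive repeats, then drop blanks. The label numbering, the `ALPHABET` mapping, and the example scores are assumptions made for illustration.

```python
BLANK = 0  # assumed index of the CTC blank label
ALPHABET = {1: "A", 2: "C", 3: "G", 4: "T"}  # assumed label-to-base mapping


def greedy_ctc_decode(scores):
    """Best-path CTC decoding over a list of per-time-step score rows.

    Each row holds one score per label (blank first). Repeated labels are
    collapsed, and a blank between two identical labels keeps both.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in scores]
    decoded = []
    previous = BLANK
    for label in best_path:
        if label != BLANK and label != previous:
            decoded.append(ALPHABET[label])
        previous = label
    return "".join(decoded)


# Best path for these rows is A, A, blank, A, G, G, blank, T.
scores = [
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.1, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.1, 0.6, 0.1],
    [0.2, 0.1, 0.1, 0.5, 0.1],
    [0.8, 0.05, 0.05, 0.05, 0.05],
    [0.1, 0.1, 0.1, 0.1, 0.6],
]
print(greedy_ctc_decode(scores))  # prints "AAGT"
```

The blank label is what lets the decoder separate a genuine repeated base from one base observed over several time steps. A production decoder might instead use beam search, which this sketch does not attempt.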
13 Advances in GPU Generations. [Chart: normalized hardware speedup for M6000, P6000, GP100, and V100; series: NCHW Speedup, NHWC Speedup]
14 FP32 vs FP16 and TensorCore Acceleration. [Chart: hardware speedup, FP32 vs FP16, normalized to V100 FP32; configurations: V100 FP32, GP100 FP32, GP100 FP16, V100 FP16 no TC (true half), V100 FP16 no TC (pseudo half), V100 FP16 with TC; series: NCHW Speedup, NHWC Speedup]
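Both charts compare NCHW and NHWC, which are two memory layouts for the same 4-D tensor (batch N, channels C, height H, width W). The sketch below shows how the flat memory offset of an element differs between the two layouts and converts a small tensor from one to the other. It is plain Python for illustration; the function names are hypothetical and no cuDNN call is involved.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset when width varies fastest and channels are planar."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset when channels vary fastest (interleaved channels)."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(flat, N, C, H, W):
    """Reorder a flat NCHW buffer into NHWC order."""
    out = [None] * len(flat)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = nchw_offset(n, c, h, w, C, H, W)
                    dst = nhwc_offset(n, c, h, w, C, H, W)
                    out[dst] = flat[src]
    return out


# One sample, two channels, three positions along the width.
nchw = [0, 1, 2, 10, 11, 12]  # channel 0 plane, then channel 1 plane
print(nchw_to_nhwc(nchw, N=1, C=2, H=1, W=3))  # prints [0, 10, 1, 11, 2, 12]
```

In NCHW each channel is a contiguous plane, while in NHWC all channels of one position sit next to each other. Which layout runs faster depends on the hardware and kernels, which is the comparison the charts report.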
16 Challenges. All necessary processing must complete within the time budget. CUDA currently doesn't support true hardware partitioning, which makes hard real-time operation more challenging. Our cuDNN batch size is too large, so the work must be broken into smaller blocks. Test against different cuDNN data formats to ensure maximum performance!
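The third challenge, breaking an oversized batch into smaller blocks of work, can be sketched as follows. The helper names and the `infer` callback are hypothetical stand-ins for whatever runs the network on one block; the block size limit would in practice come from the library and hardware in use.

```python
def split_batch(batch, max_block):
    """Split a batch into consecutive blocks of at most `max_block` items."""
    if max_block <= 0:
        raise ValueError("max_block must be positive")
    return [batch[i:i + max_block] for i in range(0, len(batch), max_block)]


def run_in_blocks(batch, max_block, infer):
    """Run `infer` on each block and concatenate results in input order."""
    results = []
    for block in split_batch(batch, max_block):
        results.extend(infer(block))
    return results


print(split_batch(list(range(10)), 4))
# prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# A dummy `infer` that adds one to every item stands in for the network.
print(run_in_blocks(list(range(5)), 2, lambda block: [x + 1 for x in block]))
# prints [1, 2, 3, 4, 5]
```

The last block may be smaller than the others, so any per-block setup that depends on block size has to handle that case.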
17 Doing now what patients need next