ENABLING NEW SCIENCE GPU SOLUTIONS

Size: px

Start display at page:

Download "ENABLING NEW SCIENCE GPU SOLUTIONS"

Shavonne Evans
6 years ago
Views:

ENABLING NEW SCIENCE TESLA BIO Workbench The NVIDIA Tesla Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research.

1 ENABLING NEW SCIENCE TESLA BIO Workbench The NVIDIA Tesla Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a computational laboratory capable of running complex bioscience codes, in fields such as drug discovery and DNA sequencing, more than -2 times faster through the use of NVIDIA Tesla GPUs. It consists of bioscience applications; a community site for downloading, discussing, and viewing the results of these applications; and GPU-based platforms that enable these applications to run at /th the cost of CPU-only computers. Complex molecular simulations that had been only possible using supercomputing resources can now be run on an individual workstation, optimizing the scientific workflow and accelerating the pace of research. These simulations can also be scaled up to GPU-based clusters of servers to simulate large molecules and systems that would have otherwise required a supercomputer. GPU SOLUTIONS The Tesla Bio Workbench applications can be deployed on GPU-based desktop personal supercomputers or in data center solutions. Built on the revolutionary, massively parallel CUDA GPU computing architecture, these solutions are designed to accelerate the pace of computational science and are available today. Visit to find out more about: WORKSTATION SOLUTIONS: Applications that are accelerated on GPUs include: TESLA PERSONAL SUPERCOMPUTER For personal supercomputing at your desk Molecular Dynamics & Quantum Chemistry DATA CENTER SOLUTIONS: - AMBER, GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (Quantum Chemistry), VMD Bio Informatics - CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), GPU-HMMER, MUMmerGPU For more information, visit TESLA GPU COMPUTING CLUSTERS For computing with large-scale installations

GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS AMBER The explicit solvent and implicit solvent simulations in AMBER have been accelerated using CUDA-enabled GPUs.

2 GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS AMBER The explicit solvent and implicit solvent simulations in AMBER have been accelerated using CUDA-enabled GPUs. Paired with a Tesla GPU computing solution built on the CUDA architecture, it enables more than x speedup compared to a single quad-core CPU. Benchmark data for explicit solvent PME and implicit solvent GB Benchmarks. Details can be found on the AMBER NVIDIA benchmark page courtesy of the San Diego Supercomputing Center. HOOMD HOOMD-blue is a general purpose particle dynamics package written from ground-up to take advantage of the revolutionary CUDA architecture in NVIDIA GPUs. It contains a number of different force fields and integrators and its object oriented design enables adding additional ones easily. Detailed benchmarks are available on the HOOMD-blue page. One Tesla GPU running HOOMD-blue can outperform 32 CPU cores. HOOMD on GPUs vs LAMMPS on 32 CPU cores ns/day DHFR NVE 23,8 atoms 2.7 ns/day Nucleosome 2,9 atoms GB-.4 Time steps/sec CPU cores GPU 32 CPU cores GPU LAMMS on 32 x 86 CPU cores HOOMD-BLUE on Tesla -series GPU HOOMD-BLUE on 2 Tesla -series GPUs 8 CPU cores Tesla C6 Tesla C2 CPU = 2x Quad-core Intel Xeon E 462, ifort., MKL., MPICHZ GPU = Tesla C6 / Tesla C2, Gfortran 4., nvcc v3. Polymer (64K particles) Lennard-Jones Liquid (64K particles) Data courtesy of University of Michigan GROMACS GROMACS is a molecular dynamics package designed primarily for simulation of biochemical molecules like proteins, lipids, and nucleic acids that have a lot complicated bonded interactions. The CUDA port of GROMACS enabling GPU acceleration is now available in beta and supports Particle- Mesh-Ewald (PME), arbitrary forms of non-bonded interactions, and implicit solvent Generalized Born methods. The CUDA version of GROMACS currently supports a single GPU and produces the following results: steps/sec Particle-Mesh-Ewald (PME) Solvated Spectrin (37.8K atoms) 2x Intel Quad Core E462 Tesla C6 GPU Spectrin Dimer (7.6K atoms) steps/sec Reaction-Field Cutoffs x Faster NaCl (894 atoms) LAMMPS LAMMPS is a classical molecular dynamics package written to run well on parallel machines. The CUDA version of LAMMPS is accelerated by moving the force calculations to the GPU. LAMMPS on the GPU scales very well as seen by the results below. Two Tesla GPUs running GPU-LAMMPS outperforms 24 CPUs. Femtosec simulated/sec LAMMPS Results on CPU vs GPU Clusters Number of Quad-Core CPUs or Number of Tesla GPUs = 24 CPUs CPUs Data courtesy of Scott Hampton & Pratul K. Agarwal, Oak Ridge National Laboratory Data courtesy of Stockholm Center for Biomembrane Research

3 GPU-BASED MOLECULAR DYNAMICS & QUANTUM CHEMISTRY APPLICATIONS NAMD The Team at University of Illinois at Urbana-Champaign (UIUC) has been enabling CUDA-acceleration on NAMD since 27. They have performed scaling experiments on the NCSA Tesla-based Lincoln cluster and demonstrated that 4 Tesla GPUs can outperform a cluster with 6 quad-core CPUs. NAMD scales very well on a Tesla GPU cluster as demonstrated by these results. VMD VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. Several key kernels and applications in VMD now take advantage of the massively parallel CUDA architecture of NVIDIA s GPUs. These applications run 2x to x faster when using a NVIDIA CUDA GPU compared to running them on a CPU only. NAMD Results on CPU vs GPU Clusters Molecular Orbital Computate in VMD STMV Step/s GPUs = 6 CPUs 3 Lattice points/sec C6-b (carbon-6) C6-a (carbon-6) Thr-b Thr-a (threonine-7) (threonine-7) Intel Core 2 Q66 (Quad) 2.4 GHz Tesla C6 GPU Data courtesy of Theoretical and Computational Bio-physics Group, UIUC Number of Quad-Core CPUs or Number of Tesla GPUs Data courtesy of Theoretical and Computation Bio-physics Group, UIUC TeraChem TeraChem is the first general purpose quantum chemistry software package designed ground up specifically to take advantage of the massively parallel CUDA architecture of NVIDIA GPUs. TeraChem currently supports density functional theory (DFT) and Hartree-Fock (HF) methods. TeraChem supports multiple GPUs using p-threads and will soon be MPI enabled. A workstation with 4 Tesla GPUs running TeraChem outperforms 26 quad-core CPUs running GAMESS. of TeraChem on GPU vs GAMESS on CPU Olestra x Vallnomyclin x Taxol 7x Buckyball 8x 2x Intel Quad Core 2 CPU 4x Tesla C6 GPU Data courtesy of Petachem

4 GPU-BASED BIO-INFORMATICS APPLICATIONS CUDA-BLASTP CUDA-BLASTP is designed to accelerate NCBI BLAST for scanning protein sequence databases by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. CUDA-BLASTP also has a utility to convert FASTA format database into files readable by CUDA-BLASTP. CUDA-BLASTP running on a workstation with two Tesla C6 GPUs is x faster than NCBI BLAST (2.2.22) running on an Intel i7-92 CPU. This cuts compute time from minutes on CPUs to seconds using GPUs. CUDA-MEME CUDA-MEME is a motif discovery software based on MEME (version 3..4). It accelerates MEME by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. It supports the OOPS and ZOOPS models in MEME. CUDA-MEME running on one Tesla C6 GPU is up to 23x faster than MEME running on a x86 CPU. This cuts compute time from hours on CPUs to minutes using GPUs. The data in the chart below are for the OOPS (one occurrence per sequence) model for 4 datasets. CUDA-BLASTP vs NCBI BLASTP s CUDA-MEME vs MEME s sec 3.6 sec 8. sec 2 mins 7 mins 4 hours sec 2 sec NCBI BLASTP: Thread Intel i7-92 CPU NCBI BLASTP: 4 Threads Intel i7-92 CPU CUDA BLASTP: Tesla C6 GPU CUDA BLASTP: 2 Tesla C6 GPUs 2 22 mins Query Sequence Length Mini-drosoph 4 sequences HS_ HS_2 sequences 2 sequences HS_4 4 sequences AMD Opteron 2.2 GHz Tesla C6 GPU CUDA-EC CUDA-EC is a fast parallel sequence error correction tool for short reads. It corrects sequencing errors in high-throughput short-read (HTSR) data and accelerates HTSR by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. Error correction is a preprocessing step for many DNA fragment assembly tools and is very useful for the new high-throughput sequencing machines. CUDA-EC running on one Tesla C6 GPU is up to 2x faster than Euler-SR running on a x86 CPU. This cuts compute time from minutes on CPUs to seconds using GPUs. The data in the chart below are error rates of %, 2%, and 3% denoted by A, A2, A3 and so on for different reference genomes. CUDASW++ (Smith-Waterman) CUDASW++ is a bio-informatics software for Smith-Waterman protein database searches that takes advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs to perform sequence searches x-x faster than NCBI BLAST. CUDASW++ supports query lengths up to 9K. CUDASW++ can take advantage of multiple Tesla GPUs and runs between x-x faster than NCBI BLAST, getting up to 3 GCUPs on query lengths of over. Whereas BLAST is a heuristic algorithm, CUDASW++ does optimal local alignment using the Smith-Waterman algorithm. CUDASW++ vs NCBI BLAST 2 CUDA-EC vs Euler-SR s CUDASW++ 2. : BLOSUM4 2 GCUPS 2 GPU: BLOSUM BLAST on CPU: BLOSUM62 BLAST on CPU: BLOSUM BLAST BLOSUM -3k BLAST BLOSUM62-2k Tesla C6 GPU (CUDASW++) 2 Tesla C6 GPUs (CUDASW++) Euler-SR: AMD Opteron 2.2 GHz CUDA-EC: Tesla C6 Query Sequence Length A A2 A3 B B2 B3 C C2 C3 D D2 E S. cer.8m S. cer 7.M H. inf.9m E. col 4.7M S. aures 4.7M Read Length =3 Error rates of %, 2%, 3%

5 GPU-HMMER GPU-HMMER is a bioinformatics software that does protein sequence alignment using profile HMMs by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. GPU-HMMER is 6-x faster than HMMER (2.). GPU-HMMER accelerates the hmmsearch tool using GPUs and gets speed-ups ranging from 6-x. GPU-HMMER can take advantage of multiple Tesla GPUs in a workstation to reduce the search from hours on a CPU to minutes using a GPU. MUMmerGPU MUMmerGPU is a bio-informatics software for high-throughput sequence alignment using GPUs. It accelerates the alignment of multiple query sequences against a single reference sequence by taking advantage of the massively parallel CUDA architecture of NVIDIA Tesla GPUs. MUMmerGPU (version 2.) provides 3x-4x speedup compared to running MUMmer on CPUs. The data shown in the chart below is based on MUMmerGPU version. and will soon be updated. HMMER: 6-x faster MUMmerGPU vs MUMmer 2 mins 8 mins 8 2 mins mins 27 mins 3 GPUs 6 4 mins 37 mins 4 2 mins 69 mins GPUs 2 mins 2 mins 28 mins CPU # of States (HMM Size) Data courtesy of Scalable Informatics and University of Buffalo Opteron GHz Tesla C6 GPU 2 Tesla C6 GPUs 3 Tesla C6 GPUs S. suis L. monocytogenes C. briggsae 3 mins 6 mins mins mins Time (minutes) 22 mins Tesla GPU (8-Series) Intel Xeon 2 (3. GHz) mins Data courtesy of University of Maryland To learn more about the NVIDIA Tesla Bio Workbench, go to Partner Logo 2 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Tesla, and CUDA, are trademarks and/or registered trademarks of NVIDIA Corporation. All company and product names are trademarks or registered trademarks of the respective owners with which they are associated.

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important