Particle Simulations with HOOMD-blue

Size: px
Start display at page:

Download "Particle Simulations with HOOMD-blue"

Transcription

1 Particle Simulations with HOOMD-blue Joshua A. Anderson S6256 NVIDIA GTC April 7, :00-9:50am

2 HOOMD-blue General purpose particle simulation toolkit Molecular dynamics, hard particle Monte Carlo Executes simulations fast on GPUs (also runs on CPUs) Open source >127 peer-reviewed articles 2

3 History CUDA public beta APS March meeting Paper submitted v0.6: C++ benchmarks only, lj and harmonic bonds only v0.7: Python job scripts v0.8: First external contribution - FENE bonds, Langevin dynamics 3

4 History v1.2: Bounding volume hierarchy neighbor list v1.3: Anisotropic particle integrators, wall potentials, load balancing 2016-?? - v2.0: Hard particle Monte Carlo 4

5 Research using HOOMD Quasicrystal formation Active particles Microspheres Digital alchemy Shape allophiles Hexatic phases 5

6 Molecular dynamics ~r i (t) ~q i (t) ~v i (t) ~! i (t) Compute interactions ~r i (t + t) ~q i (t + t) ~v i (t + t) ~! i (t + t) 6

7 Hard particle Monte Carlo (HPMC) ~r i (A) ~q i (A) Random trial move Accept or reject ~r i (B) ~q i (B) Joshua A Anderson, M Eric Irrgang, Sharon C Glotzer (2016) Scalable Metropolis Monte Carlo for simulation of hard shapes, Computer Physics Communications 7

8 HPMC Shape overlap checks + See GTC 2014 talk 8

9 HPMC block level queue Thread ) Time Initialization Trial move Circumsphere check Overlap check Overlap divergence Early exit divergence Time (0.5 ms total) 9

10 HPMC on P100 10

11 GPU strategy in HOOMD Perform all computations on the GPU (see GTC talks) Keep data on the GPU all the time Run multiple threads per particle and warp-level reduction/scan Auto-tune launch parameters during run time (see GTC 2015 talk) 11

12 Architecture User's python script HOOMD python library matplotlib scipy mpi4py... HOOMD C++ library CUDA MPI Hardware 12

13 Installing binaries with conda 13

14 Compile from source Requirements: C++ compiler Python Numpy Boost C++ library CMake Optional: CUDA MPI git sphinx 14

15 Compile from source 15

16 Example: DPD polymers import hoomd from hoomd import md hoomd.context.initialize(); hoomd.init.read_gsd(filename='polymers.gsd') nl = hoomd.md.nlist.cell() dpd = md.pair.dpd(r_cut=1.0, T=1.0, nlist=nl) nl.reset_exclusions() dpd.pair_coeff.set('a', 'A', A=25.0, gamma = 4.5) dpd.pair_coeff.set('a', 'B', A=80.0, gamma = 4.5) dpd.pair_coeff.set('b', 'B', A=25.0, gamma = 4.5) harmonic = hoomd.md.bond.harmonic() harmonic.bond_coeff.set('polymer', k=330.0, r0=0.84) md.integrate.mode_standard(dt=0.02) md.integrate.nve(group=hoomd.group.all()) hoomd.run(10000) 16

17 Running the example 17

18

19 Example: Active matter import hoomd, numpy from hoomd import md hoomd.context.initialize() hoomd.init.read_gsd('active.gsd') all = hoomd.group.all() N = len(all) nl = hoomd.md.nlist.cell() lj = md.pair.lj(r_cut=1.0, nlist=nl) lj.pair_coeff.set('a', 'A', epsilon=1.0, sigma=1.0/2**(1.0/6.0)) activity = [ (((numpy.random.rand(3) - 0.5) * 2.0)) for i in range(n)] for i in range(n): activity[i][2] = 0; activity[i] = tuple(activity[i]) active_force = md.force.active(group=all, seed=123, f_lst=activity, rotation_diff=0.005, orientation_link=false) md.integrate.mode_standard(dt=0.001) bd = md.integrate.brownian(group=all, T=0.0, seed=123, dscale=1.0) hoomd.run(100000) 19

20 20

21 BVH tree overview BVH trees are common in ray tracing HOOMD uses them for neighbor searches (optional) Generate and query on the GPU M. P. Howard, J. A. Anderson, A. Nikoubashman, S. C. Glotzer, and A. Z. Panagiotopoulos, Comput. Phys. Commun., Mar

22 Example: Large and small particles import hoomd, numpy from hoomd import md hoomd.context.initialize() hoomd.init.read_gsd('bigsmall.gsd') nl = md.nlist.tree() lj = md.pair.slj(r_cut=2**(1.0/6.0), nlist=nl) lj.pair_coeff.set('a', 'A', epsilon=1.0, sigma=1.0) md.integrate.mode_standard(dt=0.005) lvn = md.integrate.langevin(group=hoomd.group.all(), T=1.0, seed=123) lvn.set_gamma('a', 1.0) hoomd.run( ) 22

23 23

24 Example: Hard Hexagons import hoomd from hoomd import hpmc hoomd.context.initialize(); hoomd.init.read_gsd(filename='hexagons.gsd') hexagon = [[0.5,0], [0.25, ], [-0.25, ], [-0.5,0], [-0.25, ], [0.25, ]]; mc = hpmc.integrate.convex_polygon(seed=123, d=0.15, a=1); mc.shape_param.set('a', vertices=hexagon); hoomd.run(200) 24

25 25

26 Example: Jupyter notebook 26

27 Additional features Integration: NVE, NVT, NPT, NPH, langevin, brownian, Berendsen, DPD, FIRE minimization Orientational degrees of freedom Pair potentials: CGCMM, DPD, LJ, Gaussian, Mie, Moliere, Morse, Yukawa, ZBL, table Bond: FENE, Harmonic, table, OPLS Anisotropic potentials: Gay-berne, dipole Wall potentials PPPM electrostatics HPMC shapes: convex polygon, simple polygon, convex spheropolygon, convex polyhedron, convex spheropolyhedron, general polyhedron, ellipsoid, differences of spheres, unions of spheres, faceted spheres 27

28 Software engineering Object oriented design Template functors Code review / pull requests Unit testing Validation testing Conda recipes Read the docs 28

29

30

31 Read The Docs 31

32 Acknowledgements HOOMD-blue + HPMC: codeblue.umich.edu/hoomd-blue/ 2.0 will be available soon... Thanks to all HOOMD-blue users and contributors! Research supported by the National Science Foundation, Division of Materials Research Award # DMR

Performance of Applications on Comet GPU Nodes Utilizing MVAPICH2-GDR. Mahidhar Tatineni MVAPICH User Group Meeting August 16, 2017

Performance of Applications on Comet GPU Nodes Utilizing MVAPICH2-GDR. Mahidhar Tatineni MVAPICH User Group Meeting August 16, 2017 Performance of Applications on Comet GPU Nodes Utilizing MVAPICH2-GDR Mahidhar Tatineni MVAPICH User Group Meeting August 16, 2017 This work supported by the National Science Foundation, award ACI-1341698.

More information

Strong Scaling for Molecular Dynamics Applications

Strong Scaling for Molecular Dynamics Applications Strong Scaling for Molecular Dynamics Applications Scaling and Molecular Dynamics Strong Scaling How does solution time vary with the number of processors for a fixed problem size Classical Molecular Dynamics

More information

GPU Optimized Monte Carlo

GPU Optimized Monte Carlo GPU Optimized Monte Carlo Jason Mick, Eyad Hailat, Kamel Rushaidat, Yuanzhe Li, Loren Schwiebert, and Jeffrey J. Potoff Department of Chemical Engineering & Materials Science, and Department of Computer

More information

Machine Learning Software ROOT/TMVA

Machine Learning Software ROOT/TMVA Machine Learning Software ROOT/TMVA LIP Data Science School / 12-14 March 2018 ROOT ROOT is a software toolkit which provides building blocks for: Data processing Data analysis Data visualisation Data

More information

HOOMD-blue Documentation

HOOMD-blue Documentation HOOMD-blue Documentation Release 2.0.0 The Regents of the University of Michigan April 29, 2016 Concepts 1 Compiling HOOMD-blue 3 1.1 Software Prerequisites.......................................... 3

More information

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM 25th March, GTC 2014, San Jose CA AnE- Pekka Hynninen ane.pekka.hynninen@nrel.gov NREL is a na*onal laboratory of the U.S. Department of Energy,

More information

Using jupyter notebooks on Blue Waters. Roland Haas (NCSA / University of Illinois)

Using jupyter notebooks on Blue Waters.   Roland Haas (NCSA / University of Illinois) Using jupyter notebooks on Blue Waters https://goo.gl/4eb7qw Roland Haas (NCSA / University of Illinois) Email: rhaas@ncsa.illinois.edu Jupyter notebooks 2/18 interactive, browser based interface to Python

More information

7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT

7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT 7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT Draft Printed for SECO Murex S.A.S 2012 all rights reserved Murex Analytics Only global vendor of trading, risk management and processing systems focusing also

More information

CSinParallel Workshop. OnRamp: An Interactive Learning Portal for Parallel Computing Environments

CSinParallel Workshop. OnRamp: An Interactive Learning Portal for Parallel Computing Environments CSinParallel Workshop : An Interactive Learning for Parallel Computing Environments Samantha Foley ssfoley@cs.uwlax.edu http://cs.uwlax.edu/~ssfoley Josh Hursey jjhursey@cs.uwlax.edu http://cs.uwlax.edu/~jjhursey/

More information

Warped parallel nearest neighbor searches using kd-trees

Warped parallel nearest neighbor searches using kd-trees Warped parallel nearest neighbor searches using kd-trees Roman Sokolov, Andrei Tchouprakov D4D Technologies Kd-trees Binary space partitioning tree Used for nearest-neighbor search, range search Application:

More information

Algorithm Engineering Lab: Ray Tracing. 8. Februar 2018

Algorithm Engineering Lab: Ray Tracing. 8. Februar 2018 Algorithm Engineering Lab: Ray Tracing Jenette Sellin Markus Pawellek 8. Februar 2018 Gliederung Goal of the Project Ray Tracing Background Starting Point Setting up the Environment Implementation Serialization

More information

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA OPEN MPI WITH RDMA SUPPORT AND CUDA Rolf vandevaart, NVIDIA OVERVIEW What is CUDA-aware History of CUDA-aware support in Open MPI GPU Direct RDMA support Tuning parameters Application example Future work

More information

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior

More information

Interval arithmetic on graphics processing units

Interval arithmetic on graphics processing units Interval arithmetic on graphics processing units Sylvain Collange*, Jorge Flórez** and David Defour* RNC'8 July 7 9, 2008 * ELIAUS, Université de Perpignan Via Domitia ** GILab, Universitat de Girona How

More information

Improving Uintah s Scalability Through the Use of Portable

Improving Uintah s Scalability Through the Use of Portable Improving Uintah s Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks John Holmen1, Alan Humphrey1, Daniel Sunderland2, Martin Berzins1 University of Utah1 Sandia National Laboratories2

More information

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python Python Landscape Adoption of Python continues to grow among domain specialists and developers for its productivity benefits Challenge#1:

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh

More information

ENABLING NEW SCIENCE GPU SOLUTIONS

ENABLING NEW SCIENCE GPU SOLUTIONS ENABLING NEW SCIENCE TESLA BIO Workbench The NVIDIA Tesla Bio Workbench enables biophysicists and computational chemists to push the boundaries of life sciences research. It turns a standard PC into a

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

Enterprise. Breadth-First Graph Traversal on GPUs. November 19th, 2015

Enterprise. Breadth-First Graph Traversal on GPUs. November 19th, 2015 Enterprise Breadth-First Graph Traversal on GPUs Hang Liu H. Howie Huang November 9th, 5 Graph is Ubiquitous Breadth-First Search (BFS) is Important Wide Range of Applications Single Source Shortest Path

More information

2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA

2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA 2006: Short-Range Molecular Dynamics on GPU San Jose, CA September 22, 2010 Peng Wang, NVIDIA Overview The LAMMPS molecular dynamics (MD) code Cell-list generation and force calculation Algorithm & performance

More information

MontePython. Thejs Brinckmann, Deanna C. Hooper, Julien Lesgourgues. MontePython + CLASS Kavli workshop

MontePython. Thejs Brinckmann, Deanna C. Hooper, Julien Lesgourgues. MontePython + CLASS Kavli workshop MontePython Thejs Brinckmann, Deanna C. Hooper, Julien Lesgourgues MontePython + CLASS Kavli workshop Code developed by Audren, Brinckmann, Lesgourgues & many others Lecture, Cambridge, 12/09/18 Overview

More information

Introduction to ESPResSo and Tcl

Introduction to ESPResSo and Tcl to ESPResSo and Tcl Institut für Computerphysik, Universität Stuttgart Stuttgart, Germany Coarse-Graining Time scale Co Finite Elements ng i n i ra G arse Soft Fluid Molecular All-Atom Quantum Length scale

More information

Getting computing into the classroom: building a cluster

Getting computing into the classroom: building a cluster Getting computing into the classroom: building a cluster Erik Spence SciNet HPC Consortium 2 April 2015 Erik Spence (SciNet HPC Consortium) MPI Cluster 2 April 2015 1 / 31 Today s class High Performance

More information

c++ and python modern programming techniques

c++ and python modern programming techniques c++ and python modern programming techniques Jakša Vučičević IPB, Tuesday, February 28 th, 2017 Outline Introduction to Python, Java and C++ programming paradigms compilation model Template programming

More information

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications TESLA P PERFORMANCE GUIDE Deep Learning and HPC Applications SEPTEMBER 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important

More information

Harnessing GPU speed to accelerate LAMMPS particle simulations

Harnessing GPU speed to accelerate LAMMPS particle simulations Harnessing GPU speed to accelerate LAMMPS particle simulations Paul S. Crozier, W. Michael Brown, Peng Wang pscrozi@sandia.gov, wmbrown@sandia.gov, penwang@nvidia.com SC09, Portland, Oregon November 18,

More information

VIP Documentation. Release Carlos Alberto Gomez Gonzalez, Olivier Wertz & VORTEX team

VIP Documentation. Release Carlos Alberto Gomez Gonzalez, Olivier Wertz & VORTEX team VIP Documentation Release 0.8.9 Carlos Alberto Gomez Gonzalez, Olivier Wertz & VORTEX team Feb 17, 2018 Contents 1 Introduction 3 2 Documentation 5 3 Jupyter notebook tutorial 7 4 TL;DR setup guide 9

More information

Optimizing CUDA for GPU Architecture. CSInParallel Project

Optimizing CUDA for GPU Architecture. CSInParallel Project Optimizing CUDA for GPU Architecture CSInParallel Project August 13, 2014 CONTENTS 1 CUDA Architecture 2 1.1 Physical Architecture........................................... 2 1.2 Virtual Architecture...........................................

More information

GPU Task-Parallelism: Primitives and Applications. Stanley Tzeng, Anjul Patney, John D. Owens University of California at Davis

GPU Task-Parallelism: Primitives and Applications. Stanley Tzeng, Anjul Patney, John D. Owens University of California at Davis GPU Task-Parallelism: Primitives and Applications Stanley Tzeng, Anjul Patney, John D. Owens University of California at Davis This talk Will introduce task-parallelism on GPUs What is it? Why is it important?

More information

Analysis and Visualization Algorithms in VMD

Analysis and Visualization Algorithms in VMD 1 Analysis and Visualization Algorithms in VMD David Hardy Research/~dhardy/ NAIS: State-of-the-Art Algorithms for Molecular Dynamics (Presenting the work of John Stone.) VMD Visual Molecular Dynamics

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview

More information

How-To FRED MPC Software Version #: Last Updated: October 1, 2018

How-To FRED MPC Software Version #: Last Updated: October 1, 2018 Illuminating Ideas How-To FRED MPC Software Version #: 17.104.0 Last Updated: October 1, 2018 Table of Contents Document Overview... 1 Installation and Licensing... 1 GPU Hardware Configuration... 4 GPU

More information

PyPWA A Partial-Wave/Amplitude Analysis Software Framework

PyPWA A Partial-Wave/Amplitude Analysis Software Framework PyPWA A Partial-Wave/Amplitude Analysis Software Framework Carlos W. Salgado 1,2 other team members S. Bramlett 1, B. DeMello 1, M. Jones 1 W. Phelps 3 and J. Pond 1 Norfolk State University 1 The Thomas

More information

4/20/15. Blue Waters User Monthly Teleconference

4/20/15. Blue Waters User Monthly Teleconference 4/20/15 Blue Waters User Monthly Teleconference Agenda Utilization Recent events Recent changes Upcoming changes Blue Waters Data Sharing 2015 Blue Waters Symposium PUBLICATIONS! 2 System Utilization Utilization

More information

Tutorial 8: Visualization How to visualize your ESPResSo simulations while they are running

Tutorial 8: Visualization How to visualize your ESPResSo simulations while they are running Tutorial 8: Visualization How to visualize your ESPResSo simulations while they are running October 9, 2016 1 Introduction When you are running a simulation, it is often useful to see what is going on

More information

Cross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing

Cross Teaching Parallelism and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing and Ray Tracing: A Project based Approach to Teaching Applied Parallel Computing Chris Lupo Computer Science Cal Poly Session 0311 GTC 2012 Slide 1 The Meta Data Cal Poly is medium sized, public polytechnic

More information

Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc C O M P U T E S T O R E A N A L Y Z E

Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc C O M P U T E S T O R E A N A L Y Z E Python based Data Science on Cray Platforms Rob Vesse, Alex Heye, Mike Ringenburg - Cray Inc Overview Supported Technologies Cray PE Python Support Shifter Urika-XC Anaconda Python Spark Intel BigDL machine

More information

Introduction to Computer Vision Laboratories

Introduction to Computer Vision Laboratories Introduction to Computer Vision Laboratories Antonino Furnari furnari@dmi.unict.it www.dmi.unict.it/~furnari/ Computer Vision Laboratories Format: practical session + questions and homeworks. Material

More information

Hybrid Implementation of 3D Kirchhoff Migration

Hybrid Implementation of 3D Kirchhoff Migration Hybrid Implementation of 3D Kirchhoff Migration Max Grossman, Mauricio Araya-Polo, Gladys Gonzalez GTC, San Jose March 19, 2013 Agenda 1. Motivation 2. The Problem at Hand 3. Solution Strategy 4. GPU Implementation

More information

The Materials Data Facility

The Materials Data Facility The Materials Data Facility Ben Blaiszik (blaiszik@uchicago.edu), Kyle Chard (chard@uchicago.edu) Ian Foster (foster@uchicago.edu) materialsdatafacility.org What is MDF? We aim to make it simple for materials

More information

Molecular Simulations using Monte Carlo and Molecular Dynamics

Molecular Simulations using Monte Carlo and Molecular Dynamics Mitglied der Helmholtz-Gemeinschaft Molecular Simulations using Monte Carlo and Molecular Dynamics Jan H. Meinke Research Centre Jülich Jülich Jülich Wrocław Wrocław Slide 2 Mitglied der Helmholtz-Gemeinschaft

More information

Homework 01 : Deep learning Tutorial

Homework 01 : Deep learning Tutorial Homework 01 : Deep learning Tutorial Introduction to TensorFlow and MLP 1. Introduction You are going to install TensorFlow as a tutorial of deep learning implementation. This instruction will provide

More information

CUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin

CUDA Development Using NVIDIA Nsight, Eclipse Edition. David Goodwin CUDA Development Using NVIDIA Nsight, Eclipse Edition David Goodwin NVIDIA Nsight Eclipse Edition CUDA Integrated Development Environment Project Management Edit Build Debug Profile SC'12 2 Powered By

More information

CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN

CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction. Francesco Rossi University of Bologna and INFN CUDA and GPU Performance Tuning Fundamentals: A hands-on introduction Francesco Rossi University of Bologna and INFN * Using this terminology since you ve already heard of SIMD and SPMD at this school

More information

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator What is CUDA? Programming language? Compiler? Classic car? Beer? Coffee? CUDA Parallel Computing Platform www.nvidia.com/getcuda Programming

More information

Faster Simulations of the National Airspace System

Faster Simulations of the National Airspace System Faster Simulations of the National Airspace System PK Menon Monish Tandale Sandy Wiraatmadja Optimal Synthesis Inc. Joseph Rios NASA Ames Research Center NVIDIA GPU Technology Conference 2010, San Jose,

More information

Subtleties in the Monte Carlo simulation of lattice polymers

Subtleties in the Monte Carlo simulation of lattice polymers Subtleties in the Monte Carlo simulation of lattice polymers Nathan Clisby MASCOS, University of Melbourne July 9, 2010 Outline Critical exponents for SAWs. The pivot algorithm. Improving implementation

More information

HPC Application Porting to CUDA at BSC

HPC Application Porting to CUDA at BSC www.bsc.es HPC Application Porting to CUDA at BSC Pau Farré, Marc Jordà GTC 2016 - San Jose Agenda WARIS-Transport Atmospheric volcanic ash transport simulation Computer Applications department PELE Protein-drug

More information

MPEXS benchmark results

MPEXS benchmark results MPEXS benchmark results - phase space data - Akinori Kimura 14 February 2017 Aim To validate results of MPEXS with phase space data by comparing with Geant4 results Depth dose and lateral dose distributions

More information

Guillimin HPC Users Meeting December 14, 2017

Guillimin HPC Users Meeting December 14, 2017 Guillimin HPC Users Meeting December 14, 2017 guillimin@calculquebec.ca McGill University / Calcul Québec / Compute Canada Montréal, QC Canada Please be kind to your fellow user meeting attendees Limit

More information

Intel Distribution for Python* и Intel Performance Libraries

Intel Distribution for Python* и Intel Performance Libraries Intel Distribution for Python* и Intel Performance Libraries 1 Motivation * L.Prechelt, An empirical comparison of seven programming languages, IEEE Computer, 2000, Vol. 33, Issue 10, pp. 23-29 ** RedMonk

More information

The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy.! Thomas C.

The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy.! Thomas C. The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy! Thomas C. Schulthess ENES HPC Workshop, Hamburg, March 17, 2014 T. Schulthess!1

More information

Table of Contents. Table of Contents Job Manager for local execution of ATK scripts Serial execution Threading MPI parallelization Machine Manager

Table of Contents. Table of Contents Job Manager for local execution of ATK scripts Serial execution Threading MPI parallelization Machine Manager Table of Contents Table of Contents Job Manager for local execution of ATK scripts Serial execution Threading MPI parallelization Machine Manager 1 2 2 8 10 12 QuantumWise TRY IT! COMPANY CONTACT Docs»

More information

Intersection Acceleration

Intersection Acceleration Advanced Computer Graphics Intersection Acceleration Matthias Teschner Computer Science Department University of Freiburg Outline introduction bounding volume hierarchies uniform grids kd-trees octrees

More information

A fast and accurate GPU-based proton transport Monte Carlo simulation for validating proton therapy treatment plans

A fast and accurate GPU-based proton transport Monte Carlo simulation for validating proton therapy treatment plans A fast and accurate GPU-based proton transport Monte Carlo simulation for validating proton therapy treatment plans H. Wan Chan Tseung 1 J. Ma C. Beltran PTCOG 2014 13 June, Shanghai 1 wanchantseung.hok@mayo.edu

More information

Homework 1: Implicit Surfaces, Collision Detection, & Volumetric Data Structures. Loop Subdivision. Loop Subdivision. Questions/Comments?

Homework 1: Implicit Surfaces, Collision Detection, & Volumetric Data Structures. Loop Subdivision. Loop Subdivision. Questions/Comments? Homework 1: Questions/Comments? Implicit Surfaces,, & Volumetric Data Structures Loop Subdivision Shirley, Fundamentals of Computer Graphics Loop Subdivision SIGGRAPH 2000 course notes Subdivision for

More information

Lecture 11: Ray tracing (cont.)

Lecture 11: Ray tracing (cont.) Interactive Computer Graphics Ray tracing - Summary Lecture 11: Ray tracing (cont.) Graphics Lecture 10: Slide 1 Some slides adopted from H. Pfister, Harvard Graphics Lecture 10: Slide 2 Ray tracing -

More information

Sampling Using GPU Accelerated Sparse Hierarchical Models

Sampling Using GPU Accelerated Sparse Hierarchical Models Sampling Using GPU Accelerated Sparse Hierarchical Models Miroslav Stoyanov Oak Ridge National Laboratory supported by Exascale Computing Project (ECP) exascaleproject.org April 9, 28 Miroslav Stoyanov

More information

SCALABLE HYBRID PROTOTYPE

SCALABLE HYBRID PROTOTYPE SCALABLE HYBRID PROTOTYPE Scalable Hybrid Prototype Part of the PRACE Technology Evaluation Objectives Enabling key applications on new architectures Familiarizing users and providing a research platform

More information

PHYSICALLY BASED RENDERING FOR 3DSMAX LIGHTWORKS IRAY + FOR 3DSMAX CASSIE THIBODEAU - NVIDIA PETER DE LAPPE NVIDIA DAVID COLDRON - LIGHTWORKS

PHYSICALLY BASED RENDERING FOR 3DSMAX LIGHTWORKS IRAY + FOR 3DSMAX CASSIE THIBODEAU - NVIDIA PETER DE LAPPE NVIDIA DAVID COLDRON - LIGHTWORKS Webinar: Photorealistic visualization with Speed and Ease Using Iray+ for Autodesk 3ds Max PHYSICALLY BASED RENDERING FOR 3DSMAX LIGHTWORKS IRAY + FOR 3DSMAX CASSIE THIBODEAU - NVIDIA PETER DE LAPPE NVIDIA

More information

Dynamic Bounding Volume Hierarchies. Erin Catto, Blizzard Entertainment

Dynamic Bounding Volume Hierarchies. Erin Catto, Blizzard Entertainment Dynamic Bounding Volume Hierarchies Erin Catto, Blizzard Entertainment 1 This is one of my favorite Overwatch maps: BlizzardWorld. This is the spawn area inside the Hearthstone Tavern. 2 All the objects

More information

Big Orange Bramble. August 09, 2016

Big Orange Bramble. August 09, 2016 Big Orange Bramble August 09, 2016 Overview HPL SPH PiBrot Numeric Integration Parallel Pi Monte Carlo FDS DANNA HPL High Performance Linpack is a benchmark for clusters Created here at the University

More information

Point Cloud Collision Detection

Point Cloud Collision Detection Point Cloud Collision Detection Uni Paderborn & Gabriel Zachmann Uni Bonn Point Clouds Modern acquisition methods (scanning, sampling synthetic objects) lead to modern object representations. Efficient

More information

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

A Comparative Study on Exact Triangle Counting Algorithms on the GPU A Comparative Study on Exact Triangle Counting Algorithms on the GPU Leyuan Wang, Yangzihao Wang, Carl Yang, John D. Owens University of California, Davis, CA, USA 31 st May 2016 L. Wang, Y. Wang, C. Yang,

More information

MPI: the Message Passing Interface

MPI: the Message Passing Interface 15 Parallel Programming with MPI Lab Objective: In the world of parallel computing, MPI is the most widespread and standardized message passing library. As such, it is used in the majority of parallel

More information

Getting Started with Python

Getting Started with Python Getting Started with Python A beginner course to Python Ryan Leung Updated: 2018/01/30 yanyan.ryan.leung@gmail.com Links Tutorial Material on GitHub: http://goo.gl/grrxqj 1 Learning Outcomes Python as

More information

WHAM. The Weighted Histogram Analysis Method. Alan Grossfield November 7, 2003

WHAM. The Weighted Histogram Analysis Method. Alan Grossfield November 7, 2003 WHAM The Weighted Histogram Analysis Method Alan Grossfield November 7, 2003 Outline Statistical Mechanics Non-Boltzmann Sampling WHAM equations Practical Considerations Example: Butane Statistical Mechanics

More information

Interactive and Scalable Ray-Casting of Metaballs on the GPU

Interactive and Scalable Ray-Casting of Metaballs on the GPU CIS 496 / EAS 499 Senior Project Project Proposal Specification Instructors: Norman I. Badler and Joseph T. Kider. Jr. Interactive and Scalable Ray-Casting of Metaballs on the GPU Jon McCaffrey Advisor:

More information

Incremental Migration of C and Fortran Applications to GPGPU using HMPP HPC Advisory Council China Conference 2010

Incremental Migration of C and Fortran Applications to GPGPU using HMPP HPC Advisory Council China Conference 2010 Innovative software for manycore paradigms Incremental Migration of C and Fortran Applications to GPGPU using HMPP HPC Advisory Council China Conference 2010 Introduction Many applications can benefit

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

Overview of Performance Prediction Tools for Better Development and Tuning Support

Overview of Performance Prediction Tools for Better Development and Tuning Support Overview of Performance Prediction Tools for Better Development and Tuning Support Universidade Federal Fluminense Rommel Anatoli Quintanilla Cruz / Master's Student Esteban Clua / Associate Professor

More information

DATA FORMATS FOR DATA SCIENCE Remastered

DATA FORMATS FOR DATA SCIENCE Remastered Budapest BI FORUM 2016 DATA FORMATS FOR DATA SCIENCE Remastered Valerio Maggio @leriomaggio Data Scientist and Researcher Fondazione Bruno Kessler (FBK) Trento, Italy WhoAmI Post Doc Researcher @ FBK Interested

More information

CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0. Julien Demouth, NVIDIA

CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0. Julien Demouth, NVIDIA CUDA Optimization with NVIDIA Nsight Visual Studio Edition 3.0 Julien Demouth, NVIDIA What Will You Learn? An iterative method to optimize your GPU code A way to conduct that method with Nsight VSE APOD

More information

Application-independent Autotuning for GPUs

Application-independent Autotuning for GPUs Application-independent Autotuning for GPUs Martin Tillmann, Thomas Karcher, Carsten Dachsbacher, Walter F. Tichy KARLSRUHE INSTITUTE OF TECHNOLOGY KIT University of the State of Baden-Wuerttemberg and

More information

Python ecosystem for scientific computing with ABINIT: challenges and opportunities. M. Giantomassi and the AbiPy group

Python ecosystem for scientific computing with ABINIT: challenges and opportunities. M. Giantomassi and the AbiPy group Python ecosystem for scientific computing with ABINIT: challenges and opportunities M. Giantomassi and the AbiPy group Frejus, May 9, 2017 Python package for: generating input files automatically post-processing

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method

A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method GTC (GPU Technology Conference) 2013, San Jose, 2013, March 20 A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method Takayuki Aoki Global Scientific Information

More information

Migration of Applications Across

Migration of Applications Across Migration of Applications Across Different Systems: A Case Study Marisa Gil, Edgar Juanpere, Xavier Martorell and Nacho Navarro (BSC, ) Riccardo Rossi and Pooyan Dadvand (CIMNE, ) Outline Context Overview

More information

STARTING THE DDT DEBUGGER ON MIO, AUN, & MC2. (Mouse over to the left to see thumbnails of all of the slides)

STARTING THE DDT DEBUGGER ON MIO, AUN, & MC2. (Mouse over to the left to see thumbnails of all of the slides) STARTING THE DDT DEBUGGER ON MIO, AUN, & MC2 (Mouse over to the left to see thumbnails of all of the slides) ALLINEA DDT Allinea DDT is a powerful, easy-to-use graphical debugger capable of debugging a

More information

MODELING CUDA COMPUTE APPLICATIONS BY CRITICAL PATH. PATRIC ZHAO, JIRI KRAUS, SKY WU

MODELING CUDA COMPUTE APPLICATIONS BY CRITICAL PATH. PATRIC ZHAO, JIRI KRAUS, SKY WU MODELING CUDA COMPUTE APPLICATIONS BY CRITICAL PATH PATRIC ZHAO, JIRI KRAUS, SKY WU patricz@nvidia.com AGENDA Background Collect data and Visualizations Critical Path Performance analysis and prediction

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Gernot Ziegler, Developer Technology (Compute) (Material by Thomas Bradley) Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction

More information

Lab III: MD II & Analysis Linux & OS X. In this lab you will:

Lab III: MD II & Analysis Linux & OS X. In this lab you will: Lab III: MD II & Analysis Linux & OS X In this lab you will: 1) Add water, ions, and set up and minimize a solvated protein system 2) Learn how to create fixed atom files and systematically minimize a

More information

Cluster-Algorithms for off-lattice systems Ludger Santen

Cluster-Algorithms for off-lattice systems Ludger Santen Cluster-Algorithms for off-lattice systems Fachrichtung 7.1, Theoretische Physik Universität des Saarlandes, Saarbrücken Cluster-Algorithms for off-lattice systems The Monte Carlo-Method: A reminder Cluster

More information

Using Bounding Volume Hierarchies Efficient Collision Detection for Several Hundreds of Objects

Using Bounding Volume Hierarchies Efficient Collision Detection for Several Hundreds of Objects Part 7: Collision Detection Virtuelle Realität Wintersemester 2007/08 Prof. Bernhard Jung Overview Bounding Volumes Separating Axis Theorem Using Bounding Volume Hierarchies Efficient Collision Detection

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

CUDA. Matthew Joyner, Jeremy Williams

CUDA. Matthew Joyner, Jeremy Williams CUDA Matthew Joyner, Jeremy Williams Agenda What is CUDA? CUDA GPU Architecture CPU/GPU Communication Coding in CUDA Use cases of CUDA Comparison to OpenCL What is CUDA? What is CUDA? CUDA is a parallel

More information

PHCpack, phcpy, and Sphinx

PHCpack, phcpy, and Sphinx PHCpack, phcpy, and Sphinx 1 the software PHCpack a package for Polynomial Homotopy Continuation polyhedral homotopies the Python interface phcpy 2 Documenting Software with Sphinx Sphinx generates documentation

More information

Splotch: High Performance Visualization using MPI, OpenMP and CUDA

Splotch: High Performance Visualization using MPI, OpenMP and CUDA Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,

More information

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University   Scalable Tools Workshop 7 August 2017 Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &

More information

VMD: Immersive Molecular Visualization and Interactive Ray Tracing for Domes, Panoramic Theaters, and Head Mounted Displays

VMD: Immersive Molecular Visualization and Interactive Ray Tracing for Domes, Panoramic Theaters, and Head Mounted Displays VMD: Immersive Molecular Visualization and Interactive Ray Tracing for Domes, Panoramic Theaters, and Head Mounted Displays John E. Stone Theoretical and Computational Biophysics Group Beckman Institute

More information

A4. Intro to Parallel Computing

A4. Intro to Parallel Computing Self-Consistent Simulations of Beam and Plasma Systems Steven M. Lund, Jean-Luc Vay, Rémi Lehe and Daniel Winklehner Colorado State U., Ft. Collins, CO, 13-17 June, 2016 A4. Intro to Parallel Computing

More information

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018

S WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS. Jakob Progsch, Mathias Wagner GTC 2018 S8630 - WHAT THE PROFILER IS TELLING YOU: OPTIMIZING GPU KERNELS Jakob Progsch, Mathias Wagner GTC 2018 1. Know your hardware BEFORE YOU START What are the target machines, how many nodes? Machine-specific

More information

Forrest B. Brown, Yasunobu Nagaya. American Nuclear Society 2002 Winter Meeting November 17-21, 2002 Washington, DC

Forrest B. Brown, Yasunobu Nagaya. American Nuclear Society 2002 Winter Meeting November 17-21, 2002 Washington, DC LA-UR-02-3782 Approved for public release; distribution is unlimited. Title: THE MCNP5 RANDOM NUMBER GENERATOR Author(s): Forrest B. Brown, Yasunobu Nagaya Submitted to: American Nuclear Society 2002 Winter

More information

From the latency to the throughput age. Prof. Jesús Labarta Director Computer Science Dept (BSC) UPC

From the latency to the throughput age. Prof. Jesús Labarta Director Computer Science Dept (BSC) UPC From the latency to the throughput age Prof. Jesús Labarta Director Computer Science Dept (BSC) UPC ETP4HPC Post-H2020 HPC Vision Frankfurt, June 24 th 2018 To exascale... and beyond 2 Vision The multicore

More information

Windows OpenFabrics (WinOF) Update

Windows OpenFabrics (WinOF) Update Windows OpenFabrics (WinOF) Update Eric Lantz, Microsoft (elantz@microsoft.com) April 2008 Agenda OpenFabrics and Microsoft Current Events HPC Server 2008 Release NetworkDirect - RDMA for Windows 2 OpenFabrics

More information

PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations

PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations Vadim Demchik vadimdi@yahoo.com http://hgpu.org/ Dnipropetrovsk National University Dnipropetrovsk, Ukraine GTC 14

More information