Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1
1 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1
Benjamin Hernandez, PhD
Advanced Data and Workflows Group, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
ORNL is managed by UT-Battelle for the US Department of Energy.
2 Oak Ridge Leadership Computing Facility (OLCF)
- Mission: provide the computational and data resources required to solve the most challenging problems.
- Highly competitive user allocation programs (INCITE, ALCC).
- Projects receive 10x to 100x more resources than at other generally available centers.
- We partner with users to enable science and engineering breakthroughs.
3 Sight: Exploratory Visualization of Scientific Data
- Client/server architecture to provide high-end visualization on laptops, desktops, and powerwalls.
- Heterogeneous scientific visualization: take advantage of both CPU and GPU resources within a node (DGX-1 use case).
- Advanced shading to enable new insights into data exploration.
- Parallel I/O and data staging.
- Pluggable for in-situ visualization.
- Lightweight tool: load your data, perform exploratory analysis, visualize/save results.
4 Sight System Architecture (in progress)
[Architecture diagram. Labels: Local/Parallel File System; HPC Cluster; ADIOS I/O System; VTK-m; Compression; OSPRay; Nvidia OptiX; CPU cores; Multi-GPU; Visualization Frames; WebSockets; Server (DGX-1 or multi-GPU node); HTML Client.]
*OSPRay and Nvidia OptiX are finely tuned libraries for ray tracing on multicore and manycore architectures.
5 ADIOS
- ADIOS is an I/O framework.
- Provides multiple methods to stage data to a staging area (on node, off node, off machine).
- Data output can be anything one wants.
- Different methods allow for different types of data movement, aggregation, and arrangement to the storage system, or streaming over local nodes, LAN, or WAN.
- Includes its own file format (ADIOS-BP), if you choose to use it.
- Compresses/decompresses data in parallel.
- Contains mechanisms to index and query data.
6 First Approach: OpenGL
- Input: VBO (points).
- Vertex shader (V.S.): apply transfer function.
- Geometry shader (G.S.): quads with texture coordinates.
- Fragment shader (F.S.): sphere generation and shading.
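The transfer-function step in the vertex shader maps a per-particle scalar to a color. The slides do not show the actual mapping, so the following is only a minimal CPU-side sketch in Python (not GLSL) of a piecewise-linear color ramp; the function name `transfer_function` and the blue-white-red `RAMP` stops are hypothetical.

```python
def transfer_function(value, vmin, vmax, stops):
    """Map a scalar to an RGBA tuple by piecewise-linear interpolation.

    `stops` is a list of at least two RGBA tuples spread evenly over
    [vmin, vmax]. This is an illustrative stand-in, not the talk's shader.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Normalize the scalar and clamp it to [0, 1].
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    # Locate the ramp segment that contains t.
    scaled = t * (len(stops) - 1)
    i = min(int(scaled), len(stops) - 2)
    f = scaled - i
    lo, hi = stops[i], stops[i + 1]
    # Blend the two neighboring color stops.
    return tuple(a + f * (b - a) for a, b in zip(lo, hi))


# Hypothetical blue -> white -> red ramp (RGBA).
RAMP = [(0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0), (1.0, 0.0, 0.0, 1.0)]
```

For example, `transfer_function(2.5, 0.0, 10.0, RAMP)` lands halfway between the blue and white stops. In a shader the same lookup is often done with a 1D texture instead of explicit interpolation.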
7 OpenGL Bindless Graphics
- Initialization
- Address pointer
- Display: vertices span from vboaddress to vboaddress + sizeof(float) * size.
8 Fragment Shader Sphere Generation
Sphere equation: $r^2 = (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2$
Quad corners in texture coordinates: (-1,1), (1,1), (-1,-1), (1,-1)
    r = 1.0, z = 1.0
    x = texcoord.x
    y = texcoord.y
    zz = 1.0 - x*x - y*y
    if (zz <= 0.0) // removes fragments outside
        discard;
    // scale to the desired radius
    // calculate diffuse illumination
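The per-fragment logic above can be checked on the CPU. The following is a minimal Python sketch of the same idea, not the shader from the talk: the function name `shade_impostor_fragment`, the `radius` parameter, and the fixed `light_dir` are assumptions, and the light direction is assumed to be unit length.

```python
import math


def shade_impostor_fragment(tx, ty, radius=1.0, light_dir=(0.0, 0.0, 1.0)):
    """Return (depth_offset, diffuse) for a quad fragment, or None if discarded.

    tx, ty are the quad's texture coordinates in [-1, 1].
    """
    zz = 1.0 - tx * tx - ty * ty
    if zz <= 0.0:
        return None  # fragment lies outside the sphere silhouette: discard
    z = math.sqrt(zz)
    normal = (tx, ty, z)  # unit length because tx^2 + ty^2 + z^2 = 1
    # Scale the unit-sphere height to the desired radius.
    depth_offset = radius * z
    # Lambertian (diffuse) term, clamped at zero.
    diffuse = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    return depth_offset, diffuse
```

A fragment at the quad center, `shade_impostor_fragment(0.0, 0.0)`, is fully lit and sits one radius in front of the quad, while a corner such as `(1.0, 1.0)` is discarded.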
9 Results
10 OpenGL Multi-GPU Rendering
- One MPI task for each device:
  - Easy to implement.
  - Each device initializes its own GLX/EGL context.
- Multi-threading, one thread per device. With EGL this is possible:
  - Create the main context in the main thread: mainctx = eglCreateContext(display, config, 0, contextattrs);
  - Each additional thread creates a shared context: lclthrdctx = eglCreateContext(display, config, mainctx, contextattrs);
  - Implement mutexes/semaphores to synchronize any updates.
- Vulkanize your viz! Devices are aware of other devices and can coordinate with each other.
- That is precisely what NVIDIA OptiX can do.
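The one-thread-per-device pattern with a mutex around shared updates can be sketched without any graphics API. The Python below only simulates the structure: `render_on_devices` is a hypothetical name, no EGL call is made, and counting the particles in a slice stands in for the actual draw calls.

```python
import threading


def render_on_devices(num_devices, particles):
    """Simulate one worker thread per device, each handling a slice of the data."""
    results = {}
    lock = threading.Lock()  # guards updates to the shared results dict

    def worker(device_id):
        # A real renderer would create this thread's shared EGL context here.
        chunk = particles[device_id::num_devices]  # strided split across devices
        drawn = len(chunk)  # stand-in for issuing the draw calls
        with lock:
            results[device_id] = drawn

    threads = [threading.Thread(target=worker, args=(d,))
               for d in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `render_on_devices(4, list(range(10)))` assigns 3, 3, 2, and 2 particles to devices 0 through 3. The strided split is one arbitrary choice; a spatial decomposition may suit rendering better.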
11 Second Approach: Nvidia OptiX Ray Tracing Engine
- The OptiX API is an application framework for achieving optimal ray tracing performance on the GPU.
- It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms.
- Similar to OpenGL, it does the heavy lifting of ray tracing and leaves capability and technique to the developers.
- It can use all GPUs available in your system.
- It naturally fits material appearance and scene illumination.
12 Nvidia OptiX Programming Model
OptiX provides eight programmable components; some of them are:
- Ray generation
- Intersection
- Shading (closest hit)
- Shadows (any hit)
- Selector
Shaders use a CUDA-like syntax.
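To make the roles of the first three program types concrete, here is a minimal CPU sketch in Python. It is not OptiX code and uses none of its API; the function names, the orthographic camera placed at z = 5, and the brute-force loop over spheres (where OptiX would traverse an acceleration structure) are all assumptions made for illustration. Ray directions are assumed to be unit length.

```python
import math


def generate_ray(px, py, width, height):
    """Ray generation: orthographic ray through pixel (px, py), looking down -z."""
    x = (px + 0.5) / width * 2.0 - 1.0
    y = (py + 0.5) / height * 2.0 - 1.0
    return (x, y, 5.0), (0.0, 0.0, -1.0)  # origin, unit direction


def intersect_sphere(origin, direction, center, radius):
    """Intersection: distance to the ray's entry point into the sphere, or None.

    Rays that start inside a sphere are treated as misses in this sketch.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None


def closest_hit(origin, direction, spheres):
    """Closest hit: return (index, distance) of the nearest sphere, or None."""
    best = None
    for i, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[1]):
            best = (i, t)
    return best
```

With `spheres = [((0, 0, 0), 1.0), ((0, 0, 2), 0.5)]`, a ray from `(0, 0, 5)` along `-z` reports the second sphere as the closest hit, since it sits in front of the first.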
13 Nvidia OptiX Graph Nodes
The scene is defined by graph nodes, a tree-like hierarchy where:
- Nodes at the bottom describe geometric objects.
- Nodes at the top describe collections of geometric objects.
[Diagram: example node graph built from Group, Transform, Selector, Acceleration, and Instance nodes.]
14 Nvidia OptiX Graph Nodes
- Keep the hierarchy as flat as possible.
- But not too flat!
[Diagrams: node graphs built from Group, Acceleration, Instance, and Particles nodes.]
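The slide does not say how the particles were divided among instances. One plausible reading of "flat, but not too flat" is to bucket the particles into a moderate number of chunks under a single top-level group. The Python sketch below shows only that bucketing; `build_groups` and the `max_per_group` tuning parameter are hypothetical, and spatial coherence is ignored.

```python
def build_groups(particles, max_per_group):
    """Split particles into contiguous chunks, one chunk per geometry instance."""
    if max_per_group <= 0:
        raise ValueError("max_per_group must be positive")
    return [particles[i:i + max_per_group]
            for i in range(0, len(particles), max_per_group)]
```

For instance, `build_groups(list(range(10)), 4)` yields three chunks of sizes 4, 4, and 2. A very large `max_per_group` corresponds to the fully flat case with a single instance.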
15 Results: Test Systems
- Workstation: CPU Intel Xeon, 20 cores, 512 GB; GPU Titan Z (2 GeForce Kepler GPUs, 2x6 GB VRAM); Ubuntu 16; Nvidia Driver
- Rhea node: CPU Intel Xeon; GPU 2x Tesla K80 (4 Tesla Kepler GPUs, 2x24 GB VRAM); Red Hat 7; Nvidia Driver
- DGX-1: CPU Intel; GPU 8x Tesla Pascal SMX, 8x16 GB VRAM; Ubuntu 16; Nvidia Driver
- All systems: CUDA 8.0, OptiX
- Acceleration structure: Trbvh
- Image resolution: 1080p
- Shading: Phong illumination and ambient occlusion
16 Results: How fast is the acceleration structure built?
[Chart: build time in ms (lower is better) for the workstation, Rhea node, and DGX-1 at 1 million, 10 million, and 20 million particles.]
17 Results: Performance (lower is better)
[Two charts: ms per frame versus millions of particles for the workstation, Rhea node, and DGX-1, one for the worst-case frame rate and one for the best-case frame rate, with 30 fps and 60 fps marked.]
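Because the charts plot milliseconds per frame while the reference marks are given in frames per second, it helps to convert between the two. This small Python helper is an illustration added here, not part of the talk; the function name is hypothetical.

```python
def ms_per_frame(fps):
    """Convert a frame rate in frames per second to a per-frame time budget in ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps
```

The 30 fps mark corresponds to about 33.3 ms per frame and the 60 fps mark to about 16.7 ms per frame, so any bar below those values meets the respective frame rate.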
18 Results
19 Discussion
- In our test environment, DGX-1 can handle particle systems up to 10x larger.
- For particle systems of the same size, DGX-1 is 10x faster than the workstation and 4.6x faster than the Rhea node.
- We expect the DGX-1 speedup to increase at larger image resolutions.
- Our preliminary tests showed DGX-1 has enough compute power to drive a powerwall 3840 x fps
- To do: test larger resolutions.
- Researchers are usually happy when they can explore datasets even at 1 fps.
20 Discussion
- Nvidia OptiX provides multi-GPU support with no hassle.
- To do: test whether Nvidia OptiX leaves free resources for analysis tasks.
- Paging was removed in OptiX 4.x.
- DGX-1 includes 40 CPU cores and 512 GB of RAM.
- Using Nvidia OptiX together with the OSPRay library will allow full system allocation to handle larger systems.
- Summit is likely to support EGL through the Nvidia GPU drivers (do not take this as a fact, or as an alternative fact either!).
- Best if (pre-)exascale visualization tools are 100% CUDA compliant.
21 Questions?
Benjamin Hernandez, PhD, Advanced Data and Workflows Group, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory
Acknowledgements: Dylan Lacewell and the Nvidia OptiX Team for their technical support. Datasets provided by Cheng-Yu Shi and Leonid Zhigilei from the Computational Materials Group at University of Virginia. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
22 Further Reading
- Tom True, Alina Alt (2013). Configuring, Programming and Debugging Applications for Multiple GPUs. GTC 2013.
- Wil Braithwaite. Multi-GPU Programming for Visual Computing. SIGGRAPH 2013.
- Both available in GTC on Demand.
- ADIOS Manual
- OptiX Tutorial Talks
23 2018 INCITE Call for Proposals
- The 2018 INCITE Call for Proposals opened April 17, 2017 and closes June 23.
- Features large allocations of computer time and supporting resources at the Argonne and Oak Ridge Leadership Computing Facility (LCF) centers, operated by the US Department of Energy (DOE) Office of Science.
- Soliciting research proposals for awards of time on the 27-petaflop Cray XK7, Titan, and the 10-petaflop IBM Blue Gene/Q, Mira. In addition, certain 2018 INCITE awards will receive time on Argonne's new Intel/Cray system, a 9.65-petaflop system called Theta.
- The INCITE program seeks research proposals for capability computing:
  - Production simulations, including ensembles, that use a large fraction of the LCF systems, or
  - Proposals that require the unique LCF architectural infrastructure for high-performance computing projects that cannot be performed elsewhere.
- The INCITE program is open to US- and non-US-based researchers.
- The INCITE program invites you to participate in an INCITE Proposal Writing Webinar, offered on April 19, May 18, and June 6.
- For more information visit
24 Results: How fast is the acceleration structure built?
Profiler summary per system (columns: Time(%), Time, Calls, Avg, Min, Max, Name; only Time(%) and Name are shown).

Workstation
Time(%)   Name
49.63%    [CUDA memcpy HtoD]
22.64%    Megakernel_CUDA_
     %    [CUDA memcpy DtoH]
 6.74%    Megakernel_CUDA_0
 0.30%    [CUDA memcpy HtoA]
 0.21%    [CUDA memset]
 0.05%    [CUDA memcpy DtoD]

Rhea node
Time(%)   Name
50.29%    [CUDA memcpy HtoD]
34.61%    [CUDA memcpy DtoH]
 9.48%    Megakernel_CUDA_1
 5.03%    Megakernel_CUDA_0
 0.36%    [CUDA memset]
 0.17%    [CUDA memcpy HtoA]
 0.06%    [CUDA memcpy DtoD]

DGX-1
Time(%)   Name
43.50%    [CUDA memcpy HtoD]
30.77%    [CUDA memcpy DtoH]
18.13%    [CUDA memcpy PtoP]
 6.16%    Megakernel_CUDA_1
 1.11%    Megakernel_CUDA_0
 0.14%    [CUDA memcpy HtoA]
 0.13%    [CUDA memset]
 0.06%    [CUDA memcpy DtoD]
26 ADIOS I/O
- Abstracts metadata, data types, and dimensions from the source code into an XML file.
- Languages: C, Fortran.
- Transforms: zlib, bzip2, szip, zfp, isobar, Alacrity.
- All data in adios_write() calls are buffered before being written to the file system.
- Methods: POSIX, MPI, MPI_LUSTRE, PHDF5, DATASPACES, DIMES, FLEXPATH, ICEE.
27 ADIOS I/O
- Generate the C code from the XML file: gpp.py atoms.xml
- Generated files: gwrite_atoms.ch and gread_atoms.ch. Both files contain code to write and read ADIOS files.
- You only need to modify your XML file and generate new *.ch files.
- The main code remains the same.
28 ADIOS Write/Read Example
[Code listings: Write, Read]
Octree-Based Sparse Voxelization for Real-Time Global Illumination Cyril Crassin NVIDIA Research Voxel representations Crane et al. (NVIDIA) 2007 Allard et al. 2010 Christensen and Batali (Pixar) 2004
More informationKepler Overview Mark Ebersole
Kepler Overview Mark Ebersole TFLOPS TFLOPS 3x Performance in a Single Generation 3.5 3 2.5 2 1.5 1 0.5 0 1.25 1 Single Precision FLOPS (SGEMM) 2.90 TFLOPS.89 TFLOPS.36 TFLOPS Xeon E5-2690 Tesla M2090
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationA More Realistic Way of Stressing the End-to-end I/O System
A More Realistic Way of Stressing the End-to-end I/O System Verónica G. Vergara Larrea Sarp Oral Dustin Leverman Hai Ah Nam Feiyi Wang James Simmons CUG 2015 April 29, 2015 Chicago, IL ORNL is managed
More informationSplotch: High Performance Visualization using MPI, OpenMP and CUDA
Splotch: High Performance Visualization using MPI, OpenMP and CUDA Klaus Dolag (Munich University Observatory) Martin Reinecke (MPA, Garching) Claudio Gheller (CSCS, Switzerland), Marzia Rivi (CINECA,
More informationAccelerating Realism with the (NVIDIA Scene Graph)
Accelerating Realism with the (NVIDIA Scene Graph) Holger Kunz Manager, Workstation Middleware Development Phillip Miller Director, Workstation Middleware Product Management NVIDIA application acceleration
More informationIBM Power AC922 Server
IBM Power AC922 Server The Best Server for Enterprise AI Highlights More accuracy - GPUs access system RAM for larger models Faster insights - significant deep learning speedups Rapid deployment - integrated
More informationReal-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis
Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney and John D. Owens SIGGRAPH Asia
More informationUsing AWS EC2 GPU Instances for Computational Microscopy and Biomolecular Simulation
Using AWS EC2 GPU Instances for Computational Microscopy and Biomolecular Simulation John E. Stone Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University
More informationAllowing Users to Run Services at the OLCF with Kubernetes
Allowing Users to Run Services at the OLCF with Kubernetes Jason Kincl Senior HPC Systems Engineer Ryan Adamson Senior HPC Security Engineer This work was supported by the Oak Ridge Leadership Computing
More informationPreparing Scientific Software for Exascale
Preparing Scientific Software for Exascale Jack Wells Director of Science Oak Ridge Leadership Computing Facility Oak Ridge National Laboratory Mini-Symposium on Scientific Software Engineering Monday,
More informationCS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside
CS230 : Computer Graphics Lecture 4 Tamar Shinar Computer Science & Engineering UC Riverside Shadows Shadows for each pixel do compute viewing ray if ( ray hits an object with t in [0, inf] ) then compute
More information6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, :05-12pm Two hand-written sheet of notes (4 pages) allowed 1 SSD [ /17]
6.837 Introduction to Computer Graphics Final Exam Tuesday, December 20, 2011 9:05-12pm Two hand-written sheet of notes (4 pages) allowed NAME: 1 / 17 2 / 12 3 / 35 4 / 8 5 / 18 Total / 90 1 SSD [ /17]
More informationframe buffer depth buffer stencil buffer
Final Project Proposals Programmable GPUS You should all have received an email with feedback Just about everyone was told: Test cases weren t detailed enough Project was possibly too big Motivation could
More informationPhilip C. Roth. Computer Science and Mathematics Division Oak Ridge National Laboratory
Philip C. Roth Computer Science and Mathematics Division Oak Ridge National Laboratory A Tree-Based Overlay Network (TBON) like MRNet provides scalable infrastructure for tools and applications MRNet's
More informationGTC 2017 S7672. OpenACC Best Practices: Accelerating the C++ NUMECA FINE/Open CFD Solver
David Gutzwiller, NUMECA USA (david.gutzwiller@numeca.com) Dr. Ravi Srinivasan, Dresser-Rand Alain Demeulenaere, NUMECA USA 5/9/2017 GTC 2017 S7672 OpenACC Best Practices: Accelerating the C++ NUMECA FINE/Open
More informationThe Traditional Graphics Pipeline
Last Time? The Traditional Graphics Pipeline Participating Media Measuring BRDFs 3D Digitizing & Scattering BSSRDFs Monte Carlo Simulation Dipole Approximation Today Ray Casting / Tracing Advantages? Ray
More informationShifter: Fast and consistent HPC workflows using containers
Shifter: Fast and consistent HPC workflows using containers CUG 2017, Redmond, Washington Lucas Benedicic, Felipe A. Cruz, Thomas C. Schulthess - CSCS May 11, 2017 Outline 1. Overview 2. Docker 3. Shifter
More informationHeadline in Arial Bold 30pt. Visualisation using the Grid Jeff Adie Principal Systems Engineer, SAPK July 2008
Headline in Arial Bold 30pt Visualisation using the Grid Jeff Adie Principal Systems Engineer, SAPK July 2008 Agenda Visualisation Today User Trends Technology Trends Grid Viz Nodes Software Ecosystem
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More information