Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1


1 Exploratory Visualization of Petascale Particle Data in Nvidia DGX-1
Benjamin Hernandez, PhD
Advanced Data and Workflows Group
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
ORNL is managed by UT-Battelle for the US Department of Energy

2 Oak Ridge Leadership Computing Facility (OLCF)
Mission: Provide the computational and data resources required to solve the most challenging problems.
Highly competitive user allocation programs (INCITE, ALCC).
Projects receive 10x to 100x more resources than at other generally available centers.
We partner with users to enable science and engineering breakthroughs.

3 Sight: Exploratory Visualization of Scientific Data
Client/server architecture to provide high-end visualization on laptops, desktops, and powerwalls.
Heterogeneous scientific visualization.
Takes advantage of both CPU and GPU resources within a node: DGX-1 use case.
Advanced shading to enable new insights into data exploration.
Parallel I/O and data staging.
Pluggable for in-situ visualization.
Lightweight tool:
  Load your data
  Perform exploratory analysis
  Visualize/save results

4 Sight System Architecture (in progress)
[Architecture diagram. Labels: Local/Parallel File System; HPC Cluster; ADIOS I/O System; VTK-m; Compression; Visualization; Frames; OSPRay; Nvidia OptiX; CPU cores; Multi-GPU; Websockets; Server (DGX-1 or multi-GPU node); HTML Client]
*OSPRay and Nvidia OptiX are finely tuned libraries for ray tracing on multicore and manycore architectures.

5 ADIOS
ADIOS is an I/O framework.
Provides multiple methods to stage data to a staging area (on node, off node, off machine).
Data output can be anything one wants.
Different methods allow for different types of data movement, aggregation, and arrangement to the storage system, or to stream over the local nodes, LAN, or WAN.
It contains our own file format, if you choose to use it (ADIOS-BP).
Compresses/decompresses data in parallel.
Contains mechanisms to index and query data.

6 First Approach: OpenGL
VBO (points)
V.S. (vertex shader): apply transfer function
G.S. (geometry shader): quads with texture coordinates
F.S. (fragment shader): sphere generation and shading
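
The transfer-function stage of this pipeline can be illustrated on the CPU. The following is a minimal Python sketch, not the vertex shader from the talk. It assumes a piecewise-linear color map, and the control points in `CONTROL_POINTS` are hypothetical values chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical control points: (scalar value, (r, g, b, a)). Illustrative only.
CONTROL_POINTS = [
    (0.0, (0.0, 0.0, 1.0, 1.0)),  # low values: blue
    (0.5, (0.0, 1.0, 0.0, 1.0)),  # mid values: green
    (1.0, (1.0, 0.0, 0.0, 1.0)),  # high values: red
]


def apply_transfer_function(value, points=CONTROL_POINTS):
    """Map a per-particle scalar to RGBA by linear interpolation."""
    keys = [k for k, _ in points]
    # Clamp values outside the covered range to the end colors.
    if value <= keys[0]:
        return points[0][1]
    if value >= keys[-1]:
        return points[-1][1]
    hi = bisect_right(keys, value)  # first control point above value
    lo = hi - 1
    (k0, c0), (k1, c1) = points[lo], points[hi]
    t = (value - k0) / (k1 - k0)
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))
```

In a shader the same lookup is often done by sampling a 1D texture; the interpolation above is the CPU equivalent of that idea.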

7 OpenGL Bindless Graphics
Initialization
Address pointer
Display
Vertices span from vboaddress to vboaddress + sizeof(float) * size

8 Fragment Shader Sphere Generation
Sphere equation: $r^2 = (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2$
Quad corners in texture coordinates: (-1,1), (1,1), (-1,-1), (1,-1)
r = 1.0, z = 1.0
  x = texcoord.x
  y = texcoord.y
  zz = 1.0 - x*x - y*y
  if (zz <= 0.0) // removes fragments outside
      discard;
  // scale to the desired radius
  // calculate diffuse illumination
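
The per-fragment logic on this slide can be checked on the CPU. The sketch below is plain Python, not the GLSL shader from the talk. It assumes a unit sphere, so that zz = 1 - x*x - y*y, and the function name `shade_impostor` and the default light direction are hypothetical.

```python
import math


def shade_impostor(tx, ty, light_dir=(0.0, 0.0, 1.0)):
    """Return the diffuse intensity for one quad fragment, or None if discarded.

    tx, ty are the quad's texture coordinates in [-1, 1].
    light_dir is a hypothetical unit light direction in view space.
    """
    zz = 1.0 - tx * tx - ty * ty
    if zz <= 0.0:
        return None  # fragment lies outside the unit circle: discard
    z = math.sqrt(zz)
    normal = (tx, ty, z)  # unit length because tx^2 + ty^2 + z^2 = 1
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return max(n_dot_l, 0.0)  # Lambertian (diffuse) term
```

The center of the quad faces the viewer directly, and the intensity falls off toward the silhouette, which is what makes a flat quad read as a sphere.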

9 Results

10 OpenGL Multi-GPU Rendering
One MPI task for each device
  Easy to implement
  Each device initializes its GLX/EGL context
Multi-threading: one thread per device
  In EGL this is possible:
  Create the main context in the main thread:
    mainctx = eglCreateContext(display, config, 0, contextattrs);
  Each additional thread creates a shared context:
    lclthrdctx = eglCreateContext(display, config, mainctx, contextattrs);
  Implement some mutexes/semaphores to sync any updates
Vulkanize your viz!
  Devices are aware of other devices and can coordinate with each other
  That is precisely what Nvidia OptiX can do
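
The one-thread-per-device pattern with a mutex guarding shared updates can be sketched without any graphics API. The Python example below makes no EGL or OpenGL calls: threads stand in for devices, a nested list stands in for the shared frame, and the name `render_frame` and the row-band decomposition are assumptions made for illustration.

```python
import threading


def render_frame(num_devices, height, width):
    """Render one frame with one worker thread per (simulated) device."""
    framebuffer = [[None] * width for _ in range(height)]
    lock = threading.Lock()  # stands in for the mutex/semaphore on the slide

    def worker(device_id):
        # Each device renders a contiguous band of rows.
        first = device_id * height // num_devices
        last = (device_id + 1) * height // num_devices
        band = [[device_id] * width for _ in range(first, last)]
        with lock:  # synchronize the update of shared state
            framebuffer[first:last] = band

    threads = [
        threading.Thread(target=worker, args=(d,)) for d in range(num_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return framebuffer
```

For example, `render_frame(8, 1080, 4)` splits a 1080-row frame into eight bands of 135 rows, one per simulated device.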

11 Second Approach: Nvidia OptiX Ray Tracing Engine
The OptiX API is an application framework for achieving optimal ray tracing performance on the GPU.
It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms.
Similar to OpenGL, it does the heavy lifting of ray tracing and leaves capability and technique to the developers.
Plus, it can use all GPUs available in your system.
Naturally fits material appearance and scene illumination.

12 Nvidia OptiX Programming Model
OptiX provides eight programmable components; some of them are:
1. Ray generation
2. Intersection
3. Shading (closest hit)
Shadows (any hit)
Selector
Shaders use a CUDA-like syntax.
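
The roles of the intersection, closest-hit, and any-hit components can be illustrated with spheres. This is a plain Python sketch under stated assumptions, not OptiX or CUDA code: the function names are hypothetical, ray directions are assumed to be normalized, and a linear loop over spheres replaces the acceleration structure.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance along the ray, or None.

    direction is assumed to be normalized.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:  # ignore hits behind or at the ray origin
            return t
    return None


def closest_hit(origin, direction, spheres):
    """Return (distance, index) of the nearest sphere hit, or None."""
    best = None
    for i, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    return best


def any_hit(origin, direction, spheres, max_dist):
    """Return True if any sphere blocks the ray before max_dist (shadow test)."""
    for center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and t < max_dist:
            return True
    return False
```

The closest-hit search has to examine every candidate, while the any-hit test can stop at the first blocker, which is why shadow rays are typically cheaper than shading rays.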

13 Nvidia OptiX Graph Nodes
The scene is defined by graph nodes, a tree-like hierarchy where:
  Nodes at the bottom describe geometric objects.
  Nodes at the top describe collections of geometric objects.
[Diagram: example node graph built from Group, Acceleration, Transform, Selector, and Instance nodes]

14 Nvidia OptiX Graph Nodes
Keep the hierarchy as flat as possible.
But not too flat!
[Diagrams: node graphs built from Group, Acceleration, Instance, and Particles nodes]

15 Results: Test Systems
Workstation
  CPU: Intel Xeon, 20 cores, 512 GB
  GPU: Titan Z (2 GeForce Kepler GPUs, 2x6 GB VRAM)
  Ubuntu 16, Nvidia Driver
Rhea Node
  CPU: Intel Xeon
  GPU: 2x Tesla K80 (4 Tesla Kepler GPUs, 2x24 GB VRAM)
  Red Hat 7, Nvidia Driver
DGX-1
  CPU: Intel
  GPU: 8x Tesla Pascal SMX, 8x16 GB VRAM
  Ubuntu 16, Nvidia Driver
All systems: CUDA 8.0
OptiX acceleration structure: Trbvh
Image resolution: 1080p
Shading: Phong illumination and ambient occlusion

16 Results: How fast is the acceleration structure built?
[Bar chart, lower is better. Y axis: Time (ms). X axis: Particles (1 million, 10 million, 20 million). Series: Workstation, Rhea node, DGX-1]

17 Results: Performance
[Two bar charts, lower is better: Frame rate (worst case) and Frame rate (best case). Y axis: ms per frame. X axis: Million particles. Series: Workstation, Rhea node, DGX-1. Reference lines at 30 fps and 60 fps]
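
The charts report milliseconds per frame while the reference lines are given in frames per second. The conversion between the two is a reciprocal, sketched below in Python; the helper names are hypothetical.

```python
def fps_to_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def ms_to_fps(ms_per_frame):
    """Frame rate for a given frame time in milliseconds."""
    return 1000.0 / ms_per_frame
```

The 30 fps line therefore sits at about 33.3 ms per frame and the 60 fps line at about 16.7 ms per frame; a frame time of 1000 ms corresponds to 1 fps.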

18 Results

19 Discussion
DGX-1 can handle particle systems up to 10x larger in our test environment.
For particle systems of the same size, DGX-1 is 10x faster than the workstation system and 4.6x faster than the Rhea node.
We expect the DGX-1 speedup to increase for larger image resolutions.
Our preliminary tests showed DGX-1 has enough compute power to drive a powerwall 3840 x fps
Test larger resolutions.
Researchers are usually happy when they can explore datasets even at 1 fps.

20 Discussion
Nvidia OptiX provides multi-GPU support with no hassle.
Test whether Nvidia OptiX leaves free resources for analysis tasks.
Paging was removed in OptiX 4.x.
DGX-1 includes 40 CPU cores and 512 GB RAM.
Using Nvidia OptiX together with the OSPRay library will allow full system allocation to handle larger systems.
Summit is likely to support EGL through the Nvidia GPU drivers (do not take it as a fact, or as an alternative fact either!).
Best if (pre)exascale visualization tools are 100% CUDA compliant.

21 Questions?
Benjamin Hernandez, PhD
Advanced Data and Workflows Group
Oak Ridge Leadership Computing Facility
Oak Ridge National Laboratory
Acknowledgements: Dylan Lacewell and the Nvidia OptiX Team for their technical support. Datasets provided by Cheng-Yu Shi and Leonid Zhigilei from the Computational Materials Group at the University of Virginia. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

22 Further Reading
Tom True, Alina Alt (2013). Configuring, Programming and Debugging Applications for Multiple GPUs. GTC 2013.
Wil Braithwaite. Multi-GPU Programming for Visual Computing. SIGGRAPH 2013.
Available in GTC on Demand:
Adios Manual
OptiX Tutorial
Talks

23 2018 INCITE Call for Proposals
The 2018 INCITE Call for Proposals opened April 17, 2017 and closes June 23.
Features large allocations of computer time and supporting resources at the Argonne and Oak Ridge Leadership Computing Facility (LCF) centers, operated by the US Department of Energy (DOE) Office of Science.
Soliciting research proposals for awards of time on the 27-petaflop Cray XK7, Titan, and the 10-petaflop IBM Blue Gene/Q, Mira. In addition, certain 2018 INCITE awards will receive time on Argonne's new Intel/Cray system, a 9.65-petaflop system called Theta.
The INCITE program seeks research proposals for capability computing:
  Production simulations, including ensembles, that use a large fraction of the LCF systems, or
  Proposals that require the unique LCF architectural infrastructure for high-performance computing projects that cannot be performed elsewhere.
The INCITE program is open to US- and non-US-based researchers.
The INCITE program invites you to participate in an INCITE Proposal Writing Webinar, offered on April 19, May 18, and June 6.
For more information visit

24 Results: How fast is the acceleration structure built?
CUDA profiler summary per system. The Time, Calls, Avg, Min, and Max values did not survive transcription, so only Time(%) and Name are listed.

Workstation
Time(%)   Name
49.63%    [CUDA memcpy HtoD]
22.64%    Megakernel_CUDA_
(lost)    [CUDA memcpy DtoH]
6.74%     Megakernel_CUDA_0
0.30%     [CUDA memcpy HtoA]
0.21%     [CUDA memset]
0.05%     [CUDA memcpy DtoD]

Rhea node
Time(%)   Name
50.29%    [CUDA memcpy HtoD]
34.61%    [CUDA memcpy DtoH]
9.48%     Megakernel_CUDA_1
5.03%     Megakernel_CUDA_0
0.36%     [CUDA memset]
0.17%     [CUDA memcpy HtoA]
0.06%     [CUDA memcpy DtoD]

DGX-1
Time(%)   Name
43.50%    [CUDA memcpy HtoD]
30.77%    [CUDA memcpy DtoH]
18.13%    [CUDA memcpy PtoP]
6.16%     Megakernel_CUDA_1
1.11%     Megakernel_CUDA_0
0.14%     [CUDA memcpy HtoA]
0.13%     [CUDA memset]
0.06%     [CUDA memcpy DtoD]


26 ADIOS I/O
Abstracts metadata, data types, and dimensions from the source code into an XML file.
Host languages: C, Fortran
Data transforms: zlib, bzip2, szip, zfp, isobar, Alacrity
All data in adios_write() calls are buffered before writing to the file system.
Transport methods: POSIX, MPI, MPI_LUSTRE, PHDF5, DATASPACES, DIMES, FLEXPATH, ICEE

27 ADIOS I/O
Generate the C code from the XML file:
  gpp.py atoms.xml
  gwrite_atoms.ch
  gread_atoms.ch
Both files contain code to write and read ADIOS files.
You only need to modify your XML file and generate new *.ch files.
The main code remains the same.

28 ADIOS Write/Read Example
Write
Read

SIGHT. Benjamin Hernandez, PhD Advanced Data and Workflow(s) Group

SIGHT. Benjamin Hernandez, PhD Advanced Data and Workflow(s) Group SIGHT Benjamin Hernandez, PhD Advanced Data and Workflow(s) Group hernandezarb@ornl.gov ORNL is managed by UT-Battelle for the US Department of Energy name 1 Presentation This research used resources of

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

The Titan Tools Experience

The Titan Tools Experience The Titan Tools Experience Michael J. Brim, Ph.D. Computer Science Research, CSMD/NCCS Petascale Tools Workshop 213 Madison, WI July 15, 213 Overview of Titan Cray XK7 18,688+ compute nodes 16-core AMD

More information

INTRODUCTION TO OPTIX. Martin Stich, Engineering Manager

INTRODUCTION TO OPTIX. Martin Stich, Engineering Manager INTRODUCTION TO OPTIX Martin Stich, Engineering Manager OptiX Basics AGENDA Advanced Topics Case Studies Feature Outlook 2 OPTIX BASICS 3 IN A NUTSHELL The OptiX Ray Tracing SDK State-of-the-art performance:

More information

Early Experiences Writing Performance Portable OpenMP 4 Codes

Early Experiences Writing Performance Portable OpenMP 4 Codes Early Experiences Writing Performance Portable OpenMP 4 Codes Verónica G. Vergara Larrea Wayne Joubert M. Graham Lopez Oscar Hernandez Oak Ridge National Laboratory Problem statement APU FPGA neuromorphic

More information

Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition. Jeff Kiel Director, Graphics Developer Tools

Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition. Jeff Kiel Director, Graphics Developer Tools Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition Jeff Kiel Director, Graphics Developer Tools Computational Graphics Enabled Problem: Complexity of Computation

More information

Johannes Günther, Senior Graphics Software Engineer. Intel Data Center Group, HPC Visualization

Johannes Günther, Senior Graphics Software Engineer. Intel Data Center Group, HPC Visualization Johannes Günther, Senior Graphics Software Engineer Intel Data Center Group, HPC Visualization Data set provided by Florida International University: Simulated fluid flow through a porous medium Large

More information

OpenMPSuperscalar: Task-Parallel Simulation and Visualization of Crowds with Several CPUs and GPUs

OpenMPSuperscalar: Task-Parallel Simulation and Visualization of Crowds with Several CPUs and GPUs www.bsc.es OpenMPSuperscalar: Task-Parallel Simulation and Visualization of Crowds with Several CPUs and GPUs Hugo Pérez UPC-BSC Benjamin Hernandez Oak Ridge National Lab Isaac Rudomin BSC March 2015 OUTLINE

More information

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory

Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department

More information

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016 WHAT S NEW IN CUDA 8 Siddharth Sharma, Oct 2016 WHAT S NEW IN CUDA 8 Why Should You Care >2X Run Computations Faster* Solve Larger Problems** Critical Path Analysis * HOOMD Blue v1.3.3 Lennard-Jones liquid

More information

CUDA Conference. Walter Mundt-Blum March 6th, 2008

CUDA Conference. Walter Mundt-Blum March 6th, 2008 CUDA Conference Walter Mundt-Blum March 6th, 2008 NVIDIA s Businesses Multiple Growth Engines GPU Graphics Processing Units MCP Media and Communications Processors PESG Professional Embedded & Solutions

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview

More information

Large scale Imaging on Current Many- Core Platforms

Large scale Imaging on Current Many- Core Platforms Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,

More information

SCIENTIFIC VISUALIZATION ON GPU CLUSTERS PETER MESSMER, NVIDIA

SCIENTIFIC VISUALIZATION ON GPU CLUSTERS PETER MESSMER, NVIDIA SCIENTIFIC VISUALIZATION ON GPU CLUSTERS PETER MESSMER, NVIDIA Visualization Rendering Visualization Isosurfaces, Isovolumes Field Operators (Gradient, Curl,.. ) Coordinate transformations Feature extraction

More information

GPU Ray Tracing at the Desktop and in the Cloud. Phillip Miller, NVIDIA Ludwig von Reiche, mental images

GPU Ray Tracing at the Desktop and in the Cloud. Phillip Miller, NVIDIA Ludwig von Reiche, mental images GPU Ray Tracing at the Desktop and in the Cloud Phillip Miller, NVIDIA Ludwig von Reiche, mental images Ray Tracing has always had an appeal Ray Tracing Prediction The future of interactive graphics is

More information

Oak Ridge National Laboratory Computing and Computational Sciences

Oak Ridge National Laboratory Computing and Computational Sciences Oak Ridge National Laboratory Computing and Computational Sciences OFA Update by ORNL Presented by: Pavel Shamis (Pasha) OFA Workshop Mar 17, 2015 Acknowledgments Bernholdt David E. Hill Jason J. Leverman

More information

VMD: Immersive Molecular Visualization and Interactive Ray Tracing for Domes, Panoramic Theaters, and Head Mounted Displays

VMD: Immersive Molecular Visualization and Interactive Ray Tracing for Domes, Panoramic Theaters, and Head Mounted Displays VMD: Immersive Molecular Visualization and Interactive Ray Tracing for Domes, Panoramic Theaters, and Head Mounted Displays John E. Stone Theoretical and Computational Biophysics Group Beckman Institute

More information

Is OpenMP 4.5 Target Off-load Ready for Real Life? A Case Study of Three Benchmark Kernels

Is OpenMP 4.5 Target Off-load Ready for Real Life? A Case Study of Three Benchmark Kernels National Aeronautics and Space Administration Is OpenMP 4.5 Target Off-load Ready for Real Life? A Case Study of Three Benchmark Kernels Jose M. Monsalve Diaz (UDEL), Gabriele Jost (NASA), Sunita Chandrasekaran

More information

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA

Turing Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,

More information

Ray Tracing. Kjetil Babington

Ray Tracing. Kjetil Babington Ray Tracing Kjetil Babington 21.10.2011 1 Introduction What is Ray Tracing? Act of tracing a ray through some scene Not necessarily for rendering Rendering with Ray Tracing Ray Tracing is a global illumination

More information

Present and Future Leadership Computers at OLCF

Present and Future Leadership Computers at OLCF Present and Future Leadership Computers at OLCF Al Geist ORNL Corporate Fellow DOE Data/Viz PI Meeting January 13-15, 2015 Walnut Creek, CA ORNL is managed by UT-Battelle for the US Department of Energy

More information

Review for Ray-tracing Algorithm and Hardware

Review for Ray-tracing Algorithm and Hardware Review for Ray-tracing Algorithm and Hardware Reporter: 邱敬捷博士候選人 Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Summer, 2017 1 2017/7/26 Outline

More information

Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh.

Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization. Dmitry I. Lyakh. Portable Heterogeneous High-Performance Computing via Domain-Specific Virtualization Dmitry I. Lyakh liakhdi@ornl.gov This research used resources of the Oak Ridge Leadership Computing Facility at the

More information

NVIDIA Case Studies:

NVIDIA Case Studies: NVIDIA Case Studies: OptiX & Image Space Photon Mapping David Luebke NVIDIA Research Beyond Programmable Shading 0 How Far Beyond? The continuum Beyond Programmable Shading Just programmable shading: DX,

More information

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid

More information

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y. COMP 4801 Final Year Project Ray Tracing for Computer Graphics Final Project Report FYP 15014 by Runjing Liu Advised by Dr. L.Y. Wei 1 Abstract The goal of this project was to use ray tracing in a rendering

More information

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large

More information

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization

Duksu Kim. Professional Experience Senior researcher, KISTI High performance visualization Duksu Kim Assistant professor, KORATEHC Education Ph.D. Computer Science, KAIST Parallel Proximity Computation on Heterogeneous Computing Systems for Graphics Applications Professional Experience Senior

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

Interactive Supercomputing for State-of-the-art Biomolecular Simulation

Interactive Supercomputing for State-of-the-art Biomolecular Simulation Interactive Supercomputing for State-of-the-art Biomolecular Simulation John E. Stone Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of

More information

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc.

Debugging CUDA Applications with Allinea DDT. Ian Lumb Sr. Systems Engineer, Allinea Software Inc. Debugging CUDA Applications with Allinea DDT Ian Lumb Sr. Systems Engineer, Allinea Software Inc. ilumb@allinea.com GTC 2013, San Jose, March 20, 2013 Embracing GPUs GPUs a rival to traditional processors

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

SIGGRAPH 2013 Shaping the Future of Visual Computing

SIGGRAPH 2013 Shaping the Future of Visual Computing SIGGRAPH 2013 Shaping the Future of Visual Computing Building Ray Tracing Applications with OptiX David McAllister, Ph.D., OptiX R&D Manager Brandon Lloyd, Ph.D., OptiX Software Engineer Why ray tracing?

More information

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015 Enhancing Traditional Rasterization Graphics with Ray Tracing October 2015 James Rumble Developer Technology Engineer, PowerVR Graphics Overview Ray Tracing Fundamentals PowerVR Ray Tracing Pipeline Using

More information

Adaptable IO System (ADIOS)

Adaptable IO System (ADIOS) Adaptable IO System (ADIOS) http://www.cc.gatech.edu/~lofstead/adios Cray User Group 2008 May 8, 2008 Chen Jin, Scott Klasky, Stephen Hodson, James B. White III, Weikuan Yu (Oak Ridge National Laboratory)

More information

RckT: Scalable Physically Accurate Spectral Rendering in OSPRay

RckT: Scalable Physically Accurate Spectral Rendering in OSPRay RckT: Scalable Physically Accurate Spectral Rendering in OSPRay Christiaan Gribble SURVICE Engineering Intel HPC Developer Conference 11 November 2017 Innovative Rendering for Simulation High-performance

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Siggraph Asia December 2011

Siggraph Asia December 2011 Siggraph Asia December 2011 Advanced Graphics Always Core to NVIDIA Worldwide Leader in GPU Development & Professional Graphics Advanced Rendering Commitment 2007 Worldwide Leader in GPU Development &

More information

HPC Saudi Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences. Presented to: March 14, 2017

HPC Saudi Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences. Presented to: March 14, 2017 Creating an Exascale Ecosystem for Science Presented to: HPC Saudi 2017 Jeffrey A. Nichols Associate Laboratory Director Computing and Computational Sciences March 14, 2017 ORNL is managed by UT-Battelle

More information

Combining NVIDIA Docker and databases to enhance agile development and optimize resource allocation

Combining NVIDIA Docker and databases to enhance agile development and optimize resource allocation Combining NVIDIA Docker and databases to enhance agile development and optimize resource allocation Chris Davis, Sophie Voisin, Devin White, Andrew Hardin Scalable and High Performance Geocomputation Team

More information

Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James Ahrens

Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James Ahrens LA-UR- 14-25437 Approved for public release; distribution is unlimited. Title: Portable Parallel Halo and Center Finders for HACC Author(s): Christopher Sewell Katrin Heitmann Li-ta Lo Salman Habib James

More information

HDF5 I/O Performance. HDF and HDF-EOS Workshop VI December 5, 2002

HDF5 I/O Performance. HDF and HDF-EOS Workshop VI December 5, 2002 HDF5 I/O Performance HDF and HDF-EOS Workshop VI December 5, 2002 1 Goal of this talk Give an overview of the HDF5 Library tuning knobs for sequential and parallel performance 2 Challenging task HDF5 Library

More information

Responsive Large Data Analysis and Visualization with the ParaView Ecosystem. Patrick O Leary, Kitware Inc

Responsive Large Data Analysis and Visualization with the ParaView Ecosystem. Patrick O Leary, Kitware Inc Responsive Large Data Analysis and Visualization with the ParaView Ecosystem Patrick O Leary, Kitware Inc Hybrid Computing Attribute Titan Summit - 2018 Compute Nodes 18,688 ~3,400 Processor (1) 16-core

More information

Cuda C Programming Guide Appendix C Table C-

Cuda C Programming Guide Appendix C Table C- Cuda C Programming Guide Appendix C Table C-4 Professional CUDA C Programming (1118739329) cover image into the powerful world of parallel GPU programming with this down-to-earth, practical guide Table

More information

Photorealism: Ray Tracing

Photorealism: Ray Tracing Photorealism: Ray Tracing Reading Assignment: Chapter 13 Local vs. Global Illumination Local Illumination depends on local object and light sources only Global Illumination at a point can depend on any

More information

CUDA Experiences: Over-Optimization and Future HPC

CUDA Experiences: Over-Optimization and Future HPC CUDA Experiences: Over-Optimization and Future HPC Carl Pearson 1, Simon Garcia De Gonzalo 2 Ph.D. candidates, Electrical and Computer Engineering 1 / Computer Science 2, University of Illinois Urbana-Champaign

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination. Cyril Crassin NVIDIA Research

Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination. Cyril Crassin NVIDIA Research Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination Cyril Crassin NVIDIA Research Global Illumination Indirect effects Important for realistic image synthesis Direct lighting Direct+Indirect

More information

Damaris. In-Situ Data Analysis and Visualization for Large-Scale HPC Simulations. KerData Team. Inria Rennes,

Damaris. In-Situ Data Analysis and Visualization for Large-Scale HPC Simulations. KerData Team. Inria Rennes, Damaris In-Situ Data Analysis and Visualization for Large-Scale HPC Simulations KerData Team Inria Rennes, http://damaris.gforge.inria.fr Outline 1. From I/O to in-situ visualization 2. Damaris approach

More information

SCIENTIFIC VISUALIZATION IN HPC

SCIENTIFIC VISUALIZATION IN HPC April 4-7, 2016 Silicon Valley SCIENTIFIC VISUALIZATION IN HPC Peter Messmer, 4/4/2016 HIGH PERFORMANCE COMPUTING TODAY* "Yes," said Deep Thought, "I can do it." [Seven and a half million years later...

More information

Immersive Out-of-Core Visualization of Large-Size and Long-Timescale Molecular Dynamics Trajectories

Immersive Out-of-Core Visualization of Large-Size and Long-Timescale Molecular Dynamics Trajectories Immersive Out-of-Core Visualization of Large-Size and Long-Timescale Molecular Dynamics Trajectories J. Stone, K. Vandivort, K. Schulten Theoretical and Computational Biophysics Group Beckman Institute

More information

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

NVIDIA Update and Directions on GPU Acceleration for Earth System Models NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,

More information

NVIDIA Advanced Rendering

NVIDIA Advanced Rendering NVIDIA Advanced Rendering and GPU Ray Tracing SIGGRAPH ASIA 2012 Singapore Phillip Miller Director of Product Management NVIDIA Advanced Rendering Agenda 1. What is NVIDIA Advanced Rendering? 2. Progress

More information

UCX: An Open Source Framework for HPC Network APIs and Beyond

UCX: An Open Source Framework for HPC Network APIs and Beyond UCX: An Open Source Framework for HPC Network APIs and Beyond Presented by: Pavel Shamis / Pasha ORNL is managed by UT-Battelle for the US Department of Energy Co-Design Collaboration The Next Generation

More information

Performance and Energy Usage of Workloads on KNL and Haswell Architectures

Performance and Energy Usage of Workloads on KNL and Haswell Architectures Performance and Energy Usage of Workloads on KNL and Haswell Architectures Tyler Allen 1 Christopher Daley 2 Doug Doerfler 2 Brian Austin 2 Nicholas Wright 2 1 Clemson University 2 National Energy Research

More information

Programmable GPUS. Last Time? Reading for Today. Homework 4. Planar Shadows Projective Texture Shadows Shadow Maps Shadow Volumes

Programmable GPUS. Last Time? Reading for Today. Homework 4. Planar Shadows Projective Texture Shadows Shadow Maps Shadow Volumes Last Time? Programmable GPUS Planar Shadows Projective Texture Shadows Shadow Maps Shadow Volumes frame buffer depth buffer stencil buffer Stencil Buffer Homework 4 Reading for Create some geometry "Rendering

More information
