Interactive HPC: Large Scale In-Situ Visualization Using NVIDIA Index in ALYA MultiPhysics
1 Interactive HPC: Large Scale In-Situ Visualization Using NVIDIA Index in ALYA MultiPhysics. Christopher Lux (NV), Vishal Mehta (BSC) and Marc Nienhaus (NV). May 8th, 2017
2 Barcelona Supercomputing Center: Marenostrum PetaFlop/s
- General Purpose Computing: 3400 nodes of Xeon, 11 PF/s
- Emerging Technologies: Power 9 + Pascal, 1.5 PF/s; Knights Landing and Knights Hill, 0.5 PF/s; 64bit ARMv8, 0.5 PF/s
3 Research at BSC
- COMPUTER SCIENCES: To influence the way machines are built, programmed and used: programming models, performance tools, Big Data, computer architecture, energy efficiency
- LIFE SCIENCES: To understand living organisms by means of theoretical and computational methods (molecular modeling, genomics, proteomics)
- EARTH SCIENCES: To develop and implement global and regional state-of-the-art models for short-term air quality forecast and long-term climate applications
- CASE: To develop scientific and engineering software to efficiently exploit supercomputing capabilities (biomedical, geophysics, atmospheric, energy, social and economic simulations)
4 ALYA System: Large Scale Computational Mechanics
6 ALYA HPC Context
7 ALYA HPC Context
8 ALYA RED
9 Computational Cardiac Model Applications
- Pacemaker applications
- Computational analysis of malfunctioning tissue patches
- Computational drug testing on cardiac tissue
10 Computational Cardiac Model
11 Pacemaker Application
12 Pacemaker Application
13 Computational Drug Testing
15 NVIDIA INDEX: Scalable, Interactive Visual Computing
- GPU-cluster aware solution
- High-quality and scalable visualization of large-scale datasets
- In-situ visualization
- Commercial software
- Available and deployed in production
16 SCALABILITY AS ENABLER
- NVIDIA HPC Clusters; NVIDIA Quadro VCA or DGX-1; NVIDIA Quadro Workstation
- Performance, dataset size, number of pixels, visual quality
17 Scientific Data Visualization
18 Time-Varying Data Visualization
19 Time-Varying Data Visualization. Simulation data source: A Numerical Study of High-Pressure Oxygen/Methane Mixing and Combustion of a Shear Coaxial Injector, Nan Zong & Vigor Yang, AIAA
20 In-Trans and In-Situ Visualization
21 Computational Heart
22 BACKGROUND
23 DISTRIBUTED PARALLEL RENDERING: Sort-Last Rendering (multi-GPU), with image compositing
24 DISTRIBUTED PARALLEL RENDERING: Sort-Last Rendering (cluster of multi-GPU nodes; cluster of VCAs), with image compositing
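The sort-last idea on the two slides above can be sketched in a few lines. This is a minimal illustration, not NVIDIA IndeX code: it assumes each GPU renders its own subregion into a full-size image of premultiplied RGBA pixels, and that the subregions can be ordered by a single depth value per subregion (reasonable for convex, non-overlapping boxes). The names `over` and `composite_sort_last` are made up for this sketch.

```python
from typing import List, Tuple

# One pixel: premultiplied red, green, blue and alpha, each in [0, 1].
RGBA = Tuple[float, float, float, float]


def over(front: RGBA, back: RGBA) -> RGBA:
    """Porter-Duff 'over' operator for premultiplied RGBA pixels."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    k = 1.0 - fa  # how much of the back pixel remains visible
    return (fr + k * br, fg + k * bg, fb + k * bb, fa + k * ba)


def composite_sort_last(partials: List[Tuple[float, List[RGBA]]]) -> List[RGBA]:
    """Blend per-GPU partial images into one final image.

    partials holds one (depth, image) pair per GPU; a smaller depth means
    the subregion is closer to the camera. All images have the same size.
    """
    ordered = sorted(partials, key=lambda p: p[0])  # front to back
    width = len(ordered[0][1])
    result: List[RGBA] = [(0.0, 0.0, 0.0, 0.0)] * width
    for _, image in ordered:
        # The accumulated result is in front of every image still to come.
        result = [over(acc, px) for acc, px in zip(result, image)]
    return result


if __name__ == "__main__":
    near = (1.0, [(0.5, 0.0, 0.0, 0.5)])  # half-transparent red
    far = (5.0, [(0.0, 0.0, 1.0, 1.0)])   # opaque blue
    print(composite_sort_last([far, near]))
```

Because the partial images are sorted before blending, the order in which GPUs deliver their results does not change the final picture; only the compositing step needs to know the depth order.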
25 DISTRIBUTED DATATYPES: Various Application Domains
- Volume datatypes: Regular, Sparse, Unstructured/Irregular
- Surface-geometry datatypes: Height field, Triangle mesh
26 IN-TRANS AND IN-SITU VISUALIZATION
27 TRADITIONAL VISUALIZATION PIPELINE: Simulation Cluster -> Data Storage (e.g. Unstructured Data) -> NVIDIA IndeX Visualization Cluster
28 TIME-SERIES DATA VISUALIZATION: Visualize Pre-calculated ALYA Simulation Results
- Visualize and Animate
- Stream
- Interact and Explore
- Terabyte time-varying simulation data of nasal system
29 IN-SITU (IN-TRANS) VISUALIZATION PIPELINE: Simulation Cluster -> Network -> NVIDIA IndeX Visualization Cluster, exchanging Unstructured Data
30 IN-SITU VISUALIZATION PIPELINE: Combined Simulation and Visualization Cluster
31 IN-SITU/IN-TRANS SUPPORT: Compute Result Integration
- Parallel jobs executed locally or remotely
- Direct access to local host and device data
- Fast RDMA memory transfers
- User-defined affinity and spatial subdivision
- Application-driven updates: push updated data when ready
- Rendering-driven updates: request computation updates for active data
- (Image: clustered neuron activity)
32 IN-TRANS RESULT TRANSFERS: Fast Data Transfers to Rendering Nodes/GPUs
- Page-Locked System Memory to Page-Locked System Memory: RDMA (over InfiniBand)
- CUDA GPU Memory to CUDA GPU Memory: GPUDirect RDMA
- NVLink: high-speed interconnect between system memory and GPU (IBM and NVIDIA)
33 COMPUTATIONAL HEART: In-Situ Simulation and Visualization
- Simulate/Compute (ALYA)
- Visualize
- Parameterize and Steer
- Interact and Explore
34 NVIDIA INDEX AVAILABILITY
35 NVIDIA INDEX 1.4 (released 07/2016): In-Situ/In-Trans Visualization Support
- Support of 32bit, 16bit, 8bit fixed point, 32bit floating point and RGBA (8bit) regular volumes
- Dynamic streaming and GPU caching of time-varying volume data
- Irregular volumes and sparse volumes
- Built-in volume shading capabilities
- Multi-view capabilities
- Zero-copy RDMA/GPUDirect compute integration infrastructure
- User-defined affinity and spatial subdivision support
- Architecture and API for in-situ/in-trans visualization (compute integration)
- Dynamic workload balancing
- Advanced CUDA memory management, error handling and logging
- MPI/NVIDIA IndeX interprocess coupling (CUDA IPC and shared memory)
36 NVIDIA INDEX 1.5 OUTLOOK: User-defined Rendering Kernel Components
37 NVIDIA INDEX 1.5 OUTLOOK: User-defined Rendering Kernel Components
38 IN-SITU VISUALIZATIONS
39 Challenges
- General: parallel operations, maintaining frame rates, steering
- Simulation (MPI ranks): unstructured mesh, uneven spatial partitions, balanced computations
- IndeX Rendering (MPI ranks): cubical scene region, even spatial partitions, balanced rendering load
- Between the two: data affinity, in-trans approach
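The mismatch named on this slide (uneven simulation partitions versus even rendering partitions) can be made concrete with a small sketch. It assumes the scene is an axis-aligned box cut into a regular grid of subregions, one per rendering rank, and that each simulation rank knows the centroids of its mesh elements. The functions `subregion_index` and `transfer_plan` are hypothetical names for this illustration and are not part of ALYA or NVIDIA IndeX.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def subregion_index(point: Point, bbox_min: Point, bbox_max: Point,
                    grid: Tuple[int, int, int]) -> int:
    """Return the flat index of the even subregion that contains point.

    grid = (nx, ny, nz) is the number of equal cuts along each axis.
    Points on the upper boundary are clamped into the last cell.
    """
    cell = []
    for p, lo, hi, n in zip(point, bbox_min, bbox_max, grid):
        i = int((p - lo) / (hi - lo) * n)
        cell.append(min(max(i, 0), n - 1))
    ix, iy, iz = cell
    nx, ny, _ = grid
    return (iz * ny + iy) * nx + ix


def transfer_plan(centroids_by_sim_rank: Dict[int, Sequence[Point]],
                  bbox_min: Point, bbox_max: Point,
                  grid: Tuple[int, int, int]) -> Dict[int, Dict[int, int]]:
    """For each simulation rank, count elements destined for each render rank."""
    plan: Dict[int, Dict[int, int]] = {}
    for sim_rank, centroids in centroids_by_sim_rank.items():
        counts = Counter(
            subregion_index(c, bbox_min, bbox_max, grid) for c in centroids
        )
        plan[sim_rank] = dict(counts)
    return plan


if __name__ == "__main__":
    mesh = {
        0: [(0.1, 0.5, 0.5), (0.9, 0.5, 0.5)],  # rank 0 spans both halves
        1: [(0.6, 0.2, 0.2)],
    }
    print(transfer_plan(mesh, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2, 1, 1)))
```

A plan like this shows which simulation ranks must send data to which rendering ranks, which is the data-affinity question the in-trans approach has to answer.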
40 Multi-code Coupling in ALYA
- All spatial interpolation on spatial domain in structured and unstructured meshes
- Allows setting send and receive frequencies to synchronize simulation times
- Allows coupling with third party codes
- Parallel and asynchronous MPI coupling
41 Ingredients of Coupling in ALYA
mpirun -np 8 Alya.x fluid : -np 4 Test.x : -np 4 Alya.x solidz
- WHAT: The underlying variables
- WHERE: Surface, Volume, etc.
- WHEN: Time step, iteration step
- HOW: Algorithmic interpolation
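The launch line on this slide starts three programs inside one MPI job. The sketch below works out which world ranks each program would own, assuming the launcher numbers ranks in the order the colon-separated segments appear (the usual convention for mpirun-style MPMD launches, but worth checking for a given MPI installation). The helper names `rank_layout` and `code_of` are invented for this example; in a real code the label would typically become the color passed to a communicator split.

```python
from typing import Dict, List, Tuple


def rank_layout(segments: List[Tuple[str, int]]) -> Dict[str, range]:
    """Map each launched program to its contiguous block of world ranks.

    segments lists (label, number_of_processes) in launch order.
    """
    layout: Dict[str, range] = {}
    start = 0
    for label, nprocs in segments:
        layout[label] = range(start, start + nprocs)
        start += nprocs
    return layout


def code_of(world_rank: int, layout: Dict[str, range]) -> str:
    """Return the label of the program that owns world_rank."""
    for label, ranks in layout.items():
        if world_rank in ranks:
            return label
    raise ValueError(f"rank {world_rank} is outside the job")


if __name__ == "__main__":
    # Mirrors: mpirun -np 8 Alya.x fluid : -np 4 Test.x : -np 4 Alya.x solidz
    layout = rank_layout([
        ("Alya.x fluid", 8),
        ("Test.x", 4),
        ("Alya.x solidz", 4),
    ])
    for label, ranks in layout.items():
        print(f"{label}: ranks {ranks.start}..{ranks.stop - 1}")
```

Under that assumption the job has 16 ranks in total, with the third-party code Test.x sitting between the fluid and solid instances of ALYA.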
42 Coupling for IN-SITU Visualizations
- Allows optimizing resources for compute and render
- Application-driven updates: push simulation data
- Allows interoperability between coarser and finer meshes, adjusting data updates
- Maintains high frame rates and allows interaction with the volume
- Can couple multiple physics apps to a single rendering app
- Diagram: Simulation (MPI ranks) coupled to IndeX Rendering (MPI ranks)
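How application-driven updates keep the frame rate independent of the simulation can be shown with a toy model. This is a sequential stand-in for what is really two concurrent MPI applications: the simulation pushes its data every `send_every` steps, and the renderer draws `frames_per_step` frames per simulation step using whatever data arrived last. Both parameters and the function `run_coupled` are assumptions made for this sketch.

```python
from collections import deque
from typing import List, Optional


def run_coupled(n_steps: int, send_every: int,
                frames_per_step: int) -> List[Optional[int]]:
    """Return, for every rendered frame, the simulation step it displays.

    None means no simulation data has been pushed yet.
    """
    mailbox: deque = deque(maxlen=1)  # only the newest pushed step is kept
    displayed: Optional[int] = None
    shown: List[Optional[int]] = []
    for step in range(1, n_steps + 1):
        # Simulation side: push data only at the chosen send frequency.
        if step % send_every == 0:
            mailbox.append(step)
        # Rendering side: never waits, just picks up new data if present.
        for _ in range(frames_per_step):
            if mailbox:
                displayed = mailbox.pop()
            shown.append(displayed)
    return shown


if __name__ == "__main__":
    print(run_coupled(n_steps=4, send_every=2, frames_per_step=2))
```

The renderer keeps producing frames between pushes, which is why the send frequency can be tuned to the cost of the transfer without stalling interaction.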
43 Steering Simulations: Steering is Application Specific Coupling
- Steering simulations requires handling interrupts
- Interrupt communicated through backward coupling
- A general approach by scalar/vector interrupts, and a user-defined function to handle the variables of the simulation
- Diagram: Simulation (Time S1, Time S2, Time S3, ...) and Rendering (Time S1, Time S2, Time S3, ...), with an interrupt handler whose function is applied to fields
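The scalar/vector interrupt idea can be sketched as follows. The slide does not describe ALYA's actual data structures, so everything here is an assumption for illustration: the `Interrupt` record, the `apply_interrupts` function, and the field names `conductivity` and `fiber` are all hypothetical. The point is only the shape of the mechanism: the renderer sends back a payload, and a user-defined function decides how that payload changes a simulation field.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union

Payload = Union[float, Sequence[float]]  # scalar or vector interrupt value
Field = List[float]
Handler = Callable[[Field, Payload], Field]


@dataclass
class Interrupt:
    target: str       # name of the simulation field to modify
    payload: Payload  # value sent back from the rendering side


def apply_interrupts(fields: Dict[str, Field], pending: List[Interrupt],
                     handlers: Dict[str, Handler]) -> Dict[str, Field]:
    """Apply every pending interrupt with its user-defined handler.

    Intended to be called between time steps, so a field never changes
    in the middle of a step. The pending list is emptied afterwards.
    """
    for irq in pending:
        handler = handlers[irq.target]
        fields[irq.target] = handler(fields[irq.target], irq.payload)
    pending.clear()
    return fields


def scale_field(field: Field, factor: Payload) -> Field:
    """Scalar interrupt: multiply every entry by the payload."""
    return [value * factor for value in field]


def replace_vector(field: Field, vector: Payload) -> Field:
    """Vector interrupt: overwrite the field with the payload."""
    return list(vector)


if __name__ == "__main__":
    fields = {"conductivity": [1.0, 2.0], "fiber": [0.0, 0.0, 1.0]}
    handlers = {"conductivity": scale_field, "fiber": replace_vector}
    pending = [Interrupt("conductivity", 0.5),
               Interrupt("fiber", (1.0, 0.0, 0.0))]
    print(apply_interrupts(fields, pending, handlers))
```

Keeping the handlers as plain functions is what makes the scheme general: the coupling layer only moves scalars and vectors, and the application-specific meaning lives in the handler.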
44 Summary: In-situ Visualization
- NVIDIA IndeX enables better insights into simulation data through professional visualization techniques
- Scalability is the enabler for HPC in-situ visualization
- Multiphysics coupling is the key to scalability and resource management for in-situ visualization
45 SELF-PACED LABS: Interactive HPC Volume Visualization in ParaView
- NVIDIA IndeX for ParaView plugin hands-on
- Location: Self-paced lab area on lower level
- Dates: Monday 1:00-5:00pm, Tuesday 9: am, Wednesday 1:00-5:00pm
46 INTERACTIVE DEMO: Interactive HPC: Large Scale In-Situ Visualization using NVIDIA IndeX in ALYA MultiPhysics
- Live demonstration of in-situ visualization
- Interactive steering of simulation parameters
- Location: NVIDIA demo-booth in exhibit hall 1
47 Christopher Lux, NVIDIA; Vishal Mehta, BSC; Marc Nienhaus, NVIDIA
NOVEL GPU FEATURES: PERFORMANCE AND PRODUCTIVITY Peter Messmer pmessmer@nvidia.com COMPUTATIONAL CHALLENGES IN HEP Low-Level Trigger High-Level Trigger Monte Carlo Analysis Lattice QCD 2 COMPUTATIONAL
More informationEfficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning Ammar Ahmad Awan, Khaled Hamidouche, Akshay Venkatesh, and Dhabaleswar K. Panda Network-Based Computing Laboratory Department
More informationFast Dynamic Load Balancing for Extreme Scale Systems
Fast Dynamic Load Balancing for Extreme Scale Systems Cameron W. Smith, Gerrett Diamond, M.S. Shephard Computation Research Center (SCOREC) Rensselaer Polytechnic Institute Outline: n Some comments on
More informationWorking with Metal Overview
Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission
More informationImage-Space-Parallel Direct Volume Rendering on a Cluster of PCs
Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr
More informationExploiting Task-Parallelism on GPU Clusters via OmpSs and rcuda Virtualization
Exploiting Task-Parallelism on Clusters via Adrián Castelló, Rafael Mayo, Judit Planas, Enrique S. Quintana-Ortí RePara 2015, August Helsinki, Finland Exploiting Task-Parallelism on Clusters via Power/energy/utilization
More informationTHE CONVERGENCE OF HPC AND AI OBSERVATIONS AND INSIGHTS VERNEGLOBAL.COM
THE CONVERGENCE OF HPC AND AI OBSERVATIONS AND INSIGHTS VERNEGLOBAL.COM FIRST WELCOME TO VERNE GLOBAL Established in Iceland 2007 Optimised industrial scale data center solutions exploiting Iceland s cool
More informationThe Future of High Performance Interconnects
The Future of High Performance Interconnects Ashrut Ambastha HPC Advisory Council Perth, Australia :: August 2017 When Algorithms Go Rogue 2017 Mellanox Technologies 2 When Algorithms Go Rogue 2017 Mellanox
More informationNVIDIA DLI HANDS-ON TRAINING COURSE CATALOG
NVIDIA DLI HANDS-ON TRAINING COURSE CATALOG Valid Through July 31, 2018 INTRODUCTION The NVIDIA Deep Learning Institute (DLI) trains developers, data scientists, and researchers on how to use artificial
More informationDELIVERABLE D5.5 Report on ICARUS visualization cluster installation. John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS)
DELIVERABLE D5.5 Report on ICARUS visualization cluster installation John BIDDISCOMBE (CSCS) Jerome SOUMAGNE (CSCS) 02 May 2011 NextMuSE 2 Next generation Multi-mechanics Simulation Environment Cluster
More informationMPI + X programming. UTK resources: Rho Cluster with GPGPU George Bosilca CS462
MPI + X programming UTK resources: Rho Cluster with GPGPU https://newton.utk.edu/doc/documentation/systems/rhocluster George Bosilca CS462 MPI Each programming paradigm only covers a particular spectrum
More information