Porting Scalable Parallel CFD Application HiFUN on NVIDIA GPU
D. V., N. Munikrishna, Nikhil Vijay Shende (1), N. Balakrishnan (2), Thejaswi Rao (3)
(1) S & I Engineering Solutions Pvt. Ltd., Bangalore, India
(2) Aerospace Engineering, Indian Institute of Science, Bangalore, India
(3) NVIDIA Graphics Pvt. Ltd., Bangalore, India
GPU Technology Conference, Silicon Valley, March 26-29

Introduction
The HiFUN Software: High Resolution Flow Solver on Unstructured Meshes. A Computational Fluid Dynamics (CFD) flow solver. Primary product of the company SandI. Robust, fast, accurate and efficient tool.
About SandI: A technology company. Incubated from Indian Institute of Science, Bangalore. Promotes high-end CFD technologies with uncompromising quality standards.

Features of HiFUN: General

Features of HiFUN: Well Validated
Validation cases: AIAA DPW, SPICES, AIAA HiLiftPW.

Features of HiFUN: Super Scalable
Workload: 165 million volumes.
[Table columns: Simulation, CPU Cores, Time (Hours/Days). Simulation types: RANS, URANS, DES. Values not recoverable.]

SandI NVIDIA Collaboration
Way Ahead: NVIDIA Pascal, Volta. NVLink with IBM Power CPU.
2018: GTC 2018.
GTCx Mumbai.
2016: HiFUN in Apps Catalogue. GTC 2016: Poster Presentation.
2015: NVIDIA Innovation Award.
2014: Joint Development Initiative kicks off.

Hybrid Supercomputers
Hybrid supercomputers: Consist of CPU and NVIDIA GPU. Less power to achieve the same FLOPS. Less cooling and space.
GPU: Thousands of computing cores sharing the same RAM. Higher memory bandwidth. High data transfer overheads with CPU.

Parallelization Model on GPU
Model: Shared memory. Many FLOPS are needed per byte of data moved from CPU to GPU. Calls for a re-look at the parallelization of CFD algorithms.
Parallelization Challenges: General purpose algorithms. Implicit: global data dependence. Complex multi-layered unstructured data structure.

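The "many FLOPS per byte" requirement can be made concrete with a back-of-the-envelope cost model. The sketch below is illustrative only: it assumes a simple additive model (transfer time plus compute time, no overlap), and the CPU rate, GPU rate, and link bandwidth are hypothetical round numbers, not measurements from the platform used in this talk.

```python
# Hypothetical sustained rates, chosen only to illustrate the arithmetic.
CPU_FLOPS = 0.5e12      # assumed sustained CPU rate, FLOP/s
GPU_FLOPS = 2.0e12      # assumed sustained GPU rate, FLOP/s
LINK_BYTES_PER_S = 10e9 # assumed CPU-to-GPU transfer bandwidth, bytes/s


def break_even_intensity(cpu_flops, gpu_flops, link_bytes_per_s):
    """FLOPs per transferred byte above which offloading is faster.

    Derived from flops/cpu_flops > bytes/link + flops/gpu_flops.
    """
    if gpu_flops <= cpu_flops:
        raise ValueError("offload can only pay off if the GPU is faster")
    return 1.0 / (link_bytes_per_s * (1.0 / cpu_flops - 1.0 / gpu_flops))


def offload_pays(flops, bytes_moved, cpu_flops, gpu_flops, link_bytes_per_s):
    """Compare CPU-only time with transfer-plus-GPU time for one kernel."""
    t_cpu = flops / cpu_flops
    t_gpu = bytes_moved / link_bytes_per_s + flops / gpu_flops
    return t_gpu < t_cpu


if __name__ == "__main__":
    threshold = break_even_intensity(CPU_FLOPS, GPU_FLOPS, LINK_BYTES_PER_S)
    print(f"break-even intensity: {threshold:.1f} FLOP/byte")
    for intensity in (10, 100):
        flops = intensity * 1e9  # work done on 1 GB of transferred data
        ok = offload_pays(flops, 1e9, CPU_FLOPS, GPU_FLOPS, LINK_BYTES_PER_S)
        print(f"{intensity} FLOP/byte -> offload pays: {ok}")
```

Under these assumed rates, a kernel has to perform several tens of floating-point operations per transferred byte before the offload wins, which is why keeping data resident on the GPU matters more than the raw speed of any single kernel.
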
Porting Constraints and Strategy
Constraints: No compromise on distributed memory scalability. Source code maintainability should not suffer. Software portability should not suffer.
Parallel Strategy: Accelerate single node performance via GPU offload model. Hybrid: MPI and OpenACC directives.
Offload Model: Computationally intensive part is offloaded to GPU. Optimal data communication between CPU and GPU.

Configurations & Workloads (Million)
[Figures: Onera M6, NASA CRM, Trap Wing]
Onera M6 Wing: 1.1, 9.3, 12.12, 15.4
NASA CRM: 6.2, 26.5, 30
NASA Trap Wing: 20, 66
Simulation Type: Steady RANS simulations.

Computing Platform: NVIDIA PSG
Node configuration: Two hexadeca-core Intel(R) Xeon(R) Haswell processors. Eight NVIDIA Tesla K80 GPUs. GPU memory = 12 GB. Total CPU memory per node = 256 GB. InfiniBand interconnect.
Software: PGI Compiler 16.7, OpenMPI, OpenACC.

Parallel Performance Parameters
Ideal Speed-up: Ratio of the number of nodes used for a given run to the reference number of nodes.
Actual Speed-up: Ratio of time per iteration using the reference number of nodes to time per iteration using the number of nodes for the given run.
Accelerator Speed-up: Ratio of time per iteration obtained using a given number of CPUs to time per iteration obtained using the same number of CPUs working in tandem with GPUs.

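The three definitions above translate directly into ratios. The following is a minimal sketch; the function names and the sample timings are made up for illustration and are not results from the talk.

```python
def _positive(value, name):
    """Guard against zero or negative inputs, which make the ratios meaningless."""
    if value <= 0:
        raise ValueError(f"{name} must be positive")
    return value


def ideal_speedup(nodes_run, nodes_ref):
    """Nodes used for the given run divided by the reference number of nodes."""
    return _positive(nodes_run, "nodes_run") / _positive(nodes_ref, "nodes_ref")


def actual_speedup(t_iter_ref, t_iter_run):
    """Time/iteration on the reference node count divided by time/iteration of the run."""
    return _positive(t_iter_ref, "t_iter_ref") / _positive(t_iter_run, "t_iter_run")


def accelerator_speedup(t_iter_cpu, t_iter_cpu_gpu):
    """Time/iteration on CPUs alone divided by time/iteration on the same CPUs plus GPUs."""
    return _positive(t_iter_cpu, "t_iter_cpu") / _positive(t_iter_cpu_gpu, "t_iter_cpu_gpu")


if __name__ == "__main__":
    # Hypothetical timings in seconds per iteration, for illustration only.
    print("ideal speed-up:", ideal_speedup(nodes_run=8, nodes_ref=2))
    print("actual speed-up:", actual_speedup(t_iter_ref=10.0, t_iter_run=3.2))
    print("accelerator speed-up:", accelerator_speedup(t_iter_cpu=12.0, t_iter_cpu_gpu=4.0))
```

Comparing the actual speed-up against the ideal one for the same pair of node counts shows how far a run departs from linear scaling.
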
Single Node Performance
[Figure: Accelerator Speed-up on 2 GPUs]
Observations: Increase in grid size increases GPU utilization and accelerator speed-up. Important to load the GPU completely.

Single Node Performance: Varying GPUs
[Figure: % Increase]
Observations: Increase in the number of GPUs increases accelerator speed-up. Use of 4 GPUs per node is optimal.

Single Node Performance
[Figure: Time to RANS Solution (Hours)]
Observations: Time to solution on 1 million grid: 15 minutes. Time to solution on 30 million grid: half a day. Single node serves as a desktop supercomputer.

Multi-node Performance
[Figure: Parallel Speed-up, 66 Million Workload]
Observations: Near linear speed-up using 2 GPUs per node. Drop in speed-up for a larger number of nodes and/or more GPUs per node, due to lower GPU utilization.

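The link between node count, GPUs per node, and GPU utilization can be seen by computing how much of the 66 million workload each GPU receives. This is a rough sketch that assumes a perfectly even partition; the node counts other than 8 are hypothetical, and the talk does not state the partition sizes actually used.

```python
def volumes_per_gpu(total_volumes, nodes, gpus_per_node):
    """Average number of volumes per GPU, assuming an even partition."""
    if nodes < 1 or gpus_per_node < 1:
        raise ValueError("nodes and gpus_per_node must be at least 1")
    return total_volumes / (nodes * gpus_per_node)


if __name__ == "__main__":
    workload = 66e6  # the 66 million workload used in the multi-node runs
    for nodes in (1, 2, 4, 8):       # illustrative node counts
        for gpus in (2, 4, 8):       # up to eight GPUs per node on this platform
            share = volumes_per_gpu(workload, nodes, gpus)
            print(f"{nodes} node(s) x {gpus} GPUs: {share / 1e6:.2f} million volumes per GPU")
```

As nodes or GPUs per node are added, each GPU's share shrinks, which is consistent with the observation that speed-up drops once the GPUs are no longer fully loaded.
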
Multi-node Performance
[Figure: Normalized Time Per Iteration, 66 Million Workload]
Observations: Drop in time per iteration with increase in the number of nodes and/or GPUs. Time to solution with 8 nodes: 4 hours.

Concluding Remarks
Offload model used for porting to GPU.
GPU based computing node is powerful enough to serve as a desktop supercomputer.
HiFUN is ideally suited to solve grand challenge problems on GPU based hybrid supercomputers.
OpenACC directives based offload model is an attractive option for porting legacy CFD codes to GPU.
More informationTransactions on Information and Communications Technologies vol 3, 1993 WIT Press, ISSN
The implementation of a general purpose FORTRAN harness for an arbitrary network of transputers for computational fluid dynamics J. Mushtaq, A.J. Davies D.J. Morgan ABSTRACT Many Computational Fluid Dynamics
More informationPerformance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf
PADC Anual Workshop 20 Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture Alexander Berreth RECOM Services GmbH, Stuttgart Markus Bühler, Benedikt Anlauf IBM Deutschland
More informationA Comprehensive Study on the Performance of Implicit LS-DYNA
12 th International LS-DYNA Users Conference Computing Technologies(4) A Comprehensive Study on the Performance of Implicit LS-DYNA Yih-Yih Lin Hewlett-Packard Company Abstract This work addresses four
More informationHPC Usage for Aerodynamic Flow Computation with Different Levels of Detail
DLR.de Folie 1 HPCN-Workshop 14./15. Mai 2018 HPC Usage for Aerodynamic Flow Computation with Different Levels of Detail Cornelia Grabe, Marco Burnazzi, Axel Probst, Silvia Probst DLR, Institute of Aerodynamics
More informationAccelerate Graphics in Virtual Environments
Accelerate Graphics in Virtual Environments Deliver rich graphics capabilities to more users through virtualized graphics technology from Citrix, Dell, and NVIDIA. Virtualization of graphics applications
More informationWhy Supercomputing Partnerships Matter for CFD Simulations
Why Supercomputing Partnerships Matter for CFD Simulations Wim Slagter, PhD Director, HPC & Cloud Alliances ANSYS, Inc. 1 2017 ANSYS, Inc. May 9, 2017 ANSYS is Fluids FOCUSED This is all we do. Leading
More informationIllinois Proposal Considerations Greg Bauer
- 2016 Greg Bauer Support model Blue Waters provides traditional Partner Consulting as part of its User Services. Standard service requests for assistance with porting, debugging, allocation issues, and
More information