Challenges Simulating Real Fuel Combustion Kinetics: The Role of GPUs
|
|
- Jerome Todd
- 5 years ago
- Views:
Transcription
1 Challenges Simulating Real Fuel Combustion Kinetics: The Role of GPUs M. J. McNenly and R. A. Whitesides GPU Technology Conference March 27, 2014 San Jose, CA LLNL-PRES ! This work performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344
2 Enhanced understanding of high efficiency clean combustion requires expensive models that fully couple detailed kinetics with CFD Objective! Create faster and more accurate combustion solvers.! Accelerates R&D on three major challenges identified in the DOE VTP multi-year program plan:! A. Lack of fundamental knowledge of advanced engine combustion regimes! C. Lack of modeling capability for combustion and emission control! D. Lack of effective engine controls! We used to aim for! Detailed chemistry!!!! in highly resolved 3D simulations!!! Ex.!Diesel component!!c 20 H 42 (LLNL)!!7.2K!species!!53K!reaction steps! Ex.!SI/HCCI transition ~30M cells for Bosch in LLNL s hpc4energy incubator! 2
3 Enhanced understanding of high efficiency clean combustion requires expensive models that fully couple detailed kinetics with CFD Objective! Create faster and more accurate combustion solvers.! Accelerates R&D on three major challenges identified in the DOE VTP multi-year program plan:! A. Lack of fundamental knowledge of advanced engine combustion regimes! C. Lack of modeling capability for combustion and emission control! D. Lack of effective engine controls! Now we want to use! Detailed chemistry!!!!! Ex.!9-component diesel surrogate (AVFL18)!!C. Mueller et al. Energy Fuels, 2012.!!+10K!species!!+75K!reaction steps! in highly resolved 3D simulations! 3
4 Long term challenges requires new algorithms and lower cost computing architecture (flops/$) Current industrial research achieves 30M cells LES/flamelet cells or 1.4B species-cells RANS/detailed finite rate kinetics! Pick one: Turbulence Chemistry! The John Deur [Cummins] Challenge all of the above with 150M cells and 10K species! Bare memory requirements: 12TB for a snapshot! 400 nodes of cab (2x 8 core Intel, 2GB per core)! 2000 NVIDIA K20X (4 Indiana University Big Red II)! 4
5 HCCI relies on the chemical kinetics of the charge mixture to determine ignition timing and burn duration and not mixing Spark-Ignition (SI)! HCCI! Videos courtesy of Y. Ikeda (U of Kobe)! 5
6 Predictive HCCI models require accurate chemistry more than detailed fluid dynamic transport Aceves et al. SAE :! What role do the in-cylinder transport processes affect HCCI performance?! turbulence intensity, m/s disc square solid lines: experimental dotted lines: numerical crank angle, degrees 6
7 Approach Accelerate research in advanced combustion regimes by developing faster and more predictive engine models 1. Better algorithms and applied mathematics same solution only faster 2. New computing architecture more flops per second, per dollar, per watt 3. Improved physical models more accuracy, better error control 7
8 General Processing Units (GPGPUs) Our GPUPurpose researchgraphical into combustion algorithms had very modest roots bringaug. Tflop/s power to the partners desktop from 2010computing presentation to program Originally used for graphics intensive applications: - video games GPU: Cores: Mem: Tflop/s: Price: Price: CPU: Cores: Mem: Tflop/s: Price: Intel/AMD up to 12 up to 256GB up to 0.13 up to $1300 NVIDIA GTX GB <$300 $500 8
9 Enthalpy and entropy calculations have greater speedup benefit from more arithmetic operations per memory access 9
10 Approach Accelerate research in advanced combustion regimes by developing faster and more predictive engine models 1. Better algorithms and applied mathematics same solution only faster 2. New computing architecture more flops per second, per dollar, per watt 3. Improved physical models more accuracy, better error control 10
11 Better Algorithms: adaptive preconditioner using on-the-fly reduction produces the same solution significantly faster Two approaches to faster chemistry solutions Ex. iso-octane 874 species 3796 reactions 1. Classic mechanism reduction: Ex.197 species Smaller ODE size Smaller Jacobian matrix 2. LLNL s adaptive preconditioner: Solution is faster but is not accurate over the entire operating range Jacobian Matrix (species coupling freq.) slower faster Identical ODE Reduced mech only in preconditioner Filter out 50-75% of the least important reactions Our solver is as fast as the reduced mechanism without any loss of accuracy - more than 10x speedup 11
12 LLNL s solver brings well-resolved chemistry and 3D CFD to 1-day engine design iterations using iso-octane (874 species) Simulation time (chemistry-only) for 10 6 cells on 32 processors! 2-methylnonadecane! 7172 species! steps! Traditional dense matrix ODE solvers still found in KIVA and OpenFOAM - 90 years 874 species iso-octane now only 1-day!! New commercial solvers using sparse systems days New LLNL solvers created for VTP program - 11 days 12
13 Approach Accelerate research in advanced combustion regimes by developing faster and more predictive engine models 1. Better algorithms and applied mathematics same solution only faster 2. New computing architecture more flops per second, per dollar, per watt 3. Improved physical models more accuracy, better error control 13
14 The calculation of the species production rate serves as a practical case study to illustrate GPU algorithm development Creation rate of species i Rate of progress of reaction step j Example: hydrogen mechanism! Summation over all reactions containing species i as a product 14
15 Low arithmetic intensity and unstructured memory access of the species production rate algorithm makes it ill-suited for the GPU Scatter-add for the species production rates Unstructured read or write operations very costly on GPU iso-octane (874 species) GPU: 0.13 GB/s CPU: 6.6 GB/s 15
16 Processing multiple instructions that update the same species within the same thread block increases throughput Process 4 instructions per block: Shared Memory Bank iso-octane (874 species) GPU: GB/s (2 n instr) CPU: 17 GB/s GPU Thread Block 1 16
17 Similar algorithm improvements allows for an order of magnitude speedup in the calculation of the system derivatives on the GPU 17
18 The system derivatives for chemistry are only half the CPU cost the matrix operations must also be accelerated Total Simulation Time (sec)! A: CPU Only (1T)! D: Vector Ops on GPU! E: Matrix skipping (1T)! F: Matrix CPU threads (8T)! A! D! E! F! 18
19 NVIDIA has provided us with a new sparse matrix software library GLU sparse matrix package Developed internally at NVIDIA - lead developers Maxim Naumov & Sharan Chetlur Key application: integrated circuits (SPICE) Non-symmetric matrices Beta software LLNL has been provided early access presentations/s3364-spice-acceleration-on-gpus.pdf! 19
20 The system derivatives for chemistry are only half the CPU cost the matrix operations must also be accelerated Total Simulation Time (sec)! A: CPU Only (1T)! D: Vector Ops on GPU! E: Matrix skipping (1T)! F: Matrix CPU threads (8T)! A! D! E! F! 20
21 First GLU results promising it is faster than the best work avoidance CPU-heuristic Total Simulation Time (sec)! A: CPU Only (1T)! D: Vector Ops on GPU! E: Matrix skipping (1T)! F: Matrix CPU threads (8T)! G: GLU Matrix Ops! A! D! E! F! G! 21
22 Second GLU results even better now comparable speed to multithreaded CPU but offers greater future growth Total Simulation Time (sec)! A: CPU Only (1T)! D: Vector Ops on GPU! E: Matrix skipping (1T)! F: Matrix CPU threads (8T)! G: GLU Matrix Ops! H: GLU Matrix Ops #2! A! D! E! F! G! H! 22
23 Further improvements are expected with new mix & match solver capability added to the next build Total Simulation Time (sec)! CPU matrix! GPU matrix! A: CPU Only (1T)! D: Vector Ops on GPU! E: Matrix skipping (1T)! F: Matrix CPU threads (8T)! G: GLU Matrix Ops! H: GLU Matrix Ops #2! A! D! E! F! G! H! 23
24 Chemistry integration over a fluid dynamic time step Single chemical reactor! t 0 t 1 t 2 t N-1 t N dt 0 dt 1 Integrator decides how long a step to take and when to perform new factorization! Black lines indicates derivative evaluation and backsolve! Red lines indicate matrix factorization (traditionally expensive).! 24
25 Reactor coupling: Ideal Case cell 1! cell 2! cell 3! cell 4! Combined! 25
26 Reactor coupling: Ideal Case cell 1! cell 2! cell 3! cell 4! Exactly the same initial conditions means exactly the same combined time steps and matrix factorizations.! Combined! 26
27 Reactor coupling: Poor Case cell 1! cell 2! cell 3! cell 4! Combined! 27
28 Reactor coupling: Poor Case cell 1! cell 2! Closely related initial conditions can lead to very large coupling penalty.! cell 3! cell 4! Combined! 28
29 Reactor coupling: Acceptable Case cell 1! cell 2! cell 3! cell 4! Combined! 29
30 Under what conditions can GPU-friendly multireactor groups be solved efficiently? T i = 800,! Φ i! = 0.2! Speedup (CPU Time/GPU Time)! ΔΦ i! 0! 0.1! 0.2! 50! 25! 0! ΔT i 30
31 Initial study reveals many operating conditions where the multireactor approach is near-ideal T i 1800! 1600! 1400! 1200! 1000! ΔΦ i! 0! 0.1! 0.2! 2000! 50! 25! ΔT i 0! Speedup (GPU Time/CPU Time)! 800! Φ i! 0.2! 0.4! 0.6! 0.8! 1.0! 31
32 Future research directions to improve combustion algorithms on the GPU Near term (FY14): 1. Evaluate new metrics for organizing GPU-friendly multireactor groups! 2. Take full advantage of the hybrid CPU/GPU architecture using the CPU to avoid poorly coupled multireactors!! Long term: 3. Sensitivity calculations and other mechanism development tools used by fuel chemists! 4. Lagrangian spray models! 5. New control strategies for stiff ODE integrators! 6. Many-species transport! 32
33 Summary: GPU-accelerated multireactor solver will enable detailed fuel chemistry to be used in IC engine design Total Simulation Time (sec)! A: CPU Only (1T)! D: Vector Ops on GPU! E: Matrix skipping (1T)! F: Matrix CPU threads (8T)! G: GLU Matrix Ops! H: GLU Matrix Ops #2! A! D! E! F! G! H! 33
34 Acknowledgements We gratefully acknowledge the support of Gurpreet Singh, the Advanced Combustion Engines Program Leader for the US Department of Energy Vehicle Technologies Office. We would like to thank Stan Posey of NVIDIA for providing us with new K20 Tesla GPUs (2496 CUDA cores) for testing, and Maxim Naumov for adding new features to the GLU library and providing us with support. We would also like to thank Prof. Bill Green and his research group at MIT for providing large mechanisms to test from their Reaction Mechanism Generator (RMG) package ( 34
Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs
Presented at the 2014 ANSYS Regional Conference- Detroit, June 5, 2014 Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs Bhushan Desam, Ph.D. NVIDIA Corporation 1 NVIDIA Enterprise
More informationKrishnan Suresh Associate Professor Mechanical Engineering
Large Scale FEA on the GPU Krishnan Suresh Associate Professor Mechanical Engineering High-Performance Trick Computations (i.e., 3.4*1.22): essentially free Memory access determines speed of code Pick
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationHYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE
HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S
More informationModern GPUs (Graphics Processing Units)
Modern GPUs (Graphics Processing Units) Powerful data parallel computation platform. High computation density, high memory bandwidth. Relatively low cost. NVIDIA GTX 580 512 cores 1.6 Tera FLOPs 1.5 GB
More informationCrevice and Blowby Model Development and Application
Crevice and Blowby Model Development and Application Randy P. Hessel University of Wisconsin - Madison Salvador M. Aceves and Dan L. Flowers - Lawrence Livermore National Lab ABSTRACT This paper describes
More informationPDF-based simulations of turbulent spray combustion in a constant-volume chamber under diesel-engine-like conditions
International Multidimensional Engine Modeling User s Group Meeting at the SAE Congress Detroit, MI 23 April 2012 PDF-based simulations of turbulent spray combustion in a constant-volume chamber under
More informationMulti-GPU simulations in OpenFOAM with SpeedIT technology.
Multi-GPU simulations in OpenFOAM with SpeedIT technology. Attempt I: SpeedIT GPU-based library of iterative solvers for Sparse Linear Algebra and CFD. Current version: 2.2. Version 1.0 in 2008. CMRS format
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964
More informationInvestigation of mixing chamber for experimental FGD reactor
Investigation of mixing chamber for experimental FGD reactor Jan Novosád 1,a, Petra Danová 1 and Tomáš Vít 1 1 Department of Power Engineering Equipment, Faculty of Mechanical Engineering, Technical University
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationS0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS
S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir
More informationIntroduction to C omputational F luid Dynamics. D. Murrin
Introduction to C omputational F luid Dynamics D. Murrin Computational fluid dynamics (CFD) is the science of predicting fluid flow, heat transfer, mass transfer, chemical reactions, and related phenomena
More informationGPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC
GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationGPU-Accelerated Algebraic Multigrid for Commercial Applications. Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA
GPU-Accelerated Algebraic Multigrid for Commercial Applications Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA ANSYS Fluent 2 Fluent control flow Accelerate this first Non-linear iterations Assemble
More informationFlux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationA Comprehensive Study on the Performance of Implicit LS-DYNA
12 th International LS-DYNA Users Conference Computing Technologies(4) A Comprehensive Study on the Performance of Implicit LS-DYNA Yih-Yih Lin Hewlett-Packard Company Abstract This work addresses four
More informationAccelerating the Implicit Integration of Stiff Chemical Systems with Emerging Multi-core Technologies
Accelerating the Implicit Integration of Stiff Chemical Systems with Emerging Multi-core Technologies John C. Linford John Michalakes Manish Vachharajani Adrian Sandu IMAGe TOY 2009 Workshop 2 Virginia
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More informationEfficient Tridiagonal Solvers for ADI methods and Fluid Simulation
Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular
More informationPerformance of Implicit Solver Strategies on GPUs
9. LS-DYNA Forum, Bamberg 2010 IT / Performance Performance of Implicit Solver Strategies on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Abstract: The increasing power of GPUs can be used
More informationPerformance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf
PADC Anual Workshop 20 Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture Alexander Berreth RECOM Services GmbH, Stuttgart Markus Bühler, Benedikt Anlauf IBM Deutschland
More informationAlternative GPU friendly assignment algorithms. Paul Richmond and Peter Heywood Department of Computer Science The University of Sheffield
Alternative GPU friendly assignment algorithms Paul Richmond and Peter Heywood Department of Computer Science The University of Sheffield Graphics Processing Units (GPUs) Context: GPU Performance Accelerated
More informationPerformance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA
Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA Kazuhiko Komatsu, S. Momose, Y. Isobe, O. Watanabe, A. Musa, M. Yokokawa, T. Aoyama, M. Sato, H. Kobayashi Tohoku University 14 November,
More informationGTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013
GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»
More informationOn the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters
1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk
More informationACCELERATION OF A COMPUTATIONAL FLUID DYNAMICS CODE WITH GPU USING OPENACC
Nonlinear Computational Aeroelasticity Lab ACCELERATION OF A COMPUTATIONAL FLUID DYNAMICS CODE WITH GPU USING OPENACC N I C H O L S O N K. KO U K PA I Z A N P H D. C A N D I D AT E GPU Technology Conference
More informationMAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures
MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010
More informationFOR P3: A monolithic multigrid FEM solver for fluid structure interaction
FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany
More informationAccelerating Implicit LS-DYNA with GPU
Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,
More informationEfficient Multi-GPU CUDA Linear Solvers for OpenFOAM
Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Alexander Monakov, amonakov@ispras.ru Institute for System Programming of Russian Academy of Sciences March 20, 2013 1 / 17 Problem Statement In OpenFOAM,
More informationGPU-accelerated data expansion for the Marching Cubes algorithm
GPU-accelerated data expansion for the Marching Cubes algorithm San Jose (CA) September 23rd, 2010 Christopher Dyken, SINTEF Norway Gernot Ziegler, NVIDIA UK Agenda Motivation & Background Data Compaction
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationMAGMA. Matrix Algebra on GPU and Multicore Architectures
MAGMA Matrix Algebra on GPU and Multicore Architectures Innovative Computing Laboratory Electrical Engineering and Computer Science University of Tennessee Piotr Luszczek (presenter) web.eecs.utk.edu/~luszczek/conf/
More informationStan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA
Stan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA NVIDIA and HPC Evolution of GPUs Public, based in Santa Clara, CA ~$4B revenue ~5,500 employees Founded in 1999 with primary business in
More informationParallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung
More informationGPU COMPUTING WITH MSC NASTRAN 2013
SESSION TITLE WILL BE COMPLETED BY MSC SOFTWARE GPU COMPUTING WITH MSC NASTRAN 2013 Srinivas Kodiyalam, NVIDIA, Santa Clara, USA THEME Accelerated computing with GPUs SUMMARY Current trends in HPC (High
More informationGeneral Purpose GPU Computing in Partial Wave Analysis
JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data
More informationGPU-optimized computational speed-up for the atmospheric chemistry box model from CAM4-Chem
GPU-optimized computational speed-up for the atmospheric chemistry box model from CAM4-Chem Presenter: Jian Sun Advisor: Joshua S. Fu Collaborator: John B. Drake, Qingzhao Zhu, Azzam Haidar, Mark Gates,
More informationAccelerating GPU computation through mixed-precision methods. Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University
Accelerating GPU computation through mixed-precision methods Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University Outline Motivation Truncated Precision using CUDA Solving Linear
More informationWhy HPC for. ANSYS Mechanical and ANSYS CFD?
Why HPC for ANSYS Mechanical and ANSYS CFD? 1 HPC Defined High Performance Computing (HPC) at ANSYS: An ongoing effort designed to remove computing limitations from engineers who use computer aided engineering
More informationTowards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers
Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More informationA GPU Sparse Direct Solver for AX=B
1 / 25 A GPU Sparse Direct Solver for AX=B Jonathan Hogg, Evgueni Ovtchinnikov, Jennifer Scott* STFC Rutherford Appleton Laboratory 26 March 2014 GPU Technology Conference San Jose, California * Thanks
More informationANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation
ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent
More informationEfficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs
Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,
More informationG P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G
Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty
More informationHPC with GPU and its applications from Inspur. Haibo Xie, Ph.D
HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com 2 Agenda I. HPC with GPU II. YITIAN solution and application 3 New Moore s Law 4 HPC? HPC stands for High Heterogeneous Performance
More informationClick to edit Master title style
Click to edit Master title style LES LES Applications for for Internal Internal Combustion Engines Engines David Gosman & Richard Johns CD-adapco, June 2011 Some Qs and As Why would we use LES calculations
More informationFirst Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster
First Steps of YALES2 Code Towards GPU Acceleration on Standard and Prototype Cluster YALES2: Semi-industrial code for turbulent combustion and flows Jean-Matthieu Etancelin, ROMEO, NVIDIA GPU Application
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More informationGeoImaging Accelerator Pansharpen Test Results. Executive Summary
Executive Summary After demonstrating the exceptional performance improvement in the orthorectification module (approximately fourteen-fold see GXL Ortho Performance Whitepaper), the same approach has
More informationSENSEI / SENSEI-Lite / SENEI-LDC Updates
SENSEI / SENSEI-Lite / SENEI-LDC Updates Chris Roy and Brent Pickering Aerospace and Ocean Engineering Dept. Virginia Tech July 23, 2014 Collaborations with Math Collaboration on the implicit SENSEI-LDC
More informationFlux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationIMPORTANCE OF INTERNAL FLOW AND
IMPORTANCE OF INTERNAL FLOW AND GEOMETRY MODELLING IN THE GM 1.9L LIGHT DUTY ENGINE F. Perini a, P. C. Miles b, R. D. Reitz a a University of Wisconsin-Madison b Sandia National Laboratories ERC Seminar
More informationStructure-preserving Smoothing for Seismic Amplitude Data by Anisotropic Diffusion using GPGPU
GPU Technology Conference 2016 April, 4-7 San Jose, CA, USA Structure-preserving Smoothing for Seismic Amplitude Data by Anisotropic Diffusion using GPGPU Joner Duarte jduartejr@tecgraf.puc-rio.br Outline
More informationFaster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs. Baskar Rajagopalan Accelerated Computing, NVIDIA
Faster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs Baskar Rajagopalan Accelerated Computing, NVIDIA 1 Engineering & IT Challenges/Trends NVIDIA GPU Solutions AGENDA Abaqus GPU
More informationComparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra. Mark Gates, Stan Tomov, Azzam Haidar SIAM LA Oct 29, 2015
Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra Mark Gates, Stan Tomov, Azzam Haidar SIAM LA Oct 29, 2015 Overview Dense linear algebra algorithms Hybrid CPU GPU implementation
More informationVisual Analysis of Lagrangian Particle Data from Combustion Simulations
Visual Analysis of Lagrangian Particle Data from Combustion Simulations Hongfeng Yu Sandia National Laboratories, CA Ultrascale Visualization Workshop, SC11 Nov 13 2011, Seattle, WA Joint work with Jishang
More informationACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016
ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016 Challenges What is Algebraic Multi-Grid (AMG)? AGENDA Why use AMG? When to use AMG? NVIDIA AmgX Results 2
More informationApplying Solution-Adaptive Mesh Refinement in Engine Simulations
International Multidimensional Engine Modeling User's Group Meeting April 11, 2016, Detroit, Michigan Applying Solution-Adaptive Mesh Refinement in Engine Simulations Long Liang, Yue Wang, Anthony Shelburn,
More informationOptimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,
More informationConcurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration
Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Berni Schiefer schiefer@ca.ibm.com Tuesday March 17th at 12:00
More informationDirected Optimization On Stencil-based Computational Fluid Dynamics Application(s)
Directed Optimization On Stencil-based Computational Fluid Dynamics Application(s) Islam Harb 08/21/2015 Agenda Motivation Research Challenges Contributions & Approach Results Conclusion Future Work 2
More informationGPGPU Applications. for Hydrological and Atmospheric Simulations. and Visualizations on the Web. Ibrahim Demir
GPGPU Applications for Hydrological and Atmospheric Simulations and Visualizations on the Web Ibrahim Demir Big Data We are collecting and generating data on a petabyte scale (1Pb = 1,000 Tb = 1M Gb) Data
More informationWilliam Yang Group 14 Mentor: Dr. Rogerio Richa Visual Tracking of Surgical Tools in Retinal Surgery using Particle Filtering
Mutual Information Computation and Maximization Using GPU Yuping Lin and Gérard Medioni Computer Vision and Pattern Recognition Workshops (CVPR) Anchorage, AK, pp. 1-6, June 2008 Project Summary and Paper
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationAutomated Finite Element Computations in the FEniCS Framework using GPUs
Automated Finite Element Computations in the FEniCS Framework using GPUs Florian Rathgeber (f.rathgeber10@imperial.ac.uk) Advanced Modelling and Computation Group (AMCG) Department of Earth Science & Engineering
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationTR An Overview of NVIDIA Tegra K1 Architecture. Ang Li, Radu Serban, Dan Negrut
TR-2014-17 An Overview of NVIDIA Tegra K1 Architecture Ang Li, Radu Serban, Dan Negrut November 20, 2014 Abstract This paperwork gives an overview of NVIDIA s Jetson TK1 Development Kit and its Tegra K1
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationPerformance Benefits of NVIDIA GPUs for LS-DYNA
Performance Benefits of NVIDIA GPUs for LS-DYNA Mr. Stan Posey and Dr. Srinivas Kodiyalam NVIDIA Corporation, Santa Clara, CA, USA Summary: This work examines the performance characteristics of LS-DYNA
More informationA GPU-based Approximate SVD Algorithm Blake Foster, Sridhar Mahadevan, Rui Wang
A GPU-based Approximate SVD Algorithm Blake Foster, Sridhar Mahadevan, Rui Wang University of Massachusetts Amherst Introduction Singular Value Decomposition (SVD) A: m n matrix (m n) U, V: orthogonal
More informationThe Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System
The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More informationApplications of Berkeley s Dwarfs on Nvidia GPUs
Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse
More informationTurbulent Premixed Combustion with Flamelet Generated Manifolds in COMSOL Multiphysics
Turbulent Premixed Combustion with Flamelet Generated Manifolds in COMSOL Multiphysics Rob J.M Bastiaans* Eindhoven University of Technology *Corresponding author: PO box 512, 5600 MB, Eindhoven, r.j.m.bastiaans@tue.nl
More informationIntroduction to Computational Fluid Dynamics Mech 122 D. Fabris, K. Lynch, D. Rich
Introduction to Computational Fluid Dynamics Mech 122 D. Fabris, K. Lynch, D. Rich 1 Computational Fluid dynamics Computational fluid dynamics (CFD) is the analysis of systems involving fluid flow, heat
More informationFPGA-based Supercomputing: New Opportunities and Challenges
FPGA-based Supercomputing: New Opportunities and Challenges Naoya Maruyama (RIKEN AICS)* 5 th ADAC Workshop Feb 15, 2018 * Current Main affiliation is Lawrence Livermore National Laboratory SIAM PP18:
More informationPorting Scalable Parallel CFD Application HiFUN on NVIDIA GPU
Porting Scalable Parallel CFD Application NVIDIA D. V., N. Munikrishna, Nikhil Vijay Shende 1 N. Balakrishnan 2 Thejaswi Rao 3 1. S & I Engineering Solutions Pvt. Ltd., Bangalore, India 2. Aerospace Engineering,
More informationICE Roadmap Japanese STAR Conference. Richard Johns
ICE Roadmap Japanese STAR Conference Richard Johns Introduction Top-Level Roadmap STAR-CCM+ and Internal Combustion Engines Modeling Improvements and Research Support Sprays LES Chemistry Meshing Summary
More informationcuibm A GPU Accelerated Immersed Boundary Method
cuibm A GPU Accelerated Immersed Boundary Method S. K. Layton, A. Krishnan and L. A. Barba Corresponding author: labarba@bu.edu Department of Mechanical Engineering, Boston University, Boston, MA, 225,
More informationApplication of GPU technology to OpenFOAM simulations
Application of GPU technology to OpenFOAM simulations Jakub Poła, Andrzej Kosior, Łukasz Miroslaw jakub.pola@vratis.com, www.vratis.com Wroclaw, Poland Agenda Motivation Partial acceleration SpeedIT OpenFOAM
More informationDIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka
USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods
More informationApplication of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures
Paper 6 Civil-Comp Press, 2012 Proceedings of the Eighth International Conference on Engineering Computational Technology, B.H.V. Topping, (Editor), Civil-Comp Press, Stirlingshire, Scotland Application
More informationLoad Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application
Load Balancing and Data Migration in a Hybrid Computational Fluid Dynamics Application Esteban Meneses Patrick Pisciuneri Center for Simulation and Modeling (SaM) University of Pittsburgh University of
More informationDebunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU
Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU The myth 10x-1000x speed up on GPU vs CPU Papers supporting the myth: Microsoft: N. K. Govindaraju, B. Lloyd, Y.
More informationAccelerated ANSYS Fluent: Algebraic Multigrid on a GPU. Robert Strzodka NVAMG Project Lead
Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU Robert Strzodka NVAMG Project Lead A Parallel Success Story in Five Steps 2 Step 1: Understand Application ANSYS Fluent Computational Fluid Dynamics
More informationOn the limits of (and opportunities for?) GPU acceleration
On the limits of (and opportunities for?) GPU acceleration Aparna Chandramowlishwaran, Jee Choi, Kenneth Czechowski, Murat (Efe) Guney, Logan Moon, Aashay Shringarpure, Richard (Rich) Vuduc HotPar 10,
More informationJ. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst
Ali Khajeh-Saeed Software Engineer CD-adapco J. Blair Perot Mechanical Engineering UMASS, Amherst Supercomputers Optimization Stream Benchmark Stag++ (3D Incompressible Flow Code) Matrix Multiply Function
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationGPU Acceleration of Matrix Algebra. Dr. Ronald C. Young Multipath Corporation. fmslib.com
GPU Acceleration of Matrix Algebra Dr. Ronald C. Young Multipath Corporation FMS Performance History Machine Year Flops DEC VAX 1978 97,000 FPS 164 1982 11,000,000 FPS 164-MAX 1985 341,000,000 DEC VAX
More informationGPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis
GPU-Accelerated Parallel Sparse LU Factorization Method for Fast Circuit Analysis Abstract: Lower upper (LU) factorization for sparse matrices is the most important computing step for circuit simulation
More informationParallel Interpolation in FSI Problems Using Radial Basis Functions and Problem Size Reduction
Parallel Interpolation in FSI Problems Using Radial Basis Functions and Problem Size Reduction Sergey Kopysov, Igor Kuzmin, Alexander Novikov, Nikita Nedozhogin, and Leonid Tonkov Institute of Mechanics,
More informationObject-Oriented CFD Solver Design
Object-Oriented CFD Solver Design Hrvoje Jasak h.jasak@wikki.co.uk Wikki Ltd. United Kingdom 10/Mar2005 Object-Oriented CFD Solver Design p.1/29 Outline Objective Present new approach to software design
More informationModeling of a DaimlerChrysler Truck Engine using an Eulerian Spray Model
Modeling of a DaimlerChrysler Truck Engine using an Eulerian Spray Model C. Hasse, S. Vogel, N. Peters Institut für Technische Mechanik RWTH Aachen Templergraben 64 52056 Aachen Germany c.hasse@itm.rwth-aachen.de
More informationANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011
ANSYS HPC Technology Leadership Barbara Hutchings barbara.hutchings@ansys.com 1 ANSYS, Inc. September 20, Why ANSYS Users Need HPC Insight you can t get any other way HPC enables high-fidelity Include
More informationFast Interactive Sand Simulation for Gesture Tracking systems Shrenik Lad
Fast Interactive Sand Simulation for Gesture Tracking systems Shrenik Lad Project Guide : Vivek Mehta, Anup Tapadia TouchMagix media labs TouchMagix www.touchmagix.com Interactive display solutions Interactive
More informationHigh-Performance Reservoir Simulations on Modern CPU-GPU Computational Platforms Abstract Introduction
High-Performance Reservoir Simulations on Modern CPU-GPU Computational Platforms K. Bogachev 1, S. Milyutin 1, A. Telishev 1, V. Nazarov 1, V. Shelkov 1, D. Eydinov 1, O. Malinur 2, S. Hinneh 2 ; 1. Rock
More information