A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method

1 GTC (GPU Technology Conference) 2013, San Jose, March 20, 2013. A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method. Takayuki Aoki, Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology. Copyright Takayuki Aoki / Global Scientific Information and Computing Center, Tokyo Institute of Technology

2 TSUBAME 2.0
Compute node (3 Tesla M2050 GPUs): Performance 1.7 TFLOPS; Memory 58.0 GB (CPU) + 9.7 GB (GPU).
Rack (30 nodes): Performance 51.0 TFLOPS; Memory 2.03 TB.
System (58 racks): 1442 nodes, 2952 CPU sockets, 4264 GPUs. Performance: TFLOPS (CPU, Turbo boost), 2196 TFLOPS (GPU), Total 2420 TFLOPS.
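The rack and system figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the variable names are illustrative, and the CPU figure it computes is simply the difference implied by the total, not a value read from the slide.

```python
# Figures quoted on the TSUBAME 2.0 slide
node_tflops = 1.7                 # per compute node
node_mem_gb = 58.0 + 9.7          # CPU + GPU memory per node
nodes_per_rack = 30
gpu_total_tflops = 2196.0
system_total_tflops = 2420.0
n_gpus = 4264

# Rack-level values derived from the per-node values
rack_tflops = nodes_per_rack * node_tflops
rack_mem_tb = nodes_per_rack * node_mem_gb / 1000.0

# System-level values implied by the quoted totals (assumption: total = CPU + GPU)
implied_cpu_tflops = system_total_tflops - gpu_total_tflops
tflops_per_gpu = gpu_total_tflops / n_gpus

print(f"rack: {rack_tflops:.1f} TFLOPS, {rack_mem_tb:.2f} TB")
print(f"implied CPU share: {implied_cpu_tflops:.0f} TFLOPS")
print(f"per-GPU share of the GPU total: {tflops_per_gpu:.3f} TFLOPS")
```

The per-GPU share of about 0.515 TFLOPS suggests that the 2196 TFLOPS figure is a double-precision peak, which matters when comparing it with the single-precision results reported later in the talk.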

3 TSUBAME Supercomputer. In Q3 or Q4 of 2013, all the GPUs will be replaced by new accelerators. TSUBAME 2.5 will have PFlops in single-precision performance.

4 Drop on dry floor

6 Industrial Appl. Steering Oil

7 Development of New Materials. Mechanical Structure; Microstructure. Low-carbon society: improvement of fuel efficiency by reducing the weight of transportation and mechanical structures. Developing lightweight strengthening material by controlling microstructure.

9 Weather News

10 Full GPU Implementation: ASUCA. Full GPU approach: the CPU handles the initial condition and output; the GPU handles dynamics and physics. J. Ishida, C. Muroi, K. Kawano, Y. Kitamura, Development of a new nonhydrostatic model ASUCA at JMA, CAS/JSC WGNE Research Activities in Atmospheric and Oceanic Modelling.

11 ASUCA Typhoon Simulation. 500 m horizontal resolution, using 437 GPUs.

12 Air Flow in a 10 km x 10 km Area of Tokyo. TDM 3D. 2012 Google, ZENRIN

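13 Lattice Boltzmann Method
$\frac{\partial f_i}{\partial t} + \mathbf{e}_i \cdot \nabla f_i = -\frac{1}{\tau}\left(f_i - f_i^{eq}\right)$
$f_i^{eq} = w_i \rho \left[ 1 + \frac{3}{c^2}\,\mathbf{e}_i \cdot \mathbf{u} + \frac{9}{2c^4}\,(\mathbf{e}_i \cdot \mathbf{u})^2 - \frac{3}{2c^2}\,\mathbf{u} \cdot \mathbf{u} \right]$
Strongly memory-bound problem. Each time step consists of a collision step and a streaming step.
$f_i$ is the value in the direction of the $i$-th discrete velocity; $\mathbf{e}_i$ is the discrete velocity set; $w_i$ is the weighting factor; $c$ is the particle velocity; $\mathbf{u}$ is the macroscopic velocity; $\tau$ is the relaxation time; $\rho$ is the macroscopic density.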
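The collision and streaming steps named on this slide can be sketched in a few lines of NumPy. This is a minimal illustration only, assuming a two-dimensional D2Q9 lattice, lattice units (c = 1, unit time step), a single relaxation time and periodic boundaries; the slides do not state the lattice used in the talk, and the function names here are hypothetical.

```python
import numpy as np

# D2Q9 discrete velocities e_i and weights w_i (assumed lattice, lattice units with c = 1)
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, u):
    """f_i^eq = w_i rho [1 + 3 e_i.u + 4.5 (e_i.u)^2 - 1.5 u.u] for c = 1."""
    eu = np.einsum("id,dxy->ixy", E, u)      # e_i . u at every node
    uu = np.einsum("dxy,dxy->xy", u, u)      # u . u at every node
    return W[:, None, None] * rho * (1 + 3 * eu + 4.5 * eu**2 - 1.5 * uu)

def macroscopic(f):
    """Density and velocity as the zeroth and first moments of f."""
    rho = f.sum(axis=0)
    u = np.einsum("id,ixy->dxy", E, f) / rho
    return rho, u

def collide_and_stream(f, tau):
    """One time step: BGK collision followed by periodic streaming."""
    rho, u = macroscopic(f)
    f_post = f - (f - equilibrium(rho, u)) / tau          # collision step
    shifted = [np.roll(f_post[i], shift=(int(E[i, 0]), int(E[i, 1])), axis=(0, 1))
               for i in range(len(E))]                     # streaming step
    return np.stack(shifted)

# Usage: a small density bump relaxing on an 8 x 8 periodic grid
rho0 = np.ones((8, 8))
rho0[3, 4] = 1.1
f0 = equilibrium(rho0, np.zeros((2, 8, 8)))
f1 = collide_and_stream(f0, tau=0.8)
```

Each node reads and writes every distribution value once per step while doing comparatively little arithmetic, which is why the method is described as memory bound.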

14 LES (Large-Eddy Simulation). Relaxation time for LES model. Energy spectrum: GS (grid scale) and SGS (subgrid scale). Molecular viscosity and eddy viscosity.

15 LES modeling
Smagorinsky model: simple; inaccurate for flows with wall boundaries; requires empirical tuning of the constant model coefficient.
Dynamic Smagorinsky model: applicable to wall boundaries; complicated calculation; requires an averaging process over a wide area; not available for complex-shaped bodies; not suitable for large-scale problems.
Coherent-structure Smagorinsky model (*H. Kobayashi, Phys. Fluids 17, (2005)): the model coefficient is determined locally by the second invariant of the velocity gradient tensor; applicable to wall boundaries.

16 LES modeling on LBM. Turbulence model: molecular viscosity + eddy viscosity. Smagorinsky model (subgrid closure), C_S = 0.22.
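One common way to put "molecular viscosity + eddy viscosity" into an LBM code is to fold both into the relaxation time. The sketch below assumes lattice units (c = 1, unit time step, filter width equal to the grid spacing), the standard relation tau = 3 nu + 1/2, and the Smagorinsky closure nu_SGS = (C_S delta)^2 |S| with |S| = sqrt(2 S_ij S_ij). The slide gives only C_S = 0.22; the function names and the rest of the setup are illustrative assumptions.

```python
import numpy as np

def strain_rate_magnitude(grad_u):
    """|S| = sqrt(2 S_ij S_ij), with grad_u[i, j] = d u_i / d x_j."""
    grad_u = np.asarray(grad_u, dtype=float)
    S = 0.5 * (grad_u + grad_u.T)            # symmetric part of the gradient
    return np.sqrt(2.0 * np.sum(S * S))

def smagorinsky_tau(nu0, grad_u, c_s=0.22, delta=1.0):
    """Total relaxation time from molecular plus eddy viscosity (lattice units)."""
    nu_sgs = (c_s * delta) ** 2 * strain_rate_magnitude(grad_u)
    return 3.0 * (nu0 + nu_sgs) + 0.5

# Usage: simple shear du/dy = 0.01 with molecular viscosity 1e-4
shear = np.array([[0.0, 0.01, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
tau_total = smagorinsky_tau(1e-4, shear)
print(tau_total)
```

Because C_S is a fixed constant here, the eddy viscosity does not vanish in laminar shear near a wall, which is the weakness the later slides attribute to this model.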

17 Coherent-structure SGS model
Dynamic Smagorinsky model (DSM): determines the model coefficient automatically, but requires an averaging operation over a wide area to do so. Turbulent flow around a complex object is therefore difficult, and computational efficiency is poor because of the averaging operation.
Coherent-structure Smagorinsky model: the model parameter is determined locally from the second invariant of the velocity gradient tensor (Q) and the energy dissipation (ε). It suits turbulent flow around a complex object and large-scale parallel computation.
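The local character of the coherent-structure model can be illustrated with a short sketch. It assumes the commonly quoted form C = C1 |F_CS|^(3/2) with F_CS = Q / E, where Q = (W_ij W_ij - S_ij S_ij) / 2 and E = (W_ij W_ij + S_ij S_ij) / 2. The constant C1 = 1/20 is an assumed value that should be checked against Kobayashi's paper; the slide does not give it, and the function names are hypothetical.

```python
import numpy as np

def _strain_and_rotation(grad_u):
    """Split grad_u[i, j] = d u_i / d x_j into symmetric and antisymmetric parts."""
    grad_u = np.asarray(grad_u, dtype=float)
    return 0.5 * (grad_u + grad_u.T), 0.5 * (grad_u - grad_u.T)

def csm_coefficient(grad_u, c1=0.05):
    """Model coefficient C = c1 * |Q / E|**1.5, computed from one node's gradient only."""
    S, W = _strain_and_rotation(grad_u)
    ss, ww = np.sum(S * S), np.sum(W * W)
    Q = 0.5 * (ww - ss)                      # second invariant of the velocity gradient
    E = 0.5 * (ww + ss)                      # magnitude of the velocity gradient
    if E == 0.0:
        return 0.0                           # uniform flow: no subgrid contribution
    return c1 * abs(Q / E) ** 1.5

def csm_eddy_viscosity(grad_u, delta=1.0, c1=0.05):
    """nu_SGS = C * delta^2 * |S| with |S| = sqrt(2 S_ij S_ij)."""
    S, _ = _strain_and_rotation(grad_u)
    return csm_coefficient(grad_u, c1) * delta**2 * np.sqrt(2.0 * np.sum(S * S))

# Usage: simple shear gives Q = 0, so the coefficient vanishes without any averaging
shear = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
strain = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
print(csm_coefficient(shear), csm_eddy_viscosity(strain))
```

Every quantity comes from the velocity gradient at a single node, so no wide-area averaging or extra communication between GPUs is needed, in contrast to the dynamic model.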

18 Computational Area. Major part of Tokyo, including Shinjuku-ku, Chiyoda-ku, Minato-ku, Meguro-ku, and Chuo-ku. 10 km x 10 km. Building data: Pasco Co. Ltd. TDM 3D. Map labels: Shinjuku, Shibuya, Shinagawa, Tokyo. Map 2012 Google, ZENRIN

21 Area Around Metropolitan Government Building. Wind flow profile at a height of 25 m above the ground. 960 m x 640 m. 2012 Google, ZENRIN

27 Performance of the GPU code. Performance estimation using the Improved Roofline Model. CUDA programming tuning: use of the SFU (Special Function Unit) and single-precision computation; kernel fusion of the collision step and streaming step; loop unrolling to save register usage; reduction of the address calculation by use of a 32-bit compile option. 32-bit compile: 198 GFlops (efficiency 92%), 310 MLUPS (Mega Lattice site Updates per second). 64-bit compile: 183 GFlops (efficiency 88%).
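A roofline-style estimate bounds the attainable performance of a kernel by its arithmetic intensity and the hardware peaks. The sketch below shows the classic roofline bound and, as an assumption about what "Improved Roofline Model" means here, a variant that adds compute time and memory time instead of taking the maximum. It also derives the arithmetic work per lattice update from the two quoted figures, assuming both refer to the same 32-bit run. Function and parameter names are illustrative.

```python
def classic_roofline(flop_per_byte, peak_gflops, peak_gbytes_per_s):
    """Attainable GFLOPS = min(compute peak, arithmetic intensity x bandwidth)."""
    return min(peak_gflops, flop_per_byte * peak_gbytes_per_s)

def additive_roofline(flop_per_byte, peak_gflops, peak_gbytes_per_s):
    """Assumed 'improved' variant: time = compute time + memory time, no overlap."""
    time_per_flop = 1.0 / peak_gflops + 1.0 / (flop_per_byte * peak_gbytes_per_s)
    return 1.0 / time_per_flop

def flop_per_lattice_update(gflops, mlups):
    """Floating-point operations per lattice-site update implied by two rates."""
    return gflops * 1e9 / (mlups * 1e6)

# Figures quoted on the slide for the 32-bit build
work_per_update = flop_per_lattice_update(198.0, 310.0)
print(f"about {work_per_update:.0f} FLOP per lattice-site update")
```

The additive variant always predicts less than the classic bound, so measured efficiencies such as the 92% quoted above read higher against it than they would against the classic roofline.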

28 Performance (Strong Scalability). For a fixed problem size, performance is shown as the number of GPUs increases. Introducing the overlapping technique improves performance by up to 30%. The elapsed time is shortened by increasing the number of GPUs.

29 Performance (Weak Scalability). 600 TFLOPS on 4000 GPUs, 15% of the peak performance.
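The 15% figure can be reproduced from the quoted totals. The sketch below assumes a single-precision peak of 1030 GFLOPS per Tesla M2050, which comes from the GPU's published specification rather than from the slides, and compares the per-GPU rate with the 198 GFlops single-GPU result quoted earlier in the talk.

```python
# Quoted in the talk
total_tflops = 600.0
n_gpus = 4000
single_gpu_gflops = 198.0

# Assumption: Tesla M2050 single-precision peak, from the vendor specification
m2050_sp_peak_gflops = 1030.0

per_gpu_gflops = total_tflops * 1000.0 / n_gpus
fraction_of_peak = per_gpu_gflops / m2050_sp_peak_gflops
relative_to_single_gpu = per_gpu_gflops / single_gpu_gflops

print(f"{per_gpu_gflops:.0f} GFLOPS per GPU")
print(f"{fraction_of_peak:.1%} of single-precision peak")
print(f"{relative_to_single_gpu:.0%} of the single-GPU rate")
```

Under these assumptions each GPU sustains roughly three quarters of its standalone rate at full scale, with the remainder presumably spent on inter-GPU communication and boundary handling.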

30 Turbulent Flow behind football. Re = 100,000. Mesh: 2000x1000x1000.

31 DrivAer: BMW-Audi. Mesh: 3,000x1,500x1,500. Re = 1,000,000.

35 SUMMARY. A lattice Boltzmann LES of turbulent flow has been successfully conducted at 1-m resolution for a 10 km x 10 km area using the whole TSUBAME 2.0 resource. The coherent-structure Smagorinsky model works well in association with LBM. A performance of 15% of the peak has been achieved on TSUBAME 2.0.

36 Thank you for your kind attention

LARGE-SCALE FREE-SURFACE FLOW SIMULATION USING LATTICE BOLTZMANN METHOD ON MULTI-GPU CLUSTERS

LARGE-SCALE FREE-SURFACE FLOW SIMULATION USING LATTICE BOLTZMANN METHOD ON MULTI-GPU CLUSTERS ECCOMAS Congress 2016 VII European Congress on Computational Methods in Applied Sciences and Engineering M. Papadrakakis, V. Papadopoulos, G. Stefanou, V. Plevris (eds.) Crete Island, Greece, 5 10 June

More information

Beyond Peta-scale on Stencil and Particle-based GPU Applications

Beyond Peta-scale on Stencil and Particle-based GPU Applications SPPEXA annual meeting, January 6, 06, TUM Beyond Peta-scale on Stencil and Particle-based GPU Applications Takayuki Aoki Global Scientific Information and Computing Center Tokyo Institute of Technology

More information

International Supercomputing Conference 2009

International Supercomputing Conference 2009 International Supercomputing Conference 2009 Implementation of a Lattice-Boltzmann-Method for Numerical Fluid Mechanics Using the nvidia CUDA Technology E. Riegel, T. Indinger, N.A. Adams Technische Universität

More information

Software and Performance Engineering for numerical codes on GPU clusters

Software and Performance Engineering for numerical codes on GPU clusters Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3

More information

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency The next generation supercomputer and NWP system of JMA Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency Contents JMA supercomputer systems Current system (Mar

More information

High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters

High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014

More information

LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS

LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS NAVIER-STOKES EQUATIONS u t + u u + 1 ρ p = Ԧg + ν u u=0 WHAT IS COMPUTATIONAL FLUID DYNAMICS? Branch of Fluid Dynamics which uses computer power to approximate

More information

Ab initio NMR Chemical Shift Calculations for Biomolecular Systems Using Fragment Molecular Orbital Method

Ab initio NMR Chemical Shift Calculations for Biomolecular Systems Using Fragment Molecular Orbital Method 4 Ab initio NMR Chemical Shift Calculations for Biomolecular Systems Using Fragment Molecular Orbital Method A Large-scale Two-phase Flow Simulation Evolutive Image/ Video Coding with Massively Parallel

More information

MASSIVELY-PARALLEL MULTI-GPU SIMULATIONS FOR FAST AND ACCURATE AUTOMOTIVE AERODYNAMICS

MASSIVELY-PARALLEL MULTI-GPU SIMULATIONS FOR FAST AND ACCURATE AUTOMOTIVE AERODYNAMICS 6th European Conference on Computational Mechanics (ECCM 6) 7th European Conference on Computational Fluid Dynamics (ECFD 7) -5 June 28, Glasgow, UK MASSIVELY-PARALLEL MULTI-GPU SIMULATIONS FOR FAST AND

More information

A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC

A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC Hisashi YASHIRO RIKEN Advanced Institute of Computational Science Kobe, Japan My topic The study for Cloud computing My topic

More information

Possibility of Implicit LES for Two-Dimensional Incompressible Lid-Driven Cavity Flow Based on COMSOL Multiphysics

Possibility of Implicit LES for Two-Dimensional Incompressible Lid-Driven Cavity Flow Based on COMSOL Multiphysics Possibility of Implicit LES for Two-Dimensional Incompressible Lid-Driven Cavity Flow Based on COMSOL Multiphysics Masanori Hashiguchi 1 1 Keisoku Engineering System Co., Ltd. 1-9-5 Uchikanda, Chiyoda-ku,

More information

From Notebooks to Supercomputers: Tap the Full Potential of Your CUDA Resources with LibGeoDecomp

From Notebooks to Supercomputers: Tap the Full Potential of Your CUDA Resources with LibGeoDecomp From Notebooks to Supercomputers: Tap the Full Potential of Your CUDA Resources with andreas.schaefer@cs.fau.de Friedrich-Alexander-Universität Erlangen-Nürnberg GPU Technology Conference 2013, San José,

More information

Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python

Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python Micha l Januszewski Institute of Physics University of Silesia in Katowice, Poland Google GTC 2012 M. Januszewski (IoP, US) Sailfish:

More information

HPC Application Porting to CUDA at BSC

HPC Application Porting to CUDA at BSC www.bsc.es HPC Application Porting to CUDA at BSC Pau Farré, Marc Jordà GTC 2016 - San Jose Agenda WARIS-Transport Atmospheric volcanic ash transport simulation Computer Applications department PELE Protein-drug

More information

Dynamic Mode Decomposition analysis of flow fields from CFD Simulations

Dynamic Mode Decomposition analysis of flow fields from CFD Simulations Dynamic Mode Decomposition analysis of flow fields from CFD Simulations Technische Universität München Thomas Indinger Lukas Haag, Daiki Matsumoto, Christoph Niedermeier in collaboration with Agenda Motivation

More information

XFlow HIGH FIDELITY COMPUTATIONAL FLUID DYNAMICS

XFlow HIGH FIDELITY COMPUTATIONAL FLUID DYNAMICS XFlow HIGH FIDELITY COMPUTATIONAL FLUID DYNAMICS XFlow OVERVIEW In the traditional mesh-based approach to solving Computational Fluid Dynamics (CFD) problems, reliability is highly dependent on the quality

More information

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code Takashi Shimokawabe, Takayuki Aoki, Chiashi Muroi, Junichi Ishida, Kohei Kawano, Toshio Endo,

More information

High Performance Computing

High Performance Computing High Performance Computing ADVANCED SCIENTIFIC COMPUTING Dr. Ing. Morris Riedel Adjunct Associated Professor School of Engineering and Natural Sciences, University of Iceland Research Group Leader, Juelich

More information

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com 2 Agenda I. HPC with GPU II. YITIAN solution and application 3 New Moore s Law 4 HPC? HPC stands for High Heterogeneous Performance

More information

Aeroacoustic computations with a new CFD solver based on the Lattice Boltzmann Method

Aeroacoustic computations with a new CFD solver based on the Lattice Boltzmann Method Aeroacoustic computations with a new CFD solver based on the Lattice Boltzmann Method D. Ricot 1, E. Foquet 2, H. Touil 3, E. Lévêque 3, H. Machrouki 4, F. Chevillotte 5, M. Meldi 6 1: Renault 2: CS 3:

More information

Particleworks: Particle-based CAE Software fully ported to GPU

Particleworks: Particle-based CAE Software fully ported to GPU Particleworks: Particle-based CAE Software fully ported to GPU Introduction PrometechVideo_v3.2.3.wmv 3.5 min. Particleworks Why the particle method? Existing methods FEM, FVM, FLIP, Fluid calculation

More information

Achieving Portable Performance for GTC-P with OpenACC on GPU, multi-core CPU, and Sunway Many-core Processor

Achieving Portable Performance for GTC-P with OpenACC on GPU, multi-core CPU, and Sunway Many-core Processor Achieving Portable Performance for GTC-P with OpenACC on GPU, multi-core CPU, and Sunway Many-core Processor Stephen Wang 1, James Lin 1,4, William Tang 2, Stephane Ethier 2, Bei Wang 2, Simon See 1,3

More information

High-productivity Framework for Large-scale GPU/CPU Stencil Applications

High-productivity Framework for Large-scale GPU/CPU Stencil Applications Procedia Computer Science Volume 80, 2016, Pages 1646 1657 ICCS 2016. The International Conference on Computational Science High-productivity Framework for Large-scale GPU/CPU Stencil Applications Takashi

More information

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Reports of Research Institute for Applied Mechanics, Kyushu University, No.150 (60-70) March 2016 Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Report 2: For the Case

More information

Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P

Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P Stephen Wang 1, James Lin 1, William Tang 2, Stephane Ethier 2, Bei Wang 2, Simon See

More information

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Reports of Research Institute for Applied Mechanics, Kyushu University No.150 (71 83) March 2016 Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Report 3: For the Case

More information

GREYBLS: modelling GREY-zone Boundary LayerS

GREYBLS: modelling GREY-zone Boundary LayerS GREYBLS: modelling GREY-zone Boundary LayerS Bob Beare, Bob Plant, Omduth Coceal, John Thuburn, Adrian Lock, Humphrey Lean 25 Sept 2013 Introduction NWP at grid lengths 2 km - 100 m now possible. Introduction

More information

Computational Fluid Dynamics PRODUCT SHEET

Computational Fluid Dynamics PRODUCT SHEET TM 2014 Computational Fluid Dynamics PRODUCT SHEET 1 Breaking Limitations The Challenge of Traditional CFD In the traditional mesh-based approach, the reliability highly depends on the quality of the mesh,

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information

Next-generation CFD: Real-Time Computation and Visualization

Next-generation CFD: Real-Time Computation and Visualization Next-generation CFD: Real-Time Computation and Visualization Christian F. Janßen Hamburg University of Technology Tesla C1060, ~20 million lattice nodes [2010] Kinetic approaches for the simulation of

More information

The walberla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms

The walberla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms The walberla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms Harald Köstler, Uli Rüde (LSS Erlangen, ruede@cs.fau.de) Lehrstuhl für Simulation Universität Erlangen-Nürnberg www10.informatik.uni-erlangen.de

More information

Lattice Boltzmann with CUDA

Lattice Boltzmann with CUDA Lattice Boltzmann with CUDA Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Overview of LBM An usage of LBM Algorithm Implementation in CUDA and Optimization

More information

(LSS Erlangen, Simon Bogner, Ulrich Rüde, Thomas Pohl, Nils Thürey in collaboration with many more

(LSS Erlangen, Simon Bogner, Ulrich Rüde, Thomas Pohl, Nils Thürey in collaboration with many more Parallel Free-Surface Extension of the Lattice-Boltzmann Method A Lattice-Boltzmann Approach for Simulation of Two-Phase Flows Stefan Donath (LSS Erlangen, stefan.donath@informatik.uni-erlangen.de) Simon

More information

NVIDIA GPUs in Earth System Modelling Thomas Bradley

NVIDIA GPUs in Earth System Modelling Thomas Bradley NVIDIA GPUs in Earth System Modelling Thomas Bradley Agenda: GPU Developments for CWO Motivation for GPUs in CWO Parallelisation Considerations GPU Technology Roadmap MOTIVATION FOR GPUS IN CWO NVIDIA

More information

simulation framework for piecewise regular grids

simulation framework for piecewise regular grids WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler

More information

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Reports of Research Institute for Applied Mechanics, Kyushu University No.150 (47 59) March 2016 Reproducibility of Complex Turbulent Using Commercially-Available CFD Software Report 1: For the Case of

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information

HIGH PERFORMANCE COMPUTATION (HPC) FOR THE

HIGH PERFORMANCE COMPUTATION (HPC) FOR THE HIGH PERFORMANCE COMPUTATION (HPC) FOR THE DEVELOPMENT OF FLUIDIZED BED TECHNOLOGIES FOR BIOMASS GASIFICATION AND CO2 CAPTURE P. Fede, H. Neau, O. Simonin Université de Toulouse; INPT, UPS ; IMFT ; 31400

More information

Realizing Out of Core Stencil Computations using Multi Tier Memory Hierarchy on GPGPU Clusters

Realizing Out of Core Stencil Computations using Multi Tier Memory Hierarchy on GPGPU Clusters Realizing Out of Core Stencil Computations using Multi Tier Memory Hierarchy on GPGPU Clusters ~ Towards Extremely Big & Fast Simulations ~ Toshio Endo GSIC, Tokyo Institute of Technology ( 東京工業大学 ) Stencil

More information

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA 3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires

More information

Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures

Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures Dirk Ribbrock, Markus Geveler, Dominik Göddeke, Stefan Turek Angewandte Mathematik, Technische Universität Dortmund

More information

Transport Simulations beyond Petascale. Jing Fu (ANL)

Transport Simulations beyond Petascale. Jing Fu (ANL) Transport Simulations beyond Petascale Jing Fu (ANL) A) Project Overview The project: Peta- and exascale algorithms and software development (petascalable codes: Nek5000, NekCEM, NekLBM) Science goals:

More information

Numerical Algorithms on Multi-GPU Architectures

Numerical Algorithms on Multi-GPU Architectures Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications

More information

Mathematical computations with GPUs

Mathematical computations with GPUs Master Educational Program Information technology in applications Mathematical computations with GPUs Introduction Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University How to.. Process terabytes

More information

Computational Science and Engineering (Int. Master s Program)

Computational Science and Engineering (Int. Master s Program) Computational Science and Engineering (Int. Master s Program) Technische Universität München Master s Thesis MPI Parallelization of GPU-based Lattice Boltzmann Simulations Author: Arash Bakhtiari 1 st

More information

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung

More information

Numerical Simulation of Coastal Wave Processes with the Use of Smoothed Particle Hydrodynamics (SPH) Method

Numerical Simulation of Coastal Wave Processes with the Use of Smoothed Particle Hydrodynamics (SPH) Method Aristotle University of Thessaloniki Faculty of Engineering Department of Civil Engineering Division of Hydraulics and Environmental Engineering Laboratory of Maritime Engineering Christos V. Makris Dipl.

More information

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular

More information

Large Eddy Simulation of Flow over a Backward Facing Step using Fire Dynamics Simulator (FDS)

Large Eddy Simulation of Flow over a Backward Facing Step using Fire Dynamics Simulator (FDS) The 14 th Asian Congress of Fluid Mechanics - 14ACFM October 15-19, 2013; Hanoi and Halong, Vietnam Large Eddy Simulation of Flow over a Backward Facing Step using Fire Dynamics Simulator (FDS) Md. Mahfuz

More information

A Study of the Development of an Analytical Wall Function for Large Eddy Simulation of Turbulent Channel and Rectangular Duct Flow

A Study of the Development of an Analytical Wall Function for Large Eddy Simulation of Turbulent Channel and Rectangular Duct Flow University of Wisconsin Milwaukee UWM Digital Commons Theses and Dissertations August 2014 A Study of the Development of an Analytical Wall Function for Large Eddy Simulation of Turbulent Channel and Rectangular

More information

CHAPTER 3. Elementary Fluid Dynamics

CHAPTER 3. Elementary Fluid Dynamics CHAPTER 3. Elementary Fluid Dynamics - Understanding the physics of fluid in motion - Derivation of the Bernoulli equation from Newton s second law Basic Assumptions of fluid stream, unless a specific

More information

Rotating Moving Boundary Analysis Using ANSYS 5.7

Rotating Moving Boundary Analysis Using ANSYS 5.7 Abstract Rotating Moving Boundary Analysis Using ANSYS 5.7 Qin Yin Fan CYBERNET SYSTEMS CO., LTD. Rich Lange ANSYS Inc. As subroutines in commercial software, APDL (ANSYS Parametric Design Language) provides

More information

Mesh Adaptive LES for micro-scale air pollution dispersion and effect of tall buildings.

Mesh Adaptive LES for micro-scale air pollution dispersion and effect of tall buildings. HARMO17, Budapest, 9 12 May, 2016. Mesh Adaptive LES for micro-scale air pollution dispersion and effect of tall buildings. Elsa Aristodemou, Luz Maria Boganegra, Christopher Pain, Alan Robins, and Helen

More information

LaBS: a CFD tool based on the Lattice Boltzmann method. E. Tannoury, D. Ricot, B. Gaston

LaBS: a CFD tool based on the Lattice Boltzmann method. E. Tannoury, D. Ricot, B. Gaston LaBS: a CFD tool based on the Lattice Boltzmann method E. Tannoury, D. Ricot, B. Gaston Impact of HPC on automotive engineering applications Alain Prost, 1989 Lewis Hamilton, 2008 Impact of HPC on automotive

More information

Supercomputing of Tsunami Damage Mitigation Using Offshore Mega-Floating Structures

Supercomputing of Tsunami Damage Mitigation Using Offshore Mega-Floating Structures International Innovation Workshop on Tsunami, Snow Avalanche and Flash Flood Energy Dissipation January 21-22, 2016, Maison Villemanzy in Lyon, France Supercomputing of Tsunami Damage Mitigation Using

More information

Shape Optimization of Long-Span Translational Free-Form Shell Roofs in Strong Wind Using Multigrid Method and Variable Complexity Model

Shape Optimization of Long-Span Translational Free-Form Shell Roofs in Strong Wind Using Multigrid Method and Variable Complexity Model 6 th China-Japan-Korea Joint Symposium on Optimiation of Structural and Mechanical Systems June 22 25, 21, Kyoto, Japan Shape Optimiation of Long-Span Translational Free-Form Shell Roofs in Strong Wind

More information

Mass-flux parameterization in the shallow convection gray zone

Mass-flux parameterization in the shallow convection gray zone Mass-flux parameterization in the shallow convection gray zone LACE stay report Toulouse Centre National de Recherche Meteorologique, 15. September 2014 26. September 2014 Scientific supervisor: Rachel

More information

A Contact Angle Model for the Parallel Free Surface Lattice Boltzmann Method in walberla Stefan Donath (stefan.donath@informatik.uni-erlangen.de) Computer Science 10 (System Simulation) University of Erlangen-Nuremberg

More information

Developing LES Models for IC Engine Simulations. June 14-15, 2017 Madison, WI

Developing LES Models for IC Engine Simulations. June 14-15, 2017 Madison, WI Developing LES Models for IC Engine Simulations June 14-15, 2017 Madison, WI 1 2 RANS vs LES Both approaches use the same equation: u i u i u j 1 P 1 u i t x x x x j i j T j The only difference is turbulent

More information

Lattice Boltzmann Method for Simulating Turbulent Flows

Lattice Boltzmann Method for Simulating Turbulent Flows Lattice Boltzmann Method for Simulating Turbulent Flows by Yusuke Koda A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science

More information

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large

More information

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA GPU COMPUTING AND THE FUTURE OF HPC Timothy Lanfear, NVIDIA ~1 W ~3 W ~100 W ~30 W 1 kw 100 kw 20 MW Power-constrained Computers 2 EXASCALE COMPUTING WILL ENABLE TRANSFORMATIONAL SCIENCE RESULTS First-principles

More information

Computational Fluid Dynamics with the Lattice Boltzmann Method KTH SCI, Stockholm

Computational Fluid Dynamics with the Lattice Boltzmann Method KTH SCI, Stockholm Computational Fluid Dynamics with the Lattice Boltzmann Method KTH SCI, Stockholm March 17 March 21, 2014 Florian Schornbaum, Martin Bauer, Simon Bogner Chair for System Simulation Friedrich-Alexander-Universität

More information

Real-time Thermal Flow Predictions for Data Centers

Real-time Thermal Flow Predictions for Data Centers Real-time Thermal Flow Predictions for Data Centers Using the Lattice Boltzmann Method on Graphics Processing Units for Predicting Thermal Flow in Data Centers Johannes Sjölund Computer Science and Engineering,

More information

Tutorial: Modeling Liquid Reactions in CIJR Using the Eulerian PDF transport (DQMOM-IEM) Model

Introduction: The purpose of this tutorial is to demonstrate the setup and solution procedure of liquid chemical

Large Scale Parallel Lattice Boltzmann Model of Dendritic Growth

Bohumir Jelinek, Mohsen Eshraghi, Sergio Felicelli. CAVS, Mississippi State University. March 3-7, 2013, San Antonio, Texas. US Army Corps of

CME 213 Spring 2017. Eric Darve

Summary of previous lectures. Pthreads: low-level multi-threaded programming. OpenMP: simplified interface based on #pragma, adapted to scientific computing. OpenMP for and

Continued Investigation of Small-Scale Air-Sea Coupled Dynamics Using CBLAST Data

Dick K.P. Yue. Center for Ocean Engineering, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge,

ANSYS AIM Tutorial Turbulent Flow Over a Backward Facing Step

Author(s): Sebastian Vecchi, ANSYS. Created using ANSYS AIM 18.1. Problem Specification; Pre-Analysis & Start Up; Governing Equation; Start-Up; Geometry

Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA

Kazuhiko Komatsu, S. Momose, Y. Isobe, O. Watanabe, A. Musa, M. Yokokawa, T. Aoyama, M. Sato, H. Kobayashi. Tohoku University. 14 November,

High-Fidelity Simulation of Unsteady Flow Problems using a 3rd Order Hybrid MUSCL/CD scheme. A. West & D. Caraeni

ECCOMAS, June 6th-11th, 2016, Crete Island, Greece. Outline: Industrial Motivation; Numerical

NVIDIA Application Lab at Jülich

Member of the Helmholtz Association. Dirk Pleiter, Jülich Supercomputing Centre (JSC). Forschungszentrum Jülich at a Glance (status 2010): Budget: 450 million euros; Staff: 4,800

FPGA-based Supercomputing: New Opportunities and Challenges

Naoya Maruyama (RIKEN AICS)*. 5th ADAC Workshop, Feb 15, 2018. *Current main affiliation is Lawrence Livermore National Laboratory. SIAM PP18:

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids

Patrice Castonguay and Antony Jameson. Aerospace Computing Lab, Stanford University. GTC Asia, Beijing, China, December 15th, 2011

Self-Cultivation System

Development of a Microorganism Incubator using CFD Simulations. A comfortable mixing incubator to grow microorganisms for the agricultural, animal husbandry and ocean agriculture industries

GPU Parallelization is the Perfect Match with the Discrete Particle Method for Blast Analysis

Wayne L. Mindle, Ph.D. (CertaSIM, LLC) and Lars Olovsson, Ph.D. (IMPETUS Afea AB). GTC 2015, March 17-20,

Faster Simulations of the National Airspace System

PK Menon, Monish Tandale, Sandy Wiraatmadja (Optimal Synthesis Inc.); Joseph Rios (NASA Ames Research Center). NVIDIA GPU Technology Conference 2010, San Jose,

The Spalart Allmaras turbulence model

The main equation: The Spalart-Allmaras turbulence model is a one-equation model designed especially for aerospace applications; it solves a modelled transport equation

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

Office of Science. Buddy Bland, Project Director, Oak Ridge Leadership Computing Facility. November 13, 2012. ORNL's Titan Hybrid

Tutorial 22. Modeling Solidification

Introduction: This tutorial illustrates how to set up and solve a problem involving solidification. This tutorial will demonstrate how to do the following: Define a

Experiences of the Development of the Supercomputers

Earth Simulator and K Computer. YOKOKAWA, Mitsuo, Kobe University/RIKEN AICS. Application Oriented Systems Developed in Japan: No.1 systems in TOP500

Introduction to ANSYS CFX

Workshop 03: Fluid flow around the NACA0012 Airfoil. 16.0 Release. 2015 ANSYS, Inc., March 13, 2015. Workshop Description: The flow simulated is an external aerodynamics

URANS and SAS analysis of flow dynamics in a GDI nozzle

3rd Annual Conference on Liquid Atomization and Spray Systems, Brno, Czech Republic, September 010. J.-M. Shi*, K. Wenzlawski*, J. Helie, H. Nuglisch, J. Cousin. *Continental Automotive GmbH, Siemensstr.

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

May 2017. Modern high performance computing (HPC) data centers are key to solving some of the world's most important

The Future of High Performance Interconnects

Ashrut Ambastha. HPC Advisory Council, Perth, Australia, August 2017. When Algorithms Go Rogue.

Pressure Drop Evaluation in a Pilot Plant Hydrocyclone

Fabio Kasper, M.Sc.; Emilio Paladino, D.Sc.; Marcus Reis, M.Sc. (ESSS). Carlos A. Capela Moraes, D.Sc.; Dárley C. Melo, M.Sc. (Petrobras Research Center)

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC

Fourth Workshop on Accelerator Programming Using Directives (WACCPD), Nov. 13, 2017. Takuma

PARALLEL PROGRAMMING MANY-CORE COMPUTING: THE LOFAR SOFTWARE TELESCOPE (5/5)

Rob van Nieuwpoort, Vrije Universiteit Amsterdam & Astron, the Netherlands Institute for Radio Astronomy. Why Radio? Credit: NASA/IPAC

CUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata

Please excuse my layout of slides for the remaining part of the talk! Fluid Simulation: Navier-Stokes equations for incompressible fluids

NUMERICAL ANALYSIS OF WIND EFFECT ON HIGH-DENSITY BUILDING AERAS

Bin ZHAO, Ying LI, Xianting LI and Qisen YAN. Department of Thermal Engineering, Tsinghua University, Beijing, 100084, P.R. China. ABSTRACT

2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA

Overview: The LAMMPS molecular dynamics (MD) code; Cell-list generation and force calculation; Algorithm & performance

J. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst

Ali Khajeh-Saeed, Software Engineer, CD-adapco. J. Blair Perot, Mechanical Engineering, UMASS, Amherst. Supercomputers Optimization; Stream Benchmark; Stag++ (3D Incompressible Flow Code); Matrix Multiply Function

Two-Phase flows on massively parallel multi-gpu clusters

Peter Zaspel, Michael Griebel. Institute for Numerical Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn. Workshop Programming of Heterogeneous

Virtual EM Inc. Ann Arbor, Michigan, USA

Functional Description of the Architecture of a Special Purpose Processor for Orders of Magnitude Reduction in Run Time in Computational Electromagnetics. Tayfun Özdemir, Virtual EM Inc., Ann Arbor, Michigan,

Simulation of Liquid-Gas-Solid Flows with the Lattice Boltzmann Method

June 21, 2011. Introduction; Free Surface LBM; Liquid-Gas-Solid Flows; Parallel Computing; Examples and More; References. Fig. Simulation

Advanced ANSYS FLUENT Acoustics

Workshop: Modeling Flow-Induced (Aeroacoustic) Noise. 14.5 Release. 2011 ANSYS, Inc., November 7, 2012. Introduction: This tutorial demonstrates how to model 2D turbulent flow

The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy. Thomas C. Schulthess

ENES HPC Workshop, Hamburg, March 17, 2014

Quantifying the Dynamic Ocean Surface Using Underwater Radiometric Measurement

DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Lian Shen, Department of Mechanical Engineering

Fujitsu's Technologies to the K Computer

A journey to practical Petascale computing platform. June 21st, 2011. Motoi Okuda, FUJITSU Ltd. Agenda: The Next generation supercomputer project of Japan; The

Large Eddy Simulation Applications to Meteorology

Marcelo Chamecki, Department of Meteorology, The Pennsylvania State University. Tutorial School on Fluid Dynamics: Topics in Turbulence, May 27th, 2010, College