Towards real-time prediction of Tsunami impact effects on nearshore infrastructure


1 Towards real-time prediction of Tsunami impact effects on nearshore infrastructure. Manfred Krafczyk & Jonas Tölke, Inst. for Computational Modeling in Civil Engineering. DFG Round Table Programme "Near and Onshore Tsunami Effects". Slide 1

2 [Diagram] Modeling: physics, numerics, geometry, engineering model, hard- and software. Slide 2

3 Overview
- kinetic transport modeling
- overview of previous results
- description of the planned work
- algorithms and parallelization strategy
- anticipated results
- hardware and software resources
- acceleration by utilizing dedicated hardware
- summary
Slide 3

4 Tsunamis, storm surges and dam-breaks
- involve the large-scale movement of solids and fluids
- are often irregular in timing and thus difficult to observe and measure
- involve multiple types of physical processes on a broad range of spatial and temporal scales
Computational modelling can play an important role both in helping to understand the nature of the fundamental processes involved and in predicting the detailed outcomes of various types of events in specific locations. The state of the art is 2.5D modeling / simulation of free-surface flows.
Slide 4

5 Alps flood, 8/2005, following massive precipitation: average fill level exceeds river bank level, a genuine 3D effect. Slide 5

6 Project idea: support / optimization of evacuation / rescue / damage minimization measures by short-term 3D HPC simulation. Slide 8

7 ~3 CPU hours on a fast PC. Slide 9

8 Simulation of a surf wave. TU München, Oskar von Miller Institut, Versuchsanstalt für Wasserbau und Wasserwirtschaft. Slide 10

9 Simulation of a surf wave. Slide 11

10 Warning time: the 2004 tsunami example. Slide 12

11 Development of a space-adaptive CFD prototype for short-term 3D flood / tsunami prediction based on automatic GIS data input to support evacuation / rescue / damage minimization measures. Slide 13

12 "Pocket" tsunami (PC). Slide 14

13 Simulation cycle: 2-3 hours
- Input: flow BC, GIS database (surface mesh)
- 3D - SST CFD simulation DOF
- Output (local): flow rates, forces, water levels
- Pre-/postprocessing: computational steering
Slide 15

14 Simulation of flooding and tsunamis
- automatic acquisition of GIS data
- fast mesh generation
- adaptive mesh refinement / coarsening using hierarchic blocks
- coupling 3D free surface + 2D non-linear shallow water
For testing purposes: GIS data from GEBCO Digital Atlas, British Oceanographic Data Centre.
Slide 16

15 CFD kernel: research prototype, Lattice-Boltzmann CFD solver
- 3D
- transient
- adaptive
- multiphase / free surface
- LES / RANS
- parallel
- second-order accurate in space and time
- MPI
Slide 17

16 Kinetic transport modeling
- Boltzmann equation with BGK approximation (Bhatnagar, Gross, Krook) -> Navier-Stokes equations and continuity equation, via Chapman-Enskog expansion (small Knudsen number)
- Boltzmann equation -> Lattice Boltzmann equation (LBGK), via discretization in space and time
- Lattice Boltzmann equation (LBGK) -> Navier-Stokes equations and continuity equation, via Chapman-Enskog expansion (small Knudsen number, small Mach number)
Slide 18
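For reference, the chain named on this slide can be written out in common textbook notation. The symbols below ($f_i$, $\mathbf{c}_i$, $\tau$, $c_s$, $\boldsymbol{\xi}$) do not appear on the slide; they follow the usual LBGK convention and should be read as an assumed notation, not as the authors' own formulation.

```latex
% Boltzmann equation with the BGK collision operator (relaxation time \tau)
\frac{\partial f}{\partial t} + \boldsymbol{\xi}\cdot\nabla f
  = -\frac{1}{\tau}\left(f - f^{\mathrm{eq}}\right)

% Lattice Boltzmann equation (LBGK) after discretization in velocity space, space and time
f_i(\mathbf{x} + \mathbf{c}_i\,\Delta t,\; t + \Delta t) - f_i(\mathbf{x}, t)
  = -\frac{\Delta t}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\rho, \mathbf{u})\right)

% Macroscopic quantities as moments of the distributions
\rho = \sum_i f_i, \qquad \rho\,\mathbf{u} = \sum_i \mathbf{c}_i\, f_i, \qquad p = c_s^2\,\rho

% Kinematic viscosity recovered by the Chapman-Enskog expansion
% (small Knudsen number, small Mach number)
\nu = c_s^2\left(\tau - \frac{\Delta t}{2}\right)
```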

17 The LB equation: structural advantages
- linear and exact advection operator
- conservative scheme for mass and momentum
- no numerical viscosity
Slide 19
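The conservation property listed on this slide can be checked on a single node. The following is a minimal sketch, assuming a D2Q9 lattice with the standard second-order equilibrium and a BGK collision; the function names and the relaxation rate `omega` are illustrative and are not taken from the authors' solver. Exact rational arithmetic is used so that the check holds without rounding error.

```python
from fractions import Fraction as F

# D2Q9 lattice: rest velocity, four axis directions, four diagonals.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [F(4, 9)] + [F(1, 9)] * 4 + [F(1, 36)] * 4


def moments(f):
    """Return density and momentum (rho, jx, jy) of one lattice node."""
    rho = sum(f)
    jx = sum(fi * c[0] for fi, c in zip(f, C))
    jy = sum(fi * c[1] for fi, c in zip(f, C))
    return rho, jx, jy


def equilibrium(rho, jx, jy):
    """Standard second-order equilibrium in lattice units (c_s^2 = 1/3)."""
    ux, uy = jx / rho, jy / rho
    usq = ux * ux + uy * uy
    feq = []
    for w, (cx, cy) in zip(W, C):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + F(9, 2) * cu * cu - F(3, 2) * usq))
    return feq


def collide(f, omega):
    """BGK collision: relax every distribution towards its equilibrium."""
    feq = equilibrium(*moments(f))
    return [fi - omega * (fi - fe) for fi, fe in zip(f, feq)]
```

Because the equilibrium carries the same density and momentum as the pre-collision state, any relaxation rate leaves both moments unchanged. The propagation step only shifts values to neighbouring nodes, which is why the advection operator is described as linear and exact.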

18 Code performance on Hitachi SR-8000:
- ~30 % of theoretical peak performance per node
- parallel efficiency ~90 % on 32 nodes (x 8 processors)
Code performance on Opteron cluster (120 processors, 500 GB RAM, Myrinet):
- parallel efficiency ~95 % on 120 processors
- up to 1 billion grid points
Slide 20

19 Algorithms and parallelization strategy: computational aspects
- no Poisson equation is solved for the pressure
- Cartesian grid (automatic 3D grid generation)
- convergence properties: the LBE can be tuned to second-order accuracy with respect to the corresponding solution of incompressible Navier-Stokes flow
- because of their explicit nature and local stencil, LB models are perfect candidates for efficient parallelization
- stress tensor locally available (turbulence modelling)
Slide 21

20 Algorithms and parallelization strategy: optimized data structures for general geometries
- top: Peano-Hilbert "U" ordering
- bottom: Morton "N" ordering
- adaptive Morton "N" ordering around a 2-D airfoil
(M. J. Aftosmis et al., Applications of Space-Filling Curves to Cartesian Methods for CFD, AIAA)
Slide 22
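A Morton index is obtained by interleaving the bits of the integer cell coordinates. The sketch below is a generic illustration of that idea, not the data structure of the authors' code; the function names are hypothetical. The y bit is placed in the lower position so that the four cells of a 2x2 block are visited in an "N" shape, which is an assumption about the orientation shown in the slide's figure.

```python
def morton2d(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of (ix, iy); y takes the even bits, x the odd bits."""
    code = 0
    for b in range(bits):
        code |= ((iy >> b) & 1) << (2 * b)
        code |= ((ix >> b) & 1) << (2 * b + 1)
    return code


def morton_order(nx: int, ny: int):
    """Return all cells of an nx-by-ny grid sorted along the Morton curve."""
    cells = [(ix, iy) for ix in range(nx) for iy in range(ny)]
    return sorted(cells, key=lambda cell: morton2d(*cell))
```

Cells that are close on the curve tend to be close in space, so cutting the sorted list into equal pieces gives a simple partitioning, and storing nodes in this order improves memory locality.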

21 Algorithms and parallelization strategy: D2Q9 model and D3Q19 model [figure: lattice stencils with direction labels (N, NE, E, SE, S, SW, W, NW, T, B) and link labels q]; second-order accuracy in space. Slide 23
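The two stencils named on this slide can be generated programmatically. This is a minimal sketch using the standard textbook lattice weights, which are not shown on the slide and are therefore an assumption here; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product


def d2q9():
    """All 9 velocities with components in {-1, 0, 1}, with standard weights."""
    vel = list(product((-1, 0, 1), repeat=2))
    w = {0: F(4, 9), 1: F(1, 9), 2: F(1, 36)}
    return vel, [w[abs(x) + abs(y)] for x, y in vel]


def d3q19():
    """Rest velocity, 6 axis directions and 12 planar diagonals (no cube corners)."""
    vel = [c for c in product((-1, 0, 1), repeat=3) if sum(map(abs, c)) <= 2]
    w = {0: F(1, 3), 1: F(1, 18), 2: F(1, 36)}
    return vel, [w[sum(map(abs, c))] for c in vel]


def second_moment(vel, weights, a, b):
    """Weighted second moment sum_i w_i c_ia c_ib, equal to c_s^2 for a == b."""
    return sum(w * c[a] * c[b] for c, w in zip(vel, weights))
```

For both lattices the weights sum to one and the second moment is isotropic with c_s^2 = 1/3, which is the property the Chapman-Enskog expansion relies on.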

22 Grid generation: automatic 3D grid generation. Slide 24

23 Domain decomposition: (PAR)METIS. Slide 25

24 Flow chart
- start: read subdomain(s)
- time loop (non-adaptive run):
  - collision, boundary nodes
  - communication: nonblocking receive, nonblocking send of boundary nodes
  - collision, inner nodes
  - propagation, inner nodes
  - propagation, boundary nodes
Features: Multi Relaxation Time, second-order BC, local grid refinement, efficient data structures, asynchronous communication.
Slide 26
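The ordering in this flow chart lets communication overlap with work on inner nodes. The sketch below imitates it in plain Python under strong simplifications: a 1D periodic lattice with only a right-moving and a left-moving population, a single-relaxation-time collision instead of the Multi Relaxation Time model named on the slide, and `deque` objects standing in for MPI receive buffers. All names and the value of `OMEGA` are hypothetical.

```python
from collections import deque

OMEGA = 1.2  # relaxation rate, hypothetical value for illustration


def collide(fr, fl, i):
    """Relax node i towards its local equilibrium (rho / 2 per direction)."""
    feq = 0.5 * (fr[i] + fl[i])
    fr[i] -= OMEGA * (fr[i] - feq)
    fl[i] -= OMEGA * (fl[i] - feq)


class Subdomain:
    def __init__(self, fr, fl):
        self.fr, self.fl = list(fr), list(fl)
        # Stand-ins for receive buffers filled by the two neighbours.
        self.from_left, self.from_right = deque(), deque()


def step(subs):
    p = len(subs)
    for r, s in enumerate(subs):
        # 1) Collision on boundary nodes, 2) post the halo messages.
        for i in {0, len(s.fr) - 1}:
            collide(s.fr, s.fl, i)
        subs[(r + 1) % p].from_left.append(s.fr[-1])
        subs[(r - 1) % p].from_right.append(s.fl[0])
    for s in subs:
        # 3) Collision on inner nodes, 4) propagation that needs no halo data.
        for i in range(1, len(s.fr) - 1):
            collide(s.fr, s.fl, i)
        s.new_fr = [None] + s.fr[:-1]
        s.new_fl = s.fl[1:] + [None]
    for s in subs:
        # 5) Messages have arrived: finish propagation into the boundary nodes.
        s.new_fr[0] = s.from_left.popleft()
        s.new_fl[-1] = s.from_right.popleft()
        s.fr, s.fl = s.new_fr, s.new_fl


def reference_step(fr, fl):
    """Same update on the undivided periodic domain, for comparison."""
    fr, fl = list(fr), list(fl)
    for i in range(len(fr)):
        collide(fr, fl, i)
    return [fr[-1]] + fr[:-1], fl[1:] + [fl[0]]
```

Running `step` on two subdomains gives the same values as `reference_step` on the joined domain, which shows that handling boundary nodes first does not change the result. In an MPI code the two `append` calls would correspond to nonblocking sends and the `popleft` calls to waiting on the matching nonblocking receives.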

25 Anticipated results: feasibility proof of short-term prediction / analysis of catastrophic flood events based on 3D CFD modeling and automatic input of GIS / satellite-based topography / bathymetry data. Slide 27

26 Required computer and software resources, minimum requirements:
- sufficient (short-term) CPU share
- ~10 TFlops
- 1 TByte RAM
- 10 TByte disk space
- F90 / C++ compiler
- parallel debugger
- MPI 2.x?
Slide 28

27 DRAM gap: contiguous memory access is mandatory! Slide 29

28 New hardware developments: NVIDIA 8800 GTX, Compute Unified Device Architecture (CUDA). Slide 30

29 NVIDIA G80: the parallel stream processor
- The G80 has eight groups of 16 stream processors (SPs), for a total of 128 SPs.
- The SPs are generalized floating-point processors capable of operating on any manner of data.
- The G80's stream processors are scalar: each SP handles one component.
- The SPs are clocked at 1.35 GHz, giving the GeForce 8800 a tremendous amount of floating-point processing power: 1.35 * 2 * 128 = 345.6 GFLOPS.
- The eight "clusters" of stream processors are connected to six Render Output Unit (ROP) partitions by a crossbar-style switch.
- Each ROP partition has a 64-bit wide interface to graphics memory, which is clocked at 900 MHz.
- Memory bandwidth: 6 * 64 / 8 * 0.9 * 2 (DDR) GB/s = 86.4 GB/s.
Slide 31
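The two headline figures on this slide follow from simple products, sketched below. The parameter names are illustrative; reading the factor 2 in the peak formula as two floating-point operations per SP and clock cycle (one multiply-add) is an interpretation, since the slide only gives the bare numbers.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int, n_sp: int) -> float:
    """Theoretical peak: clock rate x flops per cycle x number of processors."""
    return clock_ghz * flops_per_cycle * n_sp


def memory_bandwidth_gbs(partitions: int, bus_bits: int,
                         clock_ghz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth: partitions x bus width in bytes x clock x DDR factor."""
    return partitions * bus_bits / 8 * clock_ghz * transfers_per_clock


# Values quoted on the slide for the G80 / GeForce 8800.
g80_peak = peak_gflops(1.35, 2, 128)
g80_bandwidth = memory_bandwidth_gbs(6, 64, 0.9, 2)
```

The ratio of the two numbers, roughly 4 floating-point operations per byte, indicates how much arithmetic a kernel must perform per byte of memory traffic before it stops being limited by bandwidth.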

30 Application Programming Interface (API)
- Thread Block, Grid of Thread Blocks
- Function type qualifiers (__device__, __global__, __host__)
- Variable type qualifiers (__device__, __shared__)
- Memory management (cudaMalloc, cudaMemcpy)
- Synchronisation (__syncthreads())
Memory bandwidth
- The effective bandwidth of each memory space depends significantly on the memory access pattern.
- Simultaneous memory accesses of one thread block can be coalesced into a single contiguous, aligned memory access if thread number N accesses address BaseAddress + N and BaseAddress is aligned to 16*sizeof(type) bytes (otherwise memory bandwidth performance breaks down to about 8 GB/sec).
Slide 32
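The coalescing rule on this slide can be turned into a small checker. This is a sketch under stated assumptions: "BaseAddress + N" is read as pointer arithmetic in units of `sizeof(type)`, the rule is applied to a group of 16 threads (suggested by the 16*sizeof(type) alignment), and the two memory layouts compared are generic illustrations rather than the layout of the authors' kernel. All function names are hypothetical.

```python
def is_coalesced(byte_addresses, elem_size, group=16):
    """Check the slide's rule for one group of threads.

    Thread N must access base + N * elem_size, and base must be aligned
    to group * elem_size bytes.
    """
    base = byte_addresses[0]
    if base % (group * elem_size) != 0:
        return False
    return all(addr == base + n * elem_size
               for n, addr in enumerate(byte_addresses))


def soa_address(base, q, node, n_nodes, elem_size=4):
    """Structure of arrays: all nodes of direction q are stored contiguously."""
    return base + (q * n_nodes + node) * elem_size


def aos_address(base, q, node, n_q, elem_size=4):
    """Array of structures: all directions of one node are stored contiguously."""
    return base + (node * n_q + q) * elem_size
```

With one thread per lattice node, 16 consecutive threads reading direction `q` touch consecutive addresses in the structure-of-arrays layout but strided addresses in the array-of-structures layout, so only the former satisfies the rule.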

31 [Table] Columns: Platform, Peak [GFlops], MemBW [GB/s], Price [Euro]. Rows: Intel Core 2 Duo (2 GHz); NEC SX-8R A (8 CPUs); NVIDIA 8800 GTX. Slide 33

32 Nonlocal operations
- Each thread block has shared memory of 16 KB (2 cycles latency)
- Use shared memory for nonlocal operations (LB: propagation)
- Synchronize
- Write back to device memory
- Synchronize grid of thread blocks over borders
Results (columns: Platform, MLUPS, MemBW [GB/s], GFlops)
- Intel Core Duo (2 GHz): MemBW (17 %), 1.6 GFlops (20 %)
- Intel Core 2 Duo (2 GHz): MemBW (25 %), 3.2 GFlops (20 %)
- NVIDIA 8800 GTX: MemBW (30 %), 66.0 GFlops (19 %)
Slide 34
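MLUPS (million lattice updates per second) and memory bandwidth are linked by the number of bytes moved per node update. The sketch below estimates an upper bound for a bandwidth-limited kernel. It assumes one read and one write per distribution, a D3Q19 lattice and 4-byte single-precision values; the slide does not state which lattice model or precision was benchmarked, so these are illustrative assumptions and the numbers are not the measured results.

```python
def bytes_per_update(q: int, bytes_per_value: int = 4) -> int:
    """Memory traffic per node update: read and write each of q distributions."""
    return 2 * q * bytes_per_value


def mlups_bound(bandwidth_gbs: float, q: int, bytes_per_value: int = 4) -> float:
    """Upper bound on MLUPS if the kernel is limited by memory bandwidth only."""
    updates_per_second = bandwidth_gbs * 1e9 / bytes_per_update(q, bytes_per_value)
    return updates_per_second / 1e6


# Hypothetical example: D3Q19, single precision, 86.4 GB/s peak bandwidth.
bound = mlups_bound(86.4, 19)
```

A measured MLUPS value divided by such a bound gives the fraction of peak bandwidth actually used, which is presumably what the percentages in the MemBW column express.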

33 CPU versus GPU: the war is on. [Figure: performance of GPU and CPU versus algorithm complexity, ranging from explicit solvers on block-type grids to implicit solvers on unstructured meshes.] Slide 35

34 Fluid-structure interaction: Ferrybridge, England, 1965. Slide 36

35 Bidirectional fluid-structure interaction. Slide 37

36 Outlook: fluid-structure interaction, debris flow, erosion, sedimentation, scouring. Slide 38

37 Summary. Purpose of this research project: to develop a 3D simulation prototype for short-term prediction / analysis of catastrophic flood events based on HPC 3D CFD modeling. In terms of modeling / numerical methods / computer science, we want to study and optimize the performance of an adaptive kinetic CFD solution environment, including pre- and postprocessing issues, on massively parallel hardware. Slide 39


More information

CFD Analysis of a Novel Hull Design for an Offshore Wind Farm Service Vessel

CFD Analysis of a Novel Hull Design for an Offshore Wind Farm Service Vessel CFD Analysis of a Novel Hull Design for an Offshore Wind Farm Service Vessel M. Shanley 1, J. Murphy 1, and P. Molloy 2 1 Hydraulics and Maritime, Civil and Environmental Engineering University College

More information

CUDA Performance Optimization. Patrick Legresley

CUDA Performance Optimization. Patrick Legresley CUDA Performance Optimization Patrick Legresley Optimizations Kernel optimizations Maximizing global memory throughput Efficient use of shared memory Minimizing divergent warps Intrinsic instructions Optimizations

More information

A laboratory-dualsphysics modelling approach to support landslide-tsunami hazard assessment

A laboratory-dualsphysics modelling approach to support landslide-tsunami hazard assessment A laboratory-dualsphysics modelling approach to support landslide-tsunami hazard assessment Lake Lucerne case, Switzerland, 2007 Dr. Valentin Heller (www.drvalentinheller.com) Geohazards and Earth Processes

More information

1.2 Numerical Solutions of Flow Problems

1.2 Numerical Solutions of Flow Problems 1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian

More information

A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv: v1 [cs.dc] 5 Sep 2013

A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv: v1 [cs.dc] 5 Sep 2013 A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv:1309.1230v1 [cs.dc] 5 Sep 2013 Kerry A. Seitz, Jr. 1, Alex Kennedy 1, Owen Ransom 2, Bassam A. Younis 2, and John D. Owens 3 1 Department

More information

Performance potential for simulating spin models on GPU

Performance potential for simulating spin models on GPU Performance potential for simulating spin models on GPU Martin Weigel Institut für Physik, Johannes-Gutenberg-Universität Mainz, Germany 11th International NTZ-Workshop on New Developments in Computational

More information

Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python

Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python Sailfish: Lattice Boltzmann Fluid Simulations with GPUs and Python Micha l Januszewski Institute of Physics University of Silesia in Katowice, Poland Google GTC 2012 M. Januszewski (IoP, US) Sailfish:

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

CFD VALIDATION FOR SURFACE COMBATANT 5415 STRAIGHT AHEAD AND STATIC DRIFT 20 DEGREE CONDITIONS USING STAR CCM+

CFD VALIDATION FOR SURFACE COMBATANT 5415 STRAIGHT AHEAD AND STATIC DRIFT 20 DEGREE CONDITIONS USING STAR CCM+ CFD VALIDATION FOR SURFACE COMBATANT 5415 STRAIGHT AHEAD AND STATIC DRIFT 20 DEGREE CONDITIONS USING STAR CCM+ by G. J. Grigoropoulos and I..S. Kefallinou 1. Introduction and setup 1. 1 Introduction The

More information

CFD Best Practice Guidelines: A process to understand CFD results and establish Simulation versus Reality

CFD Best Practice Guidelines: A process to understand CFD results and establish Simulation versus Reality CFD Best Practice Guidelines: A process to understand CFD results and establish Simulation versus Reality Judd Kaiser ANSYS Inc. judd.kaiser@ansys.com 2005 ANSYS, Inc. 1 ANSYS, Inc. Proprietary Overview

More information

simulation framework for piecewise regular grids

simulation framework for piecewise regular grids WALBERLA, an ultra-scalable multiphysics simulation framework for piecewise regular grids ParCo 2015, Edinburgh September 3rd, 2015 Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler

More information

Introduction to CUDA

Introduction to CUDA Introduction to CUDA Overview HW computational power Graphics API vs. CUDA CUDA glossary Memory model, HW implementation, execution Performance guidelines CUDA compiler C/C++ Language extensions Limitations

More information

Implementation of an integrated efficient parallel multiblock Flow solver

Implementation of an integrated efficient parallel multiblock Flow solver Implementation of an integrated efficient parallel multiblock Flow solver Thomas Bönisch, Panagiotis Adamidis and Roland Rühle adamidis@hlrs.de Outline Introduction to URANUS Why using Multiblock meshes

More information

Aeroacoustic computations with a new CFD solver based on the Lattice Boltzmann Method

Aeroacoustic computations with a new CFD solver based on the Lattice Boltzmann Method Aeroacoustic computations with a new CFD solver based on the Lattice Boltzmann Method D. Ricot 1, E. Foquet 2, H. Touil 3, E. Lévêque 3, H. Machrouki 4, F. Chevillotte 5, M. Meldi 6 1: Renault 2: CS 3:

More information

Available online at ScienceDirect. Parallel Computational Fluid Dynamics Conference (ParCFD2013)

Available online at  ScienceDirect. Parallel Computational Fluid Dynamics Conference (ParCFD2013) Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 61 ( 2013 ) 81 86 Parallel Computational Fluid Dynamics Conference (ParCFD2013) An OpenCL-based parallel CFD code for simulations

More information

Co-Simulation von Flownex und ANSYS CFX am Beispiel einer Verdrängermaschine

Co-Simulation von Flownex und ANSYS CFX am Beispiel einer Verdrängermaschine Co-Simulation von Flownex und ANSYS CFX am Beispiel einer Verdrängermaschine Benoit Bosc-Bierne, Dr. Andreas Spille-Kohoff, Farai Hetze CFX Berlin Software GmbH, Berlin Contents Positive displacement compressors

More information

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS

CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in

More information

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea. Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences

More information

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics I. Pantle Fachgebiet Strömungsmaschinen Karlsruher Institut für Technologie KIT Motivation

More information

Simulation of Turbulent Axisymmetric Waterjet Using Computational Fluid Dynamics (CFD)

Simulation of Turbulent Axisymmetric Waterjet Using Computational Fluid Dynamics (CFD) Simulation of Turbulent Axisymmetric Waterjet Using Computational Fluid Dynamics (CFD) PhD. Eng. Nicolae MEDAN 1 1 Technical University Cluj-Napoca, North University Center Baia Mare, Nicolae.Medan@cunbm.utcluj.ro

More information

ANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011

ANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011 ANSYS HPC Technology Leadership Barbara Hutchings barbara.hutchings@ansys.com 1 ANSYS, Inc. September 20, Why ANSYS Users Need HPC Insight you can t get any other way HPC enables high-fidelity Include

More information

Abstract. Introduction. Kevin Todisco

Abstract. Introduction. Kevin Todisco - Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image

More information

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany

More information

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009

Scalable Multi Agent Simulation on the GPU. Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Scalable Multi Agent Simulation on the GPU Avi Bleiweiss NVIDIA Corporation San Jose, 2009 Reasoning Explicit State machine, serial Implicit Compute intensive Fits SIMT well Collision avoidance Motivation

More information

Dynamic Mode Decomposition analysis of flow fields from CFD Simulations

Dynamic Mode Decomposition analysis of flow fields from CFD Simulations Dynamic Mode Decomposition analysis of flow fields from CFD Simulations Technische Universität München Thomas Indinger Lukas Haag, Daiki Matsumoto, Christoph Niedermeier in collaboration with Agenda Motivation

More information

Cerebrospinal Fluid Flow Analysis in Subarachnoid Space

Cerebrospinal Fluid Flow Analysis in Subarachnoid Space Jh160055-NAHI Cerebrospinal Fluid Flow Analysis in Subarachnoid Space Ryusuke Egawa Cyberscience Center, Tohoku University Abstract A better understanding of the hydrodynamics of the cerebrospinal fluid

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

Computational Fluid Dynamics PRODUCT SHEET

Computational Fluid Dynamics PRODUCT SHEET TM 2014 Computational Fluid Dynamics PRODUCT SHEET 1 Breaking Limitations The Challenge of Traditional CFD In the traditional mesh-based approach, the reliability highly depends on the quality of the mesh,

More information

Paralization on GPU using CUDA An Introduction

Paralization on GPU using CUDA An Introduction Paralization on GPU using CUDA An Introduction Ehsan Nedaaee Oskoee 1 1 Department of Physics IASBS IPM Grid and HPC workshop IV, 2011 Outline 1 Introduction to GPU 2 Introduction to CUDA Graphics Processing

More information