Multi-GPU simulations in OpenFOAM with SpeedIT technology.

Size: px

Start display at page:

Download "Multi-GPU simulations in OpenFOAM with SpeedIT technology."

Oliver Washington
6 years ago
Views:

1 Multi-GPU simulations in OpenFOAM with SpeedIT technology.

2 Attempt I: SpeedIT GPU-based library of iterative solvers for Sparse Linear Algebra and CFD. Current version: 2.2. Version 1.0 in CMRS format for fast SpMV and other storage formats. BiCGSTAB and CG iterative solvers. Preconditioners: Jacobi, AMG. Runs on one or more GPU cards. OpenFOAM compatibile. Possible Applications: Quantum Chemistry, Seminconductor and Power Network Design CFD (OpenFOAM) etc.

SpeedIT and OpenFOAM Plugin to OpenFOAM Provides conversion between CSR and OpenFOAM LDU data

OpenFOAM source code is unchanged due to a plugin concept.

Any acceleration toolkit can be integrated now. 1.

3 SpeedIT and OpenFOAM Plugin to OpenFOAM Provides conversion between CSR and OpenFOAM LDU data formats. Easy substitution of CG/BCG with their GPU versions. OpenFOAM source code is unchanged due to a plugin concept. No recompilation of OpenFOAM is necessary. GPL-based generic Plugin to OpenFOAM. Any acceleration toolkit can be integrated now. 1. Edit system/controldict file and add libspeedit.so 2. Edit system/fvsolution and replace CG with CG_accel 3. Compile the Plugin to OpenFOAM (wmake) 4. Complete library dependencies: $FOAM_USER_LIBBIN should contain GPU-based libraries.

SpeedIT vs. CUDA 5.0 SpMV is a key operation in scientific software. CMRS speedup 10 1 vector scalar HYB speedup 3.5 3 2.5 2 1.5 1 2 1.5 1 0.2 1 10 100 1000 µ Fig.

4 SpeedIT vs. CUDA 5.0 SpMV is a key operation in scientific software. CMRS speedup 10 1 vector scalar HYB speedup µ Fig. Speedup of our CMRS implementation against scalar, vector (CSR) and HYB kernels as a function of the mean number of nonzero elements per row (µ) σ Fig. Speedup of our CMRS implementation against CUDA 5.0 (HYB/CSR formats)

5 OpenFOAM & single GPU Low-cost hardware GPU cost: $200 Test cases: Cavity 3D, 512K cells, icofoam. Human aorta, 500K cells, simplefoam. CPU: Intel Core 2 Duo E8400, 3GHz, 8GB 800MHz GPU: NVIDIA GTX 460, VRAM 1GB Ubuntu x64, OpenFOAM 2.0.1, CUDA Toolkit 4.1, Pressure solved with CG preconditioned with DIC, GAMG and CG preconditioned with AMG.

6 OpenFOAM & single GPU Validation Fig. Velocity and pressure profiles along x axis for aorta (top left) and pressure profile along x axis for cavity3d (top right). Solution for all preconditioners. Cross-section for velocity in X and Y direction for cavity3d run with OpenFOAM and SpeedIT (lines) and from literature for Re=100 (dotted line) and Re=400 (solid line) without turbulence model (icofoam).

OpenFOAM & single GPU Performance 2000 2500 1600 2000 1200 1500 800 1000 400 500 0 4.00 3.50 3.00 2.50 CG+DIC 1C CG+DIC 2C GAMG 1C SpeedIT GAMG 2C 3.

00 CG+DIC 1C CG+DIC 2C GAMG 1C GAMG 2C 0 CG+DIC 1C CG+DIC 2C GAMG 2C GAMG 1C Fig.

7 OpenFOAM & single GPU Performance CG+DIC 1C CG+DIC 2C GAMG 1C SpeedIT GAMG 2C 3.77x 2.40x CG+DIC 2C CG+DIC 1C SpeedIT GAMG 2C GAMG 1C 2.68x 2.88x x CG+DIC 1C CG+DIC 2C GAMG 1C GAMG 2C 0 CG+DIC 1C CG+DIC 2C GAMG 2C GAMG 1C Fig. Execution times of 50 time steps (up) and acceleration factor for simplefoam (left) and pisofoam (right). Performed at Intel E8400@3GHz and GTX460. 1C and 2C stand for one, two cores respectively.

8 OpenFoam & multi-gpu Motivation: large geometries (> 8M celss) do not fit to a single GPU card (with max. 6GB RAM). OpenFOAM performed the decomposition. MPI was used for communication between GPU cards. Tests were performed at IBM PLX CINECA with 2 six-cores Intel Westmere 2.40 GHz and 2 Tesla M2070 per node (supported by NVIDIA). One thread manages one GPU card. Tests focused on multi-gpu precondtioned CG with AMG to solve pressure equation in steady-state flows.

OpenFoam & multi-gpu 6000 5000 CPU GPU 4000 3000 2000 1000 0 x1.59 1353 848 x1.63 1207 736 x1.62 699 x1.

9 OpenFoam & multi-gpu CPU GPU x x x x CPU, PCG with GAMG CPU, PCG with diagonal SpeedIT Fig. Time in sec. of first 10 time steps for N GPUs and CPU cores. Motorbike test, simplefoam with diagonal (blue) and GAMG (green) preconditioners, SpeedIT is marked in red. Geometry is 32 millions cells. Tests were performed at PLX CINECA.

10 Scaling of OpenFOAM & multi-gpu Complex car geometry, external steady state flow, simplefoam, 32M cells, pressure: GAMG, velocity: smoothgs. SpeedIT multigpu Ideal Scaling Fig. Execution times of 2950 time steps (left) and scaling relative to 8 GPU (right).

11 OpenFOAM & GPU for industries Authors: Saeed Iqbal and Kevin Tubbs Full report can be obtained at Dell Tech Center. Figure 1: OpenFOAM performance of 3D cavity case using 4 million cells on a single node.

12 OpenFOAM & GPU for industries Authors: Saeed Iqbal and Kevin Tubbs Figure 2: Total Power and Power Efficiency of 3D cavity case on 4 million cells on a single node.

13 OpenFOAM & GPU for industries Authors: Saeed Iqbal and Kevin Tubbs Figure 1: OpenFOAM performance of 3D cavity case using 8 million cells on a single node.

14 OpenFOAM & GPU for industries Authors: Saeed Iqbal and Kevin Tubbs Figure 2: Total Power and Power Efficiency of 3D cavity case on 4 million cells on a single node.

15 SpeedIT 3.0 What s new: New ILU based preconditioner. Support for Keppler NVIDIA cards. much higher bandwidth (300GB/s). 5x faster intra-gpu communication thanks to GPU Direct 2.0. Release in the begining of 2013.

16 Acknowledgments Saeed Iqbal and Kevin Tubbs (DELL) Stan Posey, Edmondo Orlotti (NVIDIA) CINECA, PRACE Vratis Ltd. Muchoborska 18, Wroclaw, Poland More information: speed-it.vratis.com, vratis.com/blog

Application of GPU technology to OpenFOAM simulations

Application of GPU technology to OpenFOAM simulations Jakub Poła, Andrzej Kosior, Łukasz Miroslaw jakub.pola@vratis.com, www.vratis.com Wroclaw, Poland Agenda Motivation Partial acceleration SpeedIT OpenFOAM