CFD Solvers on Many-core Processors

Size: px

Start display at page:

Download "CFD Solvers on Many-core Processors"

Owen Ellis
5 years ago
Views:

1 CFD Solvers on Many-core Processors Tobias Brandvik Whittle Laboratory CFD Solvers on Many-core Processors p.1/36

2 CFD Backgroud CFD: Computational Fluid Dynamics Whittle Laboratory - Turbomachinery CFD Solvers on Many-core Processors p.2/36

3 Turbomachinery CFD Solvers on Many-core Processors p.3/36

4 Compute requirements Steady models (no wake/blade interaction etc.) 1 blade 0.5 Mcells 1 CPU hour 1 stage (2 blades) 1.0 Mcells 3 CPU hours 1 component (5 stages) 5.0 Mcells 20 CPU hours CFD Solvers on Many-core Processors p.4/36

5 Compute requirements Steady models (no wake/blade interaction etc.) 1 blade 0.5 Mcells 1 CPU hour 1 stage (2 blades) 1.0 Mcells 3 CPU hours 1 component (5 stages) 5.0 Mcells 20 CPU hours Unsteady models (with wakes etc.) 1 component (1000 blades) 500 Mcells 0.1M CPU hours Engine (4000 blades) 2 Gcells 1M CPU hours CFD Solvers on Many-core Processors p.5/36

6 Objectives Can CFD be made to run faster by using other types of processors? How to make sure it will continue to get faster with better processors in the future? CFD Solvers on Many-core Processors p.6/36

7 Background Single core Multi core Many core CFD Solvers on Many-core Processors p.7/36

8 Processor design P N trans For a modern chip, N trans small cores with transistors each gives 10 times the performance as 1 big core Performance Number of transistors x 10 8 CFD Solvers on Many-core Processors p.8/36

9 Everyone is going parallel Every major chip vendor is switching to many-core processors All future processors will be massively parallel CFD Solvers on Many-core Processors p.9/36

10 NVIDIA Tesla CFD Solvers on Many-core Processors p.10/36

11 IBM Cell CFD Solvers on Many-core Processors p.11/36

12 Sun Niagara CFD Solvers on Many-core Processors p.12/36

13 Intel Larrabee CFD Solvers on Many-core Processors p.13/36

14 Challenges Every processor has different characteristics (and in some cases languages and libraries) Codes have to be rewritten More difficult - use to have 1 process/thread per core with 1 MB cache Thousands of threads/processes per core KB on-chip memory per core CFD Solvers on Many-core Processors p.14/36

15 Benefits Possible to achieve step change in performance NOW Once the job is done, can expect to scale with Moore s Law like in the 80s and 90s CFD Solvers on Many-core Processors p.15/36

16 Scientific computing There are two possible approaches Horizontal: Single language for all problems Vertical: Different language for every problem CFD Solvers on Many-core Processors p.16/36

17 A View From Berkeley Scientific computing consists of seven different applications 1. Dense Linear Algebra 2. Sparse Linear Algebra 3. Spectral Methods 4. N-Body Methods 5. Structured Grids 6. Unstructured Grids 7. MapReduce CFD Solvers on Many-core Processors p.17/36

18 A View From Berkeley Scientific computing consists of seven different applications 1. Dense Linear Algebra 2. Sparse Linear Algebra 3. Spectral Methods 4. N-Body Methods 5. Structured Grids 6. Unstructured Grids 7. MapReduce Red applications relevant to CFD CFD Solvers on Many-core Processors p.18/36

19 The Seven Dwarfs CFD Solvers on Many-core Processors p.19/36

20 The Seven Dwarfs CFD Solvers on Many-core Processors p.20/36

21 Structured grids Basic spatial discretisation for many CFD solvers Solvers consist of a series of stencil operations + boundary conditions CFD Solvers on Many-core Processors p.21/36

22 Stencil operations i, j+1, k i, j, k+1 i 1, j, k i+1, j, k i, j, k 1 i, j 1, k CFD Solvers on Many-core Processors p.22/36

23 Stencil operations Evaluate 2 u x 2 on a regular grid: DO K=2,NK-1 DO J=2,NJ-1 DO I=2,NI-1 D2UDX2(I,J,K) = (U(I+1,J,K) - 2.0*U(I,J,K) + & U(I-1,J,K))/(DX*DX) END DO END DO END DO CFD Solvers on Many-core Processors p.23/36

24 Boundary conditions Set a variable to a fixed value on the i=0face: DO K=1,NK DO J=1,NJ U(0,J,K) = END DO END DO CFD Solvers on Many-core Processors p.24/36

25 SBLOCK Vertical approach to structured grids Mini-language and library Can target any processor without changing the solver definition Currently supports CPUs and NVIDIA GPUs (Cell support is coming) CFD Solvers on Many-core Processors p.25/36

26 Fundamental abstraction Blocks with patches CFD Solvers on Many-core Processors p.26/36

27 Stencil kernels kind = "stencil" avin = ["dx"] bpin = ["u"] bpout = ["d2udx2"] inner_calc = [ {"lvalue": "d2udx2" "rvalue": """u[1][0][0] - 2.0f*u[0][0][0] + u[-1][0][0])/(dx*dx)""" ] CFD Solvers on Many-core Processors p.27/36

28 Kernel compilation CFD Solvers on Many-core Processors p.28/36

29 TBLOCK Developed in-house at the lab by John Denton Blocks with arbitrary patch interfaces Simple and fast algorithm 15,000 lines of Fortran 77 Main solver routines are only 5,000 lines Widely used in industry and academia CFD Solvers on Many-core Processors p.29/36

30 Turbostream Turbostream is TBLOCK in SBLOCK 2000 lines of C 3000 lines of Python kernels Code generated from Python kernels is 15,000 lines Source code is very similar to TBLOCK every subroutine has an equivalent SBLOCK kernel CFD Solvers on Many-core Processors p.30/36

31 Speed-up results Two different scenarios CFD Solvers on Many-core Processors p.31/36

32 High-end desktop 2 Intel Quad Cores 6 NVIDIA GPUs 3,000 30x speed-up Can do routine design calculations in less than 2 minutes CFD Solvers on Many-core Processors p.32/36

33 Cluster 4 GPUs in 1 U (NVIDIA Tesla) But needs extra control unit Not as dense as CPU clusters (yet) Speed-ups of 10x on a per-cost, per-watt basis CFD Solvers on Many-core Processors p.33/36

34 Cluster We now have one of these! CFD Solvers on Many-core Processors p.34/36

35 Performance Cluster scaling Scaling when increasing job size: Number of GPUs CFD Solvers on Many-core Processors p.35/36

36 Conclusions Many-core processors can speed up CFD calculations Difficult to support all platforms and mantain portability through hand-coding Use a framework instead - write once run anywhere CFD Solvers on Many-core Processors p.36/36

Turbostream: A CFD solver for manycore

Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware