DLR.de, Slide 1, HPCN-Workshop 14/15 May 2018

HPC Usage for Aerodynamic Flow Computation with Different Levels of Detail
Cornelia Grabe, Marco Burnazzi, Axel Probst, Silvia Probst
DLR, Institute of Aerodynamics and Flow Technology, C²A²S²E
HPCN-Workshop, DLR Göttingen, 14/15 May 2018

Outline
- Introduction
- HPC clusters at the institute
- Scaling properties
- HPC usage at the department
- Summary and outlook

Introduction
C²A²S²E department at the Institute of Aerodynamics and Flow Technology:
- DLR TAU code (external aerodynamics of aircraft)
- DLR THETA code (external and internal fluid dynamics)
- In Göttingen, focus on performance prediction (in TAU):
  - Turbulence modeling (different levels of detail)
  - Transition modeling

Introduction: what are the different levels of detail?
- Regarding the flow physics: how many details of the flow are resolved?
- What size of turbulent scales is resolved?
[Figure: turbulent energy spectrum, log E(κ) over log κ, with production range, energy cascade and dissipation range; E(κ): kinetic energy spectrum, κ: wavenumber]

Introduction: RANS (Reynolds-Averaged Navier–Stokes)
Chord Reynolds number $Re_c = \rho U_\infty c/\mu$; N: number of grid points
- Time averaging of the flow field (Reynolds decomposition)
- The resulting additional terms (correlations of the fluctuating velocities, i.e. the Reynolds stresses) are modeled with a turbulence model
- The wall-normal resolution of the boundary layer depends on the Reynolds number; typical mesh sizes are N = 10^5 to 10^8
- Status quo in the aeronautics industry
+ Cheap, steady flow
− Limited range of validity
[Figure: energy spectrum log E(κ) over log κ, range marked RANS]
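
The two similarity parameters used on these slides, the chord Reynolds number and the Mach number $Ma = U_\infty/a$, can be evaluated as below. This is a minimal sketch, assuming ideal-gas air with γ = 1.4 and R = 287.05 J/(kg K); the function names and the example inputs (sea-level density and viscosity, a 0.5 m chord at 50 m/s) are illustrative and are not taken from the talk.

```python
import math

GAMMA_AIR = 1.4   # ratio of specific heats (assumed ideal gas)
R_AIR = 287.05    # specific gas constant of air in J/(kg K) (assumed)


def chord_reynolds(rho: float, u_inf: float, chord: float, mu: float) -> float:
    """Re_c = rho * U_inf * c / mu (dimensionless)."""
    return rho * u_inf * chord / mu


def mach_number(u_inf: float, temperature: float) -> float:
    """Ma = U_inf / a with the ideal-gas speed of sound a = sqrt(gamma * R * T)."""
    a = math.sqrt(GAMMA_AIR * R_AIR * temperature)
    return u_inf / a


if __name__ == "__main__":
    # Hypothetical wind-tunnel-like conditions, for illustration only.
    re_c = chord_reynolds(rho=1.225, u_inf=50.0, chord=0.5, mu=1.81e-5)
    ma = mach_number(u_inf=50.0, temperature=288.15)
    print(f"Re_c = {re_c:.3e}, Ma = {ma:.3f}")
```

These inputs give Re_c of about 1.7 × 10^6 and Ma of about 0.15, the same order as the DLR-F15 cases discussed later.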

Introduction: LES (Large Eddy Simulation, wall-resolved)
- Spatial filtering of the flow field (filter width based on the grid)
- The Navier–Stokes equations are solved for the large eddies; the small eddies are modeled with a subgrid-scale model
- Re_c = 10^7: N ≈ 10^13 (general, conservative estimate)*
- Re_c = 10^7: N ≈ 10^11 (clean wing without separation, rather optimistic)**
+ Less modeling, more turbulent scales resolved
− Expensive, inherently unsteady
[Figure: energy spectrum log E(κ) over log κ, ranges marked RANS and LES]
* Choi, Moin, Physics of Fluids, 2012
** Spalart et al., 1997

Introduction: WM-LES (Large Eddy Simulation, wall-modeled)
- LES that does not resolve the near-wall boundary layer but uses wall functions instead
- Re_c = 10^7: N ≈ 10^7*
[Figure: energy spectrum log E(κ) over log κ, ranges marked RANS and LES]
* Choi, Moin, Physics of Fluids, 2012

Introduction: DNS (Direct Numerical Simulation)
- No averaging and no modeling; the only approximation is the numerical one (numerical dissipation)
- Direct numerical solution of the Navier–Stokes equations
- Re_c = 10^7: N ≈ 10^19*
+ No modeling, all turbulent scales resolved
− Very expensive, inherently unsteady
[Figure: energy spectrum log E(κ) over log κ, ranges marked RANS, LES and DNS]
* Choi, Moin, Physics of Fluids, 2012
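
The grid-point estimates for wall-resolved LES, wall-modeled LES and DNS follow power laws in the chord Reynolds number. This is a minimal sketch, assuming the exponents reported by Choi and Moin (2012), N ∝ Re_c^(13/7) for wall-resolved LES, N ∝ Re_c for wall-modeled LES and N ∝ Re_c^(37/14) for DNS, with all prefactors dropped, so it reproduces orders of magnitude only. The function name is hypothetical.

```python
import math

# Exponents p in N ~ Re_c**p (Choi & Moin, 2012); prefactors deliberately omitted.
GRID_EXPONENTS = {
    "WR-LES": 13.0 / 7.0,   # wall-resolved LES
    "WM-LES": 1.0,          # wall-modeled LES
    "DNS": 37.0 / 14.0,     # direct numerical simulation
}


def log10_grid_points(re_c: float, method: str) -> float:
    """Order of magnitude log10(N) of the required grid size."""
    return GRID_EXPONENTS[method] * math.log10(re_c)


if __name__ == "__main__":
    for name in GRID_EXPONENTS:
        print(f"{name}: N ~ 10^{log10_grid_points(1e7, name):.1f}")
```

For Re_c = 10^7 this yields roughly 10^13, 10^7 and 10^18.5, in line with the 10^13, 10^7 and 10^19 quoted above.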

Introduction: LES in R&D
Wall-resolved LES performed by NASA (Uzun et al., AIAA SciTech 2018); Mach number $Ma = U_\infty/a$
- Nominally 2D (axisymmetric) flow, Re_c = 2 × 10^6, Ma = 0.875
- Transonic flow with shock, separation and reattachment, using up to N = 2 × 10^10
- Computations lasting several weeks on Cori (ranked 8th in the TOP500), a Cray XC40 with Intel Knights Landing nodes, on 170k of its 622k cores
- ~120,000 points/core
Experiment: Bachalo, Johnson, AIAA Journal, 1986

Introduction
[Figure: logarithm of the number of grid points (relevant aerodynamic configuration) over wallclock time in days, for a typical number of cores]

Introduction: HRLM (Hybrid RANS/LES Methods)
- RANS mode wherever sufficient, LES mode wherever needed
- Based on a unified formulation of the Navier–Stokes equations: the turbulent length scale is l_RANS in RANS mode and l_LES in LES mode
- Transition between RANS and LES regions
- Flavours of HRLM: DES, DDES, IDDES, ...
+ Workable compromise between accuracy and effort
[Figure: energy spectrum, ranges marked RANS and LES]
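
The switch between the two modes can be illustrated with the classic DES length-scale limiter of Spalart et al. (1997), where the RANS length scale is replaced by l_hyb = min(l_RANS, C_DES · Δ). This is a minimal sketch, assuming Δ is the largest local cell spacing and C_DES = 0.65 (the value commonly quoted for Spalart–Allmaras-based DES, where l_RANS is the wall distance). The DDES and IDDES flavours used in this talk add shielding and blending functions that are not shown; the function names are hypothetical.

```python
def hybrid_length_scale(l_rans: float, dx: float, dy: float, dz: float,
                        c_des: float = 0.65) -> float:
    """DES-type length scale: min(l_RANS, C_DES * Delta), Delta = max cell spacing."""
    delta = max(dx, dy, dz)   # grid-based filter width (assumed definition)
    l_les = c_des * delta     # LES length scale tied to the local grid
    return min(l_rans, l_les)


def mode(l_rans: float, dx: float, dy: float, dz: float) -> str:
    """Report which branch of the limiter is active in a cell."""
    return "LES" if hybrid_length_scale(l_rans, dx, dy, dz) < l_rans else "RANS"


if __name__ == "__main__":
    # Near the wall (small wall distance, wall-stretched cell): RANS branch.
    print(mode(l_rans=1e-4, dx=1e-2, dy=1e-5, dz=1e-2))
    # Away from the wall on a fine, isotropic grid: LES branch.
    print(mode(l_rans=0.1, dx=1e-3, dy=1e-3, dz=1e-3))
```

The grid itself decides the mode here, which is why the original DES needs the DDES shielding to avoid switching inside attached boundary layers on fine grids.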

Introduction: summary
- RANS is today's standard for relevant aerodynamic configurations and will remain so for many configurations for a long time
- Hybrid RANS/LES methods might become affordable for industrially relevant aerodynamic configurations
- Application of (wall-resolved) LES and DNS is limited to low Reynolds numbers and academic configurations

HPC clusters at the institute
C²A²S²E-2 cluster (Braunschweig), ranked 159th in the TOP500 in 2013:
- SGI ICE X, Intel Xeon E5-2695 v2 (12 cores, 2.4 GHz), InfiniBand FDR
- 560 nodes, 13.4k cores, 71.7k GB memory
SCART2 cluster (Göttingen):
- IBM iDataPlex dx360 M4, Intel Xeon E5-2680 v2 (2.8 GHz), InfiniBand FDR
- 256 nodes, 5.12k cores, 16.4k GB memory

The DLR TAU code
- Compressible, unstructured/hybrid-grid CFD software package
- Finite-volume based (second order, upwind/central schemes)
- Explicit Runge–Kutta or implicit LU-SGS scheme for time integration
- Multigrid acceleration
- Implicit dual time stepping for time-accurate flows
- Local grid adaptation and grid deformation
- Chimera technique for overlapping grids
- Parallelization based on domain decomposition

Scaling properties of the DLR TAU code
[Figure: parallel efficiency relative to 24 cores; annotated values between 31% and 95%, with ~6.5k points/core marked. Courtesy of J. Jägersküpper]
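
The efficiencies in the scaling plot are measured against a 24-core baseline. This is a minimal sketch of the usual strong-scaling definition, E(p) = (p_ref · T_ref)/(p · T_p), together with the points-per-core load; the timings and the 12.5 × 10^6-point mesh in the example are hypothetical, chosen only so that the load lands near the ~6.5k points/core marked in the plot, and are not TAU measurements.

```python
def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Strong-scaling efficiency relative to a baseline run on p_ref cores."""
    return (p_ref * t_ref) / (p * t_p)


def points_per_core(n_points: float, cores: int) -> float:
    """Average number of grid points assigned to each core."""
    return n_points / cores


if __name__ == "__main__":
    # Hypothetical: 100 s per iteration on 24 cores, 1.6 s on 1920 cores.
    eff = parallel_efficiency(t_ref=100.0, p_ref=24, t_p=1.6, p=1920)
    load = points_per_core(12.5e6, 1920)
    print(f"efficiency = {eff:.0%}, load = {load:.0f} points/core")
```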

Different levels of detail: capability vs. capacity computing
Capacity computing (RANS):
- Computations over a range of operating points ("polar")
- Mesh size depends on Reynolds number and geometric complexity
- Mostly attached flow
- Quantities of interest: the aerodynamic coefficients (c_L, c_D and c_m), i.e. forces and moments integrated over the surface, and the pressure distribution c_p
- Mostly steady flow
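
How integrated coefficients arise from a surface solution can be sketched for a 2D section. This is a minimal sketch, assuming the surface is split into panels that each carry a pressure coefficient c_p, an outward unit normal and a length; only the pressure contribution is integrated (skin friction, which also contributes to c_D, and the moment c_m are omitted), and the panel data in the example are hypothetical.

```python
import math
from typing import Iterable, Tuple

# (c_p, n_x, n_y, length) per surface panel; n is the outward unit normal.
Panel = Tuple[float, float, float, float]


def lift_drag_coefficients(panels: Iterable[Panel], chord: float,
                           alpha_deg: float) -> Tuple[float, float]:
    """Integrate the pressure force over a 2D section and rotate into lift/drag."""
    cx = cy = 0.0
    for cp, nx, ny, ds in panels:
        # Pressure acts against the outward normal: dC_F = -c_p * n * ds / c.
        cx -= cp * nx * ds / chord
        cy -= cp * ny * ds / chord
    a = math.radians(alpha_deg)
    c_l = cy * math.cos(a) - cx * math.sin(a)   # perpendicular to the free stream
    c_d = cx * math.cos(a) + cy * math.sin(a)   # parallel to the free stream
    return c_l, c_d


if __name__ == "__main__":
    # Hypothetical flat plate of unit chord: c_p = +1 below, c_p = 0 above.
    plate = [(1.0, 0.0, -1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
    print(lift_drag_coefficients(plate, chord=1.0, alpha_deg=0.0))
```

Repeating this for each angle of attack of a polar gives the c_L(α) and c_D(α) curves that capacity computing is after.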

Capability computing (HRLM):
- Fixed operating conditions
- Mesh size determined by Reynolds number, geometric complexity, HRLM flavour and flow characteristics (with or without separation and reattachment)
- Quantities of interest: velocity and skin-friction distributions; second-order quantities such as the Reynolds stresses
- Unsteady; statistical averaging after a transient phase
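
Statistical averaging after the transient phase can be done on the fly. This is a minimal sketch, assuming velocity samples (u, v) at one probe point arrive once per time step: it skips a user-chosen number of transient steps and then updates the means and the Reynolds shear stress ⟨u′v′⟩ (per unit density) with a single-pass, Welford-type co-moment update, so the time series need not be stored. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass means of u, v and the covariance <u'v'> (Welford-type update)."""
    n: int = 0
    mean_u: float = 0.0
    mean_v: float = 0.0
    c_uv: float = 0.0  # running sum of (u - mean_u) * (v - mean_v)

    def add(self, u: float, v: float) -> None:
        self.n += 1
        du = u - self.mean_u                 # deviation from the old mean of u
        self.mean_u += du / self.n
        self.mean_v += (v - self.mean_v) / self.n
        self.c_uv += du * (v - self.mean_v)  # uses the updated mean of v

    @property
    def reynolds_shear_stress(self) -> float:
        """Population estimate of <u'v'>."""
        return self.c_uv / self.n if self.n else 0.0


def average_after_transient(samples, n_transient: int) -> RunningStats:
    """Skip the first n_transient time steps, then accumulate statistics."""
    stats = RunningStats()
    for step, (u, v) in enumerate(samples):
        if step >= n_transient:
            stats.add(u, v)
    return stats
```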

RANS computations
- Range from 2D airfoil configurations with ~10^5 points and a time to solution of ~10 h to 3D aircraft configurations in high-lift setting with ~10^8 points and a time to solution of ~10 weeks

RANS computations: JAXA Standard Model (JSM)
Configuration with deployed high-lift devices; $Re = \rho U_\infty c/\mu$, $Ma = U_\infty/a$, N: number of grid points
- Re = 1.93 × 10^6, Ma = 0.162, 8 operating conditions (varying angle of attack)
- N = 10^8, RANS with a four-equation model
- One operating point takes ~4 days on the SCART cluster using 24 nodes (480 cores), i.e. ~200,000 points/core
- Wallclock time for all operating points: ~1 month
[Image: JAXA]
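
The cost figures on this slide follow from simple arithmetic. This is a minimal sketch, assuming the eight operating points are run one after another on the same 480 cores (the slide does not state how the polar was scheduled); the helper name is hypothetical.

```python
def polar_cost(n_points: float, cores: int, days_per_point: float,
               n_operating_points: int) -> dict:
    """Load per core, sequential wallclock time and core-hours for a polar."""
    wallclock_days = days_per_point * n_operating_points  # assumes sequential runs
    return {
        "points_per_core": n_points / cores,
        "wallclock_days": wallclock_days,
        "core_hours": wallclock_days * 24.0 * cores,
    }


if __name__ == "__main__":
    # JSM numbers from the slide: N = 1e8, 480 cores, ~4 days per point, 8 points.
    print(polar_cost(n_points=1e8, cores=480, days_per_point=4.0,
                     n_operating_points=8))
```

This gives about 208,000 points/core and 32 days, consistent with the ~200,000 points/core and ~1 month on the slide, or roughly 3.7 × 10^5 core-hours for the whole polar.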

RANS computations: JAXA Standard Model (JSM), work in progress (AIAA-2018-1258)

HRLM computations
- RANS models very often fail to predict separated or unsteady flows
- Hybrid RANS/LES methods complement RANS computations
- Problem: time-accurate computations add another layer of computational cost:
  - The smallest cell determines the time step size
  - The flow needs to pass the geometry several times (transient phase) before averaging can be started; this determines the number of time steps N_t
- Convective time unit: $1\,\mathrm{CTU} = c/U_\infty$ (in seconds; c: chord length, U_\infty: free-stream velocity)
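
The two cost drivers named here can be combined into a rough step count. This is a minimal sketch, assuming the physical time step is limited by a convective CFL condition on the smallest cell, Δt = CFL · Δx_min / U∞; that is one common choice and not necessarily the rule used in TAU, where with implicit dual time stepping Δt is usually chosen for accuracy. The function names and the example chord, velocity and cell size are hypothetical.

```python
import math


def convective_time_unit(chord: float, u_inf: float) -> float:
    """1 CTU = c / U_inf in seconds."""
    return chord / u_inf


def time_steps_needed(chord: float, u_inf: float, dx_min: float,
                      n_ctu: float, cfl: float = 1.0) -> int:
    """Time steps for n_ctu CTUs with dt = CFL * dx_min / U_inf (CFL-limited)."""
    dt = cfl * dx_min / u_inf
    return math.ceil(n_ctu * convective_time_unit(chord, u_inf) / dt)


if __name__ == "__main__":
    # Hypothetical: 0.6 m chord, 50 m/s, smallest cell 1e-4 m, 7 CTUs.
    print(convective_time_unit(0.6, 50.0))
    print(time_steps_needed(0.6, 50.0, 1e-4, n_ctu=7))
```

Under this assumption U∞ cancels and the steps per CTU are simply c/(CFL · Δx_min), so halving the smallest cell doubles the step count.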

HRLM computations: DLR-F15 three-element airfoil
Re = 2 × 10^6, Ma = 0.15, α = 7.05°, quasi-2D; N = 27 × 10^6, N_t = 5000/CTU (7 CTUs); IDDES based on a two-equation RANS model
Flow features:
- Local geometry-induced separations
- Shear layer behind the main-wing trailing edge
- Shear layer behind the slat trailing edge
- Impingement of the shear layer on the slat (noise)
- Separation on the flap driven by an adverse pressure gradient

HRLM computations: DLR-F15 three-element airfoil, computational cost
- CASE cluster (8 nodes, 192 cores)
- Wallclock time: 2.5 months
- ~140,000 points/core
[Figure: λ-isosurface coloured by x-velocity]

HRLM computations: DLR-F15 three-element airfoil
- Global IDDES is expensive
- Instead: synthetic turbulence at the RANS/LES interfaces
- New grid: N = 10 × 10^6
- Saves 62% of the grid points and of the computational time

HRLM computations: gust simulation
Re = 6 × 10^6, Ma = 0.15, α = 0°, quasi-2D; N = 35 × 10^6, N_t = 400/CTU (10 CTUs); IDDES based on a two-equation RANS model
- HRLM: zonal IDDES with synthetic turbulence
- Synthetic turbulence and local grid refinement
- NACA profile as vortex generator
- DLR-F15 two-element airfoil

HRLM computations: gust simulation, computational cost
- SCART cluster: 24 nodes, 480 cores
- Wallclock time: 18 days
- ~73,000 points/core
[Figure: comparison of experiment (Exp), 2D RANS and HRLM]

HRLM computations: NASA-CRM simulation
Re = 11.6 × 10^6, Ma = 0.25, α = 18°; N = 50 × 10^6, N_t = 500/CTU (13 CTUs); DDES based on a one-equation RANS model
- SCART cluster: 20 nodes, 400 cores
- Wallclock time: 8 days
- ~120,000 points/core
[Figure: flow velocity]
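
The three HRLM cases can be put on a common footing: the total number of time steps is N_t per CTU times the number of CTUs, and dividing core-seconds by (points × time steps) gives a rough cost per point update. This is a minimal sketch using the numbers quoted on the slides, assuming 2.5 months ≈ 75 days and that each wallclock time covers the full run including the transient phase. The metric mixes different clusters, turbulence models and inner-iteration counts, so it is only indicative; the names are hypothetical.

```python
CASES = {
    # name: (grid points, steps per CTU, CTUs, cores, wallclock days), from the slides
    "DLR-F15 (IDDES)": (27e6, 5000, 7, 192, 75.0),   # 2.5 months taken as ~75 days
    "Gust (zonal IDDES)": (35e6, 400, 10, 480, 18.0),
    "NASA-CRM (DDES)": (50e6, 500, 13, 400, 8.0),
}


def hrlm_cost(n_points, steps_per_ctu, n_ctu, cores, days):
    """Total time steps, points per core and core-microseconds per point-step."""
    total_steps = steps_per_ctu * n_ctu
    core_seconds = cores * days * 86400.0
    us_per_point_step = core_seconds / (n_points * total_steps) * 1e6
    return total_steps, n_points / cores, us_per_point_step


if __name__ == "__main__":
    for name, case in CASES.items():
        steps, load, cost = hrlm_cost(*case)
        print(f"{name}: {steps} steps, {load:,.0f} points/core, "
              f"{cost:.0f} core-us per point and time step")
```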

HRLM computations: NASA-CRM simulation
[Figure: mean velocity contours in the wake (PIV window); experiment (PIV), RANS and DDES]

HRLM computations: NASA-CRM simulation
[Figure: mean velocity profiles in the wake]

Summary I
- The strong-scaling limit for basic RANS or HRLM computations is at ~5 × 10^3 points/core
- Typical mesh sizes and the HPC resources available on a daily basis allow ~10^4 to 10^5 points/core
- It appears (to me) that configurations are becoming computationally more demanding at the same rate as (or faster than) HPC resources increase

A320 computation on CASE in 2011
- HRLM computation of an A320 configuration with N = 8 × 10^7: the complete configuration with deployed high-lift devices and engine (N = 3 × 10^7) plus a small region refined for LES to analyse noise (N = 5 × 10^7)
- Very small time step size to account for acoustics
- Wallclock time: 70 days on the CASE cluster (2048 cores), ~39,000 points/core

Summary II
- The strong-scaling limit for basic RANS or HRLM computations is at ~5 × 10^3 points/core
- Typical mesh sizes and the HPC resources available on a daily basis currently allow ~10^4 to 10^5 points/core
- It appears (to me) that configurations are becoming computationally more demanding at the same rate as (or faster than) HPC resources increase
- Application of HRLM to aerodynamically relevant configurations is realistic in the near future
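
The strong-scaling limit translates directly into a maximum useful core count for a given mesh. This is a minimal sketch, assuming the ~5,000 points/core limit from the summary applies uniformly (in practice it depends on the code, the interconnect and the solver settings); the mesh sizes are those quoted in the talk, and the function name is hypothetical.

```python
STRONG_SCALING_LIMIT = 5_000  # points/core, from the summary


def max_useful_cores(n_points: float, limit: float = STRONG_SCALING_LIMIT) -> int:
    """Largest core count before each core holds fewer points than the limit."""
    return int(n_points // limit)


if __name__ == "__main__":
    meshes = {"JSM (RANS)": 1e8, "DLR-F15 (IDDES)": 27e6, "A320 (HRLM)": 8e7}
    for name, n in meshes.items():
        print(f"{name}: up to ~{max_useful_cores(n):,} cores")
```

For the JSM mesh this is about 20,000 cores, roughly 40 times the 480 cores used for the polar shown earlier.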

HPC resources for LES
Back to the (wall-resolved) LES of a plain wing: Re_c = 10^7, N ≈ 10^11, N_t ≈ 100,000/CTU (50 CTUs)
- Complete CASE cluster: 13,440 cores, i.e. 7.5 × 10^6 points/core
- Assumed time to solution: decades (optimistic)

HPC resources for LES (continued)
Alternatively, based on 5,000 points/core, results for a channel flow with 0.76 × 10^6 points, and perfect weak scaling:
- 2 s per time step on 20,000k (2 × 10^7) cores
- Resulting in a wallclock time of 116 days
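
The 116-day estimate can be reproduced step by step. This is a minimal sketch, assuming perfect weak scaling, so that the 2 s per time step measured at 5,000 points/core holds at any core count; the function name is hypothetical.

```python
def les_time_to_solution(n_points: float, points_per_core: float,
                         steps_per_ctu: int, n_ctu: int,
                         seconds_per_step: float):
    """Cores required, total time steps and wallclock days (perfect weak scaling)."""
    cores = n_points / points_per_core
    total_steps = steps_per_ctu * n_ctu
    days = total_steps * seconds_per_step / 86400.0
    return cores, total_steps, days


if __name__ == "__main__":
    cores, steps, days = les_time_to_solution(1e11, 5_000, 100_000, 50, 2.0)
    print(f"{cores:.1e} cores, {steps:.1e} time steps, {days:.0f} days")
```

By the same arithmetic, the full CASE cluster would carry 10^11 / 13,440 ≈ 7.4 × 10^6 points/core, the value rounded to 7.5 × 10^6 on the previous slide.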

Outlook
- The DLR TAU code uses domain decomposition for parallelization
- A new CFD code (Flucs) is being developed at DLR, featuring hybrid parallelization:
  - Domain decomposition (one domain per socket) with overlap of communication and computation
  - Shared-memory parallelization within each domain (one thread per CPU core)
- A version of Flucs was selected as the basis for a joint CFD capability developed by Airbus, ONERA and DLR
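
The difference between the two parallelization strategies shows up in the number of subdomains that must exchange halo data. This is a minimal sketch of the layout arithmetic, assuming nodes with two sockets of 10 cores each (consistent with the 20 cores per SCART node implied by 480 cores on 24 nodes) and, for the baseline, one subdomain per core; it counts domains and threads only and does not model communication. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    domains: int              # number of subdomains (one process each)
    threads_per_domain: int   # shared-memory threads inside each subdomain
    points_per_domain: float


def pure_domain_decomposition(n_points, nodes, sockets=2, cores_per_socket=10):
    """Baseline: one subdomain per core, no threading."""
    domains = nodes * sockets * cores_per_socket
    return Layout(domains, 1, n_points / domains)


def hybrid_decomposition(n_points, nodes, sockets=2, cores_per_socket=10):
    """One subdomain per socket, one shared-memory thread per core."""
    domains = nodes * sockets
    return Layout(domains, cores_per_socket, n_points / domains)


if __name__ == "__main__":
    # JSM mesh on 24 nodes, as on the RANS slide.
    print(pure_domain_decomposition(1e8, nodes=24))
    print(hybrid_decomposition(1e8, nodes=24))
```

Both layouts keep all 480 cores busy, but the hybrid one has ten times fewer subdomains and therefore typically less halo data to exchange, which is also what makes overlapping communication with computation easier.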