Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project

1 Workshop HPC enabling of OpenFOAM for CFD applications. Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project. A. De Maio (1), V. Krastev (2), P. Lanucara (3), F. Salvadore (3). (1) Nu.m.i.d.i.a. S.r.l. (2) Dept. of Industrial Engineering, University of Rome Tor Vergata (3) CINECA Roma, Dipartimento SCAI

2 Summary: Hi-ZEV project outline; preliminary evaluation of the OpenFOAM code; prototype car simulations: aerodynamic results and scalability/performance tests; conclusions.

3 Hi-ZEV: a collaborative industrial research project Granted by the Italian Ministry of Economic Development's program «Industria 2015 Nuove Tecnologie per il Made in Italy». The project aim is the development of an Innovative High Performance Car with Low Environmental Impact, based on an Electrical/Hybrid Powertrain. The project started on 01/01/2011 and will last until 31/12/2013.

4 Hi-ZEV: the partners Technos Reat Fondazione Italiana Nuove Comunicazioni Icomet Microsistemi srl Elettromedia Advanced Devices spa Dyesol Italia srl Leaff Engineering srl ISAM spa Concept Inn srl HPH Consulting

5 Hi-ZEV: the partners Team Leader and Project Coordinator Technos Reat Fondazione Italiana Nuove Comunicazioni Icomet Microsistemi srl Elettromedia Advanced Devices spa Dyesol Italia srl Leaff Engineering srl ISAM spa Concept Inn srl HPH Consulting

7 Hi-ZEV: technical key points Very light vehicle (low weight/power ratio) High-performance hybrid powertrain for wide-range torque availability Very advanced chassis and suspensions for excellent road-holding Accurate fluid-dynamic design

8 Hi-ZEV: technical key points Very light vehicle (low weight/power ratio) High-performance hybrid powertrain for wide-range torque availability Very advanced chassis and suspensions for excellent road-holding Accurate fluid-dynamic design CFD

9 The role of CFD inside the project In the early as well as in the more advanced design stages, CFD can be effectively used to optimize: 1. the external aerodynamics of the vehicle; 2. the underhood aerodynamics/thermal management; 3. the HVAC systems. OpenFOAM + HPC CFD: the combination of an open-source, fully parallelized code (OpenFOAM) with the HPC infrastructure of CASPUR/CINECA represents a powerful and efficient answer to these needs.

10 Preliminary simulations on the Matrix cluster Preliminary evaluation of OpenFOAM on the Matrix infrastructure, using a standard external aerodynamics test case (Ahmed body). Hardware: 8 cores per node (2 × quad-core AMD Opteron 2.1 GHz), 320 nodes with 16 GB RAM each, InfiniBand DDR connection between nodes, 20 Tflops peak performance, 177 Mflops/W sustained performance. Software: OpenFOAM, OpenMPI, Scotch for domain decomposition. Steady-state solver (simpleFoam) on unstructured grids (up to 6×10^6 cells), high-Re RANS turbulence modeling (RNG/realizable k-ε + wall functions), up to 256 cores (32 nodes) involved.
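
The domain decomposition mentioned above (Scotch partitioning, up to 256 cores) is driven by a single OpenFOAM dictionary. A minimal sketch of system/decomposeParDict is shown below; the slides only state the decomposition method and the maximum core count, so everything else here (header boilerplate, the exact subdomain number of a given run) is an assumption for illustration.

    // system/decomposeParDict -- minimal sketch (assumed contents, not taken from the slides)
    FoamFile
    {
        version     2.0;
        format      ascii;
        class       dictionary;
        object      decomposeParDict;
    }

    numberOfSubdomains  256;    // one subdomain per MPI process (largest run: 256 cores / 32 nodes)

    method              scotch; // graph-based partitioning, no manual directional splitting required

After running decomposePar, the solver is launched in parallel in the usual way, e.g. mpirun -np 256 simpleFoam -parallel.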

11 Preliminary simulations on the Matrix cluster: computational domain

12 Ahmed body results: wake flow structures, ϕ = 25°. Symmetry-plane and 3D views (Q-criterion, Q = 10^4 s^-2), RKE and RNG models.

15 Ahmed body results: wake flow structures, ϕ = 35°. Symmetry-plane and 3D views (Q-criterion, Q = 10^4 s^-2), RKE and RNG models.

17 Ahmed body results: velocity profiles in the symmetry plane, ϕ = 25° and ϕ = 35°.

19 Ahmed body results: integrated rear pressure drag Overall comparison: rear pressure drag coefficients (total, slant, and base contributions, with % difference from the experiment) for RKE, RNG and the reference data of Lienhart et al., at ϕ = 25° and ϕ = 35°. Comments: results are aligned with previous CFD studies on the 25°/35° configurations; the realizable k-ε captures fairly well the relative drag reduction (~8%) when passing from the 25° to the 35° configuration.

20 Ahmed body results: some considerations about scalability Case description: finest grid (~6×10^6 cells); PCG linear solver on the pressure equation; increasing cores (nodes) progression. Plot: speedup specific efficiency (%) vs. node increase, where $sse = \frac{\text{relative increase in speedup}}{\text{relative increase in nodes}}$.

21 Ahmed body results: some considerations about scalability Case description: finest grid (~6×10^6 cells); PCG linear solver on the pressure equation; increasing cores (nodes) progression. Plot: speedup specific efficiency (%) vs. node increase. Comment: almost linear inter-node scaling (at least in the considered interval).

22 Prototype car simulations Aims: 1. Aerodynamic optimization of the Hi-ZEV prototype external design; 2. More systematic scalability tests on the CASPUR/CINECA HPC infrastructures. Two hybrid (prisms + tetras) grids considered: 1. ~7.5×10^6 cells (symmetric); 2. ~15×10^6 cells (complete geometry). Three architectures selected for the performance tests. Matrix (AMD Opteron): 8 cores per node (2 × quad-core AMD Opteron 2.1 GHz), 320 nodes with 16 GB RAM each, InfiniBand DDR connection between nodes, 20 Tflops peak performance, 177 Mflops/W sustained performance. OpenFOAM, Scotch.

23 Prototype car simulations Aims: 1. Aerodynamic optimization of the Hi-ZEV prototype external design; 2. More systematic scalability tests on the CASPUR/CINECA HPC infrastructures. Two hybrid (prisms + tetras) grids considered: 1. ~7.5×10^6 cells (symmetric); 2. ~15×10^6 cells (complete geometry). Three architectures selected for the performance tests. Jazz (Intel Xeon): 12 cores per node (2 × six-core Intel Xeon), 16 nodes with 48 GB RAM each, InfiniBand QDR connection between nodes, 14.3 Tflops peak performance, 785 Mflops/W sustained performance. OpenFOAM, Scotch. Each node is also equipped with 2 NVIDIA Tesla GPU computing units, not involved in the OpenFOAM simulations.

24 Prototype car simulations Aims: 1. Aerodynamic optimization of the Hi-ZEV prototype external design; 2. More systematic scalability tests on the CASPUR/CINECA HPC infrastructures. Two hybrid (prisms + tetras) grids considered: 1. ~7.5×10^6 cells (symmetric); 2. ~15×10^6 cells (complete geometry). Three architectures selected for the performance tests. Fermi (IBM BlueGene/Q): 16 cores per node (IBM PowerA2, 1.6 GHz), 10,240 nodes (163,840 cores) with 16 GB RAM each (1 GB per core), network interface with 11 links (5D torus), 2 Pflops peak performance. OpenFOAM, Scotch.

25 Prototype car simulations: computational domain Half-car model with symmetry plane; boundaries: inlet, outlet, top, side, moving floor.

26 Prototype car simulations: aerodynamic results (OF vs. Fluent) OpenFOAM settings: symmetrical prism/tetra grid (exactly the same for both codes); simpleFoam pressure-based solver; realizable k-ε for turbulence + standard wall functions; TVD scheme for momentum convection, upwind for k/ε. Fluent settings: symmetrical prism/tetra grid (exactly the same for both codes); pressure-based solver; realizable k-ε for turbulence + non-equilibrium wall functions; second-order upwind scheme for all convective terms.
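
To make the OpenFOAM numerical settings listed above concrete, the excerpt below sketches how they would typically appear in the case dictionaries. The realizableKE model name and the Gauss upwind entries are standard OpenFOAM keywords, but the specific TVD limiter (limitedLinear) is an assumption: the slides only state "TVD for momentum convection, upwind for k/ε" and "realizable k-ε + standard wall functions".

    // constant/RASProperties (excerpt) -- high-Re realizable k-epsilon model
    RASModel        realizableKE;
    turbulence      on;
    printCoeffs     on;

    // system/fvSchemes (excerpt) -- convection schemes only; remaining entries omitted
    divSchemes
    {
        default              none;
        div(phi,U)           Gauss limitedLinear 1;  // a TVD-limited scheme for momentum (assumed limiter)
        div(phi,k)           Gauss upwind;           // first-order upwind for turbulence quantities
        div(phi,epsilon)     Gauss upwind;
    }

The standard wall functions are then applied through the boundary conditions of the turbulence fields (e.g. kqRWallFunction, epsilonWallFunction and nutkWallFunction on the car and floor patches).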

27 Prototype car simulations: aerodynamic results (OF vs. Fluent) Aerodynamic coefficients: OpenFOAM C_d = 0.32, C_L = 0.14; Fluent C_d = 0.31, C_L = 0.17.

28 Prototype car simulations: aerodynamic results (OF vs. Fluent) Pressure distribution around the car, y = 0 (symmetry plane); $C_p = \frac{p - p_\infty}{\tfrac{1}{2}\rho U_\infty^2}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

29 Prototype car simulations: aerodynamic results (OF vs. Fluent) Pressure distribution around the car at an off-symmetry y-plane; $C_p = \frac{p - p_\infty}{\tfrac{1}{2}\rho U_\infty^2}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

30 Prototype car simulations: aerodynamic results (OF vs. Fluent) Pressure distribution around the car at an off-symmetry y-plane; $C_p = \frac{p - p_\infty}{\tfrac{1}{2}\rho U_\infty^2}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

31 Prototype car simulations: aerodynamic results (OF vs. Fluent) Total pressure distribution around the car, y = 0 (symmetry plane); $C_{pt} = \frac{p_t - p_\infty}{p_{t,\infty} - p_\infty}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

32 Prototype car simulations: aerodynamic results (OF vs. Fluent) Total pressure distribution around the car at an off-symmetry y-plane; $C_{pt} = \frac{p_t - p_\infty}{p_{t,\infty} - p_\infty}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

33 Prototype car simulations: aerodynamic results (OF vs. Fluent) Total pressure distribution around the car at an off-symmetry y-plane; $C_{pt} = \frac{p_t - p_\infty}{p_{t,\infty} - p_\infty}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

34 Prototype car simulations: aerodynamic results (OF vs. Fluent) Total pressure distribution around the car, z = 0.11; $C_{pt} = \frac{p_t - p_\infty}{p_{t,\infty} - p_\infty}$. Fluent, 6000 iterations; OpenFOAM, 4500 iterations.

35 Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; the computing node is selected as the fundamental unit. Plot: speedup vs. number of nodes, Matrix vs. Jazz, PCG, where $\text{speedup} = \frac{(\text{time per step})_{1\,\text{node}}}{(\text{time per step})_{N\,\text{nodes}}}$.

36 Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; the computing node is selected as the fundamental unit. Plot: speedup vs. number of nodes, Matrix vs. Jazz, GAMG, where $\text{speedup} = \frac{(\text{time per step})_{1\,\text{node}}}{(\text{time per step})_{N\,\text{nodes}}}$.

37 Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; the computing node is selected as the fundamental unit. Plot: speedup vs. number of nodes, Matrix, GAMG vs. PCG, where $\text{speedup} = \frac{(\text{time per step})_{1\,\text{node}}}{(\text{time per step})_{N\,\text{nodes}}}$.

38 Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; the computing node is selected as the fundamental unit. Plot: speedup vs. number of nodes, Jazz, GAMG vs. PCG, where $\text{speedup} = \frac{(\text{time per step})_{1\,\text{node}}}{(\text{time per step})_{N\,\text{nodes}}}$.

39 Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; the computing node is selected as the fundamental unit. Comments: the PCG solver clearly outperforms GAMG when the parallelization becomes extensive (approximately above 100 processes for the half-car case); Jazz appears to scale better than Matrix, probably because of the more capable InfiniBand network (QDR vs. DDR) and of better cache usage as the per-process partitions become smaller.
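
The PCG/GAMG comparison discussed above reduces to the pressure-solver entry in system/fvSolution. The sketch below shows the two variants side by side; tolerances, the preconditioner/smoother choices and the agglomeration settings are illustrative assumptions, not the values used in this study.

    // system/fvSolution (excerpt) -- variant 1: preconditioned conjugate gradient
    p
    {
        solver          PCG;
        preconditioner  DIC;     // diagonal incomplete-Cholesky preconditioner
        tolerance       1e-7;
        relTol          0.01;
    }

    // variant 2: geometric-algebraic multigrid (typically faster at modest core counts)
    p
    {
        solver                  GAMG;
        smoother                GaussSeidel;
        nCellsInCoarsestLevel   100;
        agglomerator            faceAreaPair;
        mergeLevels             1;
        cacheAgglomeration      true;
        tolerance               1e-7;
        relTol                  0.01;
    }

Switching between the two requires no change to the solver executable or to the decomposition, which is what makes this kind of back-to-back scalability comparison straightforward.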

40 Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; time per step evaluated on a per-core basis. Plot: time per step (s) vs. number of cores, Matrix, GAMG vs. PCG.

41 Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; time per step evaluated on a per-core basis. Plot: time per step (s) vs. number of cores, Jazz, GAMG vs. PCG.

42 Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; time per step evaluated on a per-core basis. Plot: time per step (s) vs. number of cores within a single node, Matrix, GAMG vs. PCG.

43 Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; time per step evaluated on a per-core basis. Plot: time per step (s) vs. number of cores within a single node, Jazz, GAMG vs. PCG.

44 Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; time per step evaluated on a per-core basis. Comments: despite the rather inefficient intra-node scaling, the newer Intel architecture is (as expected) much faster than the AMD one; if the number of processes is kept within the acceptable scaling range, the GAMG solver is always faster than PCG (e.g. ~40% faster on 64 Matrix cores).

45 Prototype car simulations: scalability tests (Fermi, symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; 16 and 32 MPI processes per node (ppn) considered. Plot: speedup efficiency (%) vs. number of nodes, 16 ppn, PCG vs. GAMG, where $se\,(\%) = 100 \cdot \frac{(\text{time per step})_{1\,\text{node}}}{N \cdot (\text{time per step})_{N\,\text{nodes}}}$.

46 Prototype car simulations: scalability tests (Fermi, symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; 16 and 32 MPI processes per node (ppn) considered. Plot: speedup efficiency (%) vs. number of nodes, PCG, 16 ppn vs. 32 ppn, where $se\,(\%) = 100 \cdot \frac{(\text{time per step})_{1\,\text{node}}}{N \cdot (\text{time per step})_{N\,\text{nodes}}}$.

47 Prototype car simulations: scalability tests (Fermi, symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; 16 and 32 MPI processes per node (ppn) considered. Plot: speedup efficiency (%) vs. number of nodes, PCG, 16 ppn vs. 32 ppn. What about absolute performance?

48 Prototype car simulations: scalability tests (Fermi, symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; 16 and 32 MPI processes per node considered. Plot: time per step (s) vs. number of nodes, PCG, 16 ppn vs. 32 ppn. Apparently, using more ppn could be beneficial in terms of absolute performance, but when the number of nodes reaches a practical value (64) the benefit vanishes, and in addition...

49 Prototype car simulations: I/O performance tests (Fermi, symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG linear solver on the pressure equation; output generation time and initialization time monitored; 16 and 32 MPI processes per node considered. Plot: output generation time (s) vs. number of nodes, PCG, 16 ppn vs. 32 ppn.

50 Prototype car simulations: I/O performance tests (Fermi, symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG linear solver on the pressure equation; output generation time and initialization time monitored; 16 and 32 MPI processes per node considered. Plot: initialization time (s) vs. number of nodes, PCG, 16 ppn vs. 32 ppn.

51 Prototype car simulations: comments about Fermi runs (symmetrical grid) Case description: symmetrical grid (~7.5×10^6 cells); PCG and GAMG linear solvers on the pressure equation; 50-iteration monitoring, starting from a fairly converged solution; 16 and 32 MPI processes per node considered. Comments: the case is of course too small to prove Fermi's real potential, but up to the minimum practical node number (64) the SIMPLE iteration scaling is acceptable (PCG); when the I/O capability of the nodes gets saturated, a dramatic drop in I/O efficiency occurs (and things get even worse with 32 ppn).
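
A common way to limit the I/O penalty highlighted above is to reduce how often and how verbosely the solver writes results. The system/controlDict excerpt below is a hedged sketch of such settings (the interval and purge values are illustrative assumptions); it mitigates the volume and frequency of output, but it does not remove the file-per-process pattern of OpenFOAM's default decomposed I/O, which is typically what saturates the I/O subsystem at large process counts.

    // system/controlDict (excerpt) -- write settings that reduce I/O volume and frequency
    writeControl        timeStep;
    writeInterval       500;     // write every 500 SIMPLE iterations (illustrative value)
    purgeWrite          2;       // keep only the two most recent time directories on disk
    writeFormat         binary;  // smaller and faster to write than ascii
    writeCompression    off;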

52 Further simulations on Fermi: doubled grid Case description: doubled grid (~15×10^6 cells); PCG solver on the pressure equation; only 16 ppn considered; comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid). Plot: time per step (s) vs. number of nodes (symm/double), Fermi, PCG, symmetric vs. doubled grid.

53 Further simulations on Fermi: doubled grid Case description: doubled grid (~15×10^6 cells); PCG solver on the pressure equation; only 16 ppn considered; comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid). Plot: output generation time (s) vs. number of nodes (symm/double), Fermi, PCG, symmetric vs. doubled grid.

54 Further simulations on Fermi: doubled grid Case description: doubled grid (~15×10^6 cells); PCG solver on the pressure equation; only 16 ppn considered; comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid). Plot: initialization time (s) vs. number of nodes (symm/double), Fermi, PCG, symmetric vs. doubled grid.

55 Further simulations on Fermi: doubled grid Case description: doubled grid (~15×10^6 cells); PCG and GAMG linear solvers on the pressure equation; only 16 ppn considered; comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid). Comments: the SIMPLE iteration weak-scaling performance appears fairly good and thus should encourage further tests on bigger cases, but the I/O issues are confirmed.

56 Conclusions (1) Hi-ZEV is a successful example of how industry can take advantage of the combination of parallelized open-source CFD toolkits and highly qualified HPC infrastructures, in a collaborative project framework. The OpenFOAM code has been evaluated on conventional AMD and Intel HPC facilities for external aerodynamics applications, showing: good accuracy compared to well-established commercial CFD codes; interesting parallel performance (still not fully exploited), at least for small/medium-size cases (~10^7 cells) and depending on the optimal pressure solver choice (PCG scales better, GAMG is faster at small process counts).

57 Conclusions (2) OpenFOAM performance has also been assessed on the BG/Q supercomputer Fermi and, in spite of the (relatively) small size of the considered cases, the following remarks can be drawn: the solver iteration scaling performance is promising (with PCG), especially in the perspective of coping with much bigger problems; though for the considered cases a more conventional architecture (e.g. Intel Xeon) seems to be a better choice, a deeper investigation should be made in order to also include performance vs. energy consumption aspects; unfortunately, for massively parallel applications (thousands of processes) a dramatic I/O efficiency issue arises (further evaluation needed).

58 Acknowledgments A. De Maio (1), V. Krastev (2), P. Lanucara (3), F. Salvadore (3), M. Testa (1) (for providing the half-car grid and the Fluent results). (1) Nu.m.i.d.i.a. S.r.l. (2) Dept. of Industrial Engineering, University of Rome Tor Vergata (3) CINECA Roma, Dipartimento SCAI

59 Workshop HPC enabling of OpenFOAM for CFD applications
