Ryan C. Hulguin TACC-Intel Highly Parallel Computing Symposium April 10th-11th, 2012 Austin, TX




2 Outline
Introduction
Knights Ferry Technical Specifications
CFD Governing Equations
Numerical Algorithm
Solver Pseudocode
Test Problems
Intel MIC Performance of Test Problems
Intel MIC Performance after Optimization
General Observations


3 Introduction
CPU clock speeds have recently hit a plateau, and further performance improvements now come mainly from parallel computing.
Intel has introduced the Intel Many Integrated Core (Intel MIC) architecture, which combines many Intel CPU cores on a single chip.
Two CFD solvers using OpenMP are developed for the Intel MIC software development platform known as Knights Ferry (KNF).
The solvers currently run in native mode only, meaning the entire application is launched on the Knights Ferry card itself.
A strong scaling study is done to determine how effectively these OpenMP applications can use the many cores.

4 Knights Ferry
Knights Ferry (KNF) is the software development platform (SDP) for the Intel Many Integrated Core (Intel MIC) architecture.
Core count: 32 cores
Hardware threads: 4 per core
IO bus: PCIe Gen2
Memory type: GDDR5
Memory size: 2 gigabytes

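5 Euler Equations
$$\frac{\partial Q}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = 0$$
$$Q = \begin{bmatrix}\rho\\ \rho u\\ \rho v\\ \rho w\\ E\end{bmatrix},\quad F = \begin{bmatrix}\rho u\\ \rho u^2 + p\\ \rho u v\\ \rho u w\\ u(E + p)\end{bmatrix},\quad G = \begin{bmatrix}\rho v\\ \rho u v\\ \rho v^2 + p\\ \rho v w\\ v(E + p)\end{bmatrix},\quad H = \begin{bmatrix}\rho w\\ \rho u w\\ \rho v w\\ \rho w^2 + p\\ w(E + p)\end{bmatrix}$$
where ρ is the density, u, v, and w are the velocities, p is the pressure, and E is the total energy per unit volume.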
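To make the conservative form of the Euler equations concrete, the following minimal sketch evaluates the x-direction flux vector for a single grid point from its five conservative variables. It assumes a calorically perfect gas with γ = 1.4 to relate pressure to total energy; the slides do not state the equation of state. The type and function names are hypothetical.

```c
#include <assert.h>

#define GAMMA 1.4  /* assumed ratio of specific heats (perfect diatomic gas) */

/* Conservative state at one grid point: rho, rho*u, rho*v, rho*w, E. */
typedef struct { double q[5]; } euler_state;

/* Pressure from the assumed closure p = (gamma - 1) * (E - rho*|V|^2 / 2). */
static double pressure(const euler_state *s) {
    double rho = s->q[0];
    double ke = 0.5 * (s->q[1] * s->q[1] + s->q[2] * s->q[2] + s->q[3] * s->q[3]) / rho;
    return (GAMMA - 1.0) * (s->q[4] - ke);
}

/* x-direction flux vector F(Q) of the 3D Euler equations. */
void flux_x(const euler_state *s, double f[5]) {
    double rho = s->q[0];
    assert(rho > 0.0);
    double u = s->q[1] / rho, v = s->q[2] / rho, w = s->q[3] / rho;
    double p = pressure(s);
    f[0] = rho * u;             /* mass flux */
    f[1] = rho * u * u + p;     /* x-momentum flux */
    f[2] = rho * u * v;         /* y-momentum flux */
    f[3] = rho * u * w;         /* z-momentum flux */
    f[4] = u * (s->q[4] + p);   /* energy flux */
}

/* Build a conservative state from primitive variables (rho, u, v, w, p). */
euler_state from_primitive(double rho, double u, double v, double w, double p) {
    euler_state s = {{rho, rho * u, rho * v, rho * w,
                      p / (GAMMA - 1.0) + 0.5 * rho * (u * u + v * v + w * w)}};
    return s;
}
```

For a gas at rest with ρ = 1 and p = 1, only the x-momentum flux (equal to p) is nonzero. Each grid point carries exactly five such equations, which is the count listed for the Euler solver in the solver comparison.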

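6 BGK Model Boltzmann Equation
$$\frac{\partial f_i}{\partial t} + \boldsymbol{\xi}_i \cdot \nabla_{\mathbf{x}} f_i = \nu\left(f_i^{M} - f_i\right),\qquad \nu \propto \frac{n\,T^{1-\omega}}{\mathrm{Kn}}$$
where f_i is the probability distribution function at discrete velocity ξ_i, ξ_i are the discrete molecular velocities, u are the bulk (average) velocities, n is the number density, T is the temperature, ω is the gas viscosity index, ν is the collision frequency, and f_i^M is the local Maxwellian equilibrium determined by n, u, and T. Kn is the Knudsen number, which characterizes the flow: it is the ratio between the mean free path λ and the characteristic length L, $\mathrm{Kn} = \lambda / L$.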

7 Comparison of the CFD Solvers
Two separate CFD solvers are developed to showcase the capability of the Intel MIC coprocessors.
The first is based on the Euler equations; the second is based on the Boltzmann equation.
Both use a Newton-based iterative algorithm to converge the solutions.
Data parallelism on the Intel MIC coprocessor cores is achieved through OpenMP threads.

                                              Euler Solver           Boltzmann Solver
Number of equations per physical grid point   5                      Hundreds of thousands
Target applications                           Inviscid fluid flow    Rarefied gas flow

8 Numerical Algorithm
Given a nonlinear set of equations
$$F(Q) = 0,$$
application of Newton's method results in
$$F(Q^{m+1}) \approx F(Q^m) + [J]^m\left(Q^{m+1} - Q^m\right) = 0,$$
where m is the Newton iteration and [J] is the Jacobian, defined as
$$[J] = \frac{\partial F}{\partial Q}.$$
Rearranging terms leads to
$$[J]^m\,\Delta Q^m = -F(Q^m),\qquad \Delta Q^m = Q^{m+1} - Q^m.$$
Solving this linearized system of equations gives the update $Q^{m+1} = Q^m + \Delta Q^m$.
More details can be found in the dissertation of Glenn Brook: R. Glenn Brook, A Parallel, Matrix-Free Newton Method for Solving Approximate Boltzmann Equations on Unstructured Topologies, PhD Dissertation, University of Tennessee at Chattanooga, December.
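The Newton update reduces to a few lines when the system has a single equation. The sketch below assumes a hypothetical toy residual F(Q) = Q² − 2, chosen only because its root √2 is easy to check. In the actual solvers, [J] is a large sparse matrix and the linear solve is iterative. Here that solve is a division.

```c
#include <assert.h>

/* Hypothetical toy residual F(Q) = Q^2 - 2; its root is sqrt(2). */
static double residual(double q) { return q * q - 2.0; }

/* Its Jacobian J = dF/dQ (a 1x1 matrix in this toy case). */
static double jacobian(double q) { return 2.0 * q; }

/* Newton's method: solve J(Q^m) dQ = -F(Q^m), then Q^{m+1} = Q^m + dQ. */
double newton_solve(double q, int max_iters, double tol) {
    assert(max_iters > 0 && tol > 0.0);
    for (int m = 0; m < max_iters; m++) {
        double dq = -residual(q) / jacobian(q);  /* the "linear solve" */
        q += dq;                                  /* apply the update */
        if (dq < tol && dq > -tol) break;         /* |dQ| small: converged */
    }
    return q;
}
```

Starting from Q = 1, the iterates converge quadratically toward √2 ≈ 1.41421356.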

9 Numerical Algorithm Continued
The Jacobi iterative update equation can be cast into a delta formulation as shown below:
$$[D]^m\,\Delta\Delta Q^{k} = -F(Q^m) - [J]^m\,\Delta Q^{k},\qquad \Delta Q^{k+1} = \Delta Q^{k} + \Delta\Delta Q^{k},$$
where k is the Jacobi iteration and [D] is the diagonal of [J].
The Jacobian does not need to be explicitly calculated and stored. It can be calculated implicitly through the use of dual numbers and Taylor series expansions in the dual space:
$$F(Q + h\,\epsilon\,e_i) = F(Q) + h\,\epsilon\,[J]\,e_i,\qquad \epsilon^2 = 0,$$
where $h \in \mathbb{R}$ is an arbitrary perturbation and $e_i$ is the i-th vector in the standard basis for $\mathbb{R}^n$. The dual part of $F(Q + h\epsilon e_i)$, divided by h, is therefore the i-th column of [J].
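The dual-number trick can be shown on a small system. The sketch below assumes a hypothetical two-equation residual, F0 = q0² − q1 and F1 = q0·q1 − 3. It extracts a Jacobian column by evaluating F at Q + hεe_i in dual arithmetic. The `dual` type and helper names are illustrative, not taken from the solvers.

```c
#include <assert.h>

/* Dual number a + b*eps with eps^2 = 0. */
typedef struct { double re, du; } dual;

static dual d_sub(dual a, dual b) { return (dual){a.re - b.re, a.du - b.du}; }
/* (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0. */
static dual d_mul(dual a, dual b) {
    return (dual){a.re * b.re, a.re * b.du + a.du * b.re};
}
static dual d_const(double c) { return (dual){c, 0.0}; }

/* Hypothetical 2-equation residual: F0 = q0^2 - q1, F1 = q0*q1 - 3. */
static void residual(const dual q[2], dual f[2]) {
    f[0] = d_sub(d_mul(q[0], q[0]), q[1]);
    f[1] = d_sub(d_mul(q[0], q[1]), d_const(3.0));
}

/* Column i of the Jacobian: evaluate F(Q + h*eps*e_i) and divide the dual
   part by h. There is no truncation error, so any nonzero h works. */
void jacobian_column(const double q[2], int i, double h, double col[2]) {
    assert(i >= 0 && i < 2 && h != 0.0);
    dual qd[2], f[2];
    for (int k = 0; k < 2; k++) qd[k] = (dual){q[k], k == i ? h : 0.0};
    residual(qd, f);
    for (int k = 0; k < 2; k++) col[k] = f[k].du / h;
}
```

At Q = (2, 5), the exact Jacobian is [[4, −1], [5, 2]], and the two calls recover its columns. Seeding the dual parts with a general vector v instead of h·e_i yields the product [J]v in one residual evaluation. A matrix-free solver needs only this product, never the stored matrix.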

10 Solver Pseudocode
Parallel for loop over grid points
    Set initial values
End parallel for loop over grid points
Loop over timesteps
    Loop over number of Newton iterations
        Apply boundary conditions
        Parallel for loop over grid points
            Reset values of dq to zero
        End parallel for loop over grid points
        Loop over number of Jacobi iterations
            Parallel for loop over grid points
                Solve for ddq via Jacobi iteration
            End parallel for loop over grid points
            Parallel for loop over grid points
                Update dq using ddq
            End parallel for loop over grid points
        End loop over number of Jacobi iterations
        Parallel for loop over grid points
            Apply dq
        End parallel for loop over grid points
    End loop over number of Newton iterations
End loop over timesteps
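The solver pseudocode maps naturally onto C with OpenMP `parallel for` loops. The sketch below is a minimal, hypothetical instance with the same loop structure. It solves a toy 1D problem, u_t = ν u_xx − u³ on (0, 1) with u = 0 at both walls, using one backward-Euler step per timestep. It is not the Euler or BGK system. For brevity, the Jacobian-vector product is written analytically rather than computed with dual numbers.

The Jacobi step is split into two parallel loops for a reason. Every thread reads its neighbors' dq while computing ddq, so dq can only be updated after all ddq values exist.

```c
#include <assert.h>

/* Toy problem (hypothetical): u_t = NU*u_xx - u^3, u = 0 at both walls. */
#define N 64                         /* interior grid points */
#define NU 1.0                       /* diffusion coefficient */
#define DX (1.0 / (N + 1))
#define DT (0.5 * DX * DX / NU)      /* NU*DT/DX^2 = 0.5 */

/* Indices 0 and N+1 are boundary (ghost) points. */
static double q[N + 2], q_old[N + 2], dq[N + 2], ddq[N + 2];

/* Nonlinear residual F_i(q) of one backward-Euler step. */
static double residual(int i) {
    double lap = (q[i + 1] - 2.0 * q[i] + q[i - 1]) / (DX * DX);
    return (q[i] - q_old[i]) / DT - NU * lap + q[i] * q[i] * q[i];
}

/* Diagonal Jacobian entry dF_i/dq_i. */
static double jac_diag(int i) {
    return 1.0 / DT + 2.0 * NU / (DX * DX) + 3.0 * q[i] * q[i];
}

/* Row i of the (tridiagonal) Jacobian times dq. */
static double jac_times_dq(int i) {
    return jac_diag(i) * dq[i] - NU / (DX * DX) * (dq[i - 1] + dq[i + 1]);
}

static void apply_boundary_conditions(void) {
    q[0] = q[N + 1] = 0.0;   /* Dirichlet walls */
    dq[0] = dq[N + 1] = 0.0; /* wall values are never updated */
}

void solve(int n_steps, int n_newton, int n_jacobi) {
    assert(n_steps > 0 && n_newton > 0 && n_jacobi > 0);
    #pragma omp parallel for
    for (int i = 1; i <= N; i++) {               /* set initial values */
        double x = i * DX;
        q[i] = 4.0 * x * (1.0 - x);
    }
    for (int t = 0; t < n_steps; t++) {
        #pragma omp parallel for
        for (int i = 1; i <= N; i++) q_old[i] = q[i];  /* previous time level */
        for (int m = 0; m < n_newton; m++) {
            apply_boundary_conditions();
            #pragma omp parallel for
            for (int i = 1; i <= N; i++) dq[i] = 0.0;
            for (int k = 0; k < n_jacobi; k++) {
                /* Delta-form Jacobi: D*ddq = -F(q) - J*dq (reads dq only). */
                #pragma omp parallel for
                for (int i = 1; i <= N; i++)
                    ddq[i] = (-residual(i) - jac_times_dq(i)) / jac_diag(i);
                /* Separate loop: dq is written only after all ddq are known. */
                #pragma omp parallel for
                for (int i = 1; i <= N; i++) dq[i] += ddq[i];
            }
            #pragma omp parallel for
            for (int i = 1; i <= N; i++) q[i] += dq[i];   /* apply dq */
        }
    }
}

/* Largest |F_i(q)| for the most recent timestep. */
double max_residual(void) {
    double worst = 0.0;
    for (int i = 1; i <= N; i++) {
        double r = residual(i);
        if (r < 0.0) r = -r;
        if (r > worst) worst = r;
    }
    return worst;
}
```

Compiled without OpenMP, the pragmas are ignored and the code runs serially with identical results. For example, `solve(10, 3, 60)` drives the residual down to round-off level. A single Jacobi sweep per Newton iteration leaves it far from converged.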


11 Test Problems Run on the Intel MIC
Sod Shock: unsteady flow problem using the Euler equations
Initial conditions:
Left state: ρ = 1.0, u = 0.0, P = 1.0
Right state: ρ = 0.125, u = 0.0, P = 0.1
The first test problem uses the Euler solver to simulate a Sod shock. Across a shock wave, the properties of a fluid change almost instantaneously. The standard Sod shock starts with a fluid at rest, with the initial conditions listed above. The Sod shock is a popular test case for verifying a solver's ability to correctly capture shocks and contact discontinuities in unsteady fluid flows.
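Setting up the Sod initial conditions amounts to filling each cell with one of the two states. The sketch below reduces the problem to one dimension for brevity and stores (ρ, ρu, E) per cell. It assumes a unit-length domain, a diaphragm at x = 0.5, and γ = 1.4; the slides state none of these. The function name is hypothetical.

```c
#include <assert.h>

#define GAMMA 1.4   /* assumed ratio of specific heats */

/* Fill n cells on [0,1] with the Sod states, storing the 1D conservative
   variables (rho, rho*u, E). Diaphragm position x0 = 0.5 is assumed. */
void sod_initial_conditions(int n, double rho[], double mom[], double energy[]) {
    assert(n > 0);
    const double x0 = 0.5;
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) / n;            /* cell center */
        int left = x < x0;
        double r = left ? 1.0 : 0.125;       /* density */
        double u = 0.0;                      /* fluid starts at rest */
        double p = left ? 1.0 : 0.1;         /* pressure */
        rho[i] = r;
        mom[i] = r * u;
        energy[i] = p / (GAMMA - 1.0) + 0.5 * r * u * u;
    }
}
```

Because the fluid starts at rest, the total energy is purely internal: 2.5 on the left and 0.25 on the right under the assumed γ.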

12 Test Problems Run on the Intel MIC
Couette Flow: steady-state flow problem using the BGK model Boltzmann equation
The second test problem uses the Boltzmann solver to simulate a Couette flow. In Couette flow, gas is initially at rest between two infinitely long parallel plates. For this problem, the left plate is stationary while the right plate moves. Over time, the gas settles into a steady solution that no longer changes. Couette flow is a good test problem for verifying a solver's ability to handle solid surfaces and moving boundary conditions.
Left wall: u_wall = 0 m/s, T_wall = 273.0 K
Right wall: u_wall = 300 m/s, T_wall = 273.0 K
Initial gas state: ρ0 = 9.28 × 10^-8 kg/m^3, u_x0 = 0.0 m/s, u_y0 = 0.0 m/s, T0 = 273.0 K
Knudsen number: Kn = 1.199
This particular test problem used 27 grid cells in physical space and 36x36x36 grid points in velocity space.

13 Euler Solver Solution

14 BGK Model Boltzmann Solver Solution

15 Intel MIC Performance
Figure: Strong Scaling Study of Euler Solver on KNF (speedup versus number of OpenMP threads; curves for single precision, double precision, and double precision with 2x problem size)
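In a strong scaling study, the problem size stays fixed while the thread count grows. Speedup is conventionally the ratio of the baseline run time to the run time with p threads. The slides do not state the baseline, so the one-thread baseline below is an assumption. Parallel efficiency is added as a common companion metric.

```c
#include <assert.h>

/* Strong-scaling speedup S(p) = T(1) / T(p) for a fixed problem size.
   Using the one-thread time as the baseline is an assumption. */
double speedup(double t_one_thread, double t_p_threads) {
    assert(t_one_thread > 0.0 && t_p_threads > 0.0);
    return t_one_thread / t_p_threads;
}

/* Parallel efficiency E(p) = S(p) / p; 1.0 means perfect scaling. */
double efficiency(double t_one_thread, double t_p_threads, int p) {
    assert(p > 0);
    return speedup(t_one_thread, t_p_threads) / p;
}
```

For example, hypothetical timings of 100 s on one thread and 4 s on 32 threads give a speedup of 25 and an efficiency of about 0.78.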

16 Intel MIC Performance Continued
Figure: Strong Scaling Study of BGK Solver on KNF (speedup versus number of OpenMP threads; single precision)

17 Initial Intel MIC Performance Remarks
The previous speedup results were gathered using untuned, unoptimized versions of the solvers.
Rob van der Wijngaart, a senior Intel software engineer, put the BGK model Boltzmann code through several phases of optimization. His optimizations include fusing loops to expose more parallelism, vectorizing loops through data alignment and compiler directives, and using intrinsics where appropriate.
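The first two optimizations can be illustrated on a simple loop. The sketch below uses a hypothetical BGK-style relaxation update, out = f + Δt·ν·(f_eq − f). It shows the loop before and after fusion and allocates 64-byte-aligned buffers, which match the width of a 512-bit MIC vector register. The slides do not say which directives were used. The portable OpenMP 4.0 `parallel for simd` directive, which postdates this work, stands in for them here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical relaxation update out = f + dt*nu*(feq - f), two ways. */

/* Before fusion: two passes over memory and a temporary array. */
void relax_unfused(int n, const double *restrict f, const double *restrict feq,
                   double nu, double dt, double *restrict tmp, double *restrict out) {
    for (int i = 0; i < n; i++) tmp[i] = feq[i] - f[i];
    for (int i = 0; i < n; i++) out[i] = f[i] + dt * nu * tmp[i];
}

/* After fusion: one pass, no temporary. The directive requests threading
   plus vectorization; it is ignored when OpenMP is not enabled. */
void relax_fused(int n, const double *restrict f, const double *restrict feq,
                 double nu, double dt, double *restrict out) {
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++) out[i] = f[i] + dt * nu * (feq[i] - f[i]);
}

/* Allocate n doubles on a 64-byte boundary. C11 aligned_alloc requires
   the size to be a multiple of the alignment, so round it up. */
double *alloc_aligned(size_t n) {
    size_t bytes = (n * sizeof(double) + 63) / 64 * 64;
    return aligned_alloc(64, bytes);
}
```

Fusion halves the number of sweeps over memory and gives each thread more arithmetic per loop iteration. Alignment lets the compiler emit aligned vector loads without a peeled prologue. Buffers from `alloc_aligned` are released with `free`.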

18 Intel MIC Performance Revisited
Figure: Strong Scaling Study of Optimized BGK Solver on KNF (speedup versus number of OpenMP threads; curves for no affinity, compact affinity, and scatter affinity)
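The affinity curves differ in where OpenMP threads are placed on the 32 cores, each of which has 4 hardware threads. The slides do not say how affinity was set. With the Intel OpenMP runtime it is commonly selected through the `KMP_AFFINITY` environment variable, for example `KMP_AFFINITY=compact` or `KMP_AFFINITY=scatter`. The sketch below is a simplified model of the two policies. It assumes cores are numbered 0 to 31 and ignores OS processor numbering and any core reserved for the operating system.

```c
#include <assert.h>

#define CORES 32            /* KNF cores */
#define THREADS_PER_CORE 4  /* hardware threads per core */

/* Simplified model of "compact": fill all hardware threads of one core
   before moving to the next core. */
int core_compact(int t) {
    assert(t >= 0 && t < CORES * THREADS_PER_CORE);
    return t / THREADS_PER_CORE;
}

/* Simplified model of "scatter": round-robin, one thread per core per pass. */
int core_scatter(int t) {
    assert(t >= 0 && t < CORES * THREADS_PER_CORE);
    return t % CORES;
}

/* Number of distinct cores that are busy when n threads are placed. */
int cores_used(int n, int (*place)(int)) {
    int used[CORES] = {0}, count = 0;
    for (int t = 0; t < n; t++) {
        int c = place(t);
        if (!used[c]) { used[c] = 1; count++; }
    }
    return count;
}
```

Under this model, 32 threads occupy only 8 cores with compact placement but all 32 cores with scatter placement. The two policies converge only when all 128 hardware threads are in use.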

19 Couette Problem Revisited
In the earlier Euler results, doubling the problem size improved the overall speedup.
Here, the Couette problem is rerun using the optimized solver provided by Rob, this time with 37 grid cells and 46x46x46 grid points in velocity space.
That nearly triples the number of state variables to be solved.
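Counting one distribution-function value per physical grid cell and velocity-space grid point gives the sizes below. This count is an assumption about what the slides treat as a state variable.

```latex
\begin{aligned}
N_{\text{original}} &= 27 \times 36^3 = 27 \times 46\,656 = 1\,259\,712,\\
N_{\text{larger}}   &= 37 \times 46^3 = 37 \times 97\,336 = 3\,601\,432,\\
\frac{N_{\text{larger}}}{N_{\text{original}}} &\approx 2.86 .
\end{aligned}
```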

20 Strong Scaling Study of Optimized BGK Solver on KNF with Larger Problem Size
Figure: speedup versus number of OpenMP threads; curves for no affinity, compact affinity, and scatter affinity

21 Observations from CFD Applications Developed for Intel MIC
Performance results indicate that iterative algorithms using OpenMP threads can effectively use the Intel MIC cores.
General optimization, vectorization of loops, and making sure that each thread has plenty of work are all key to using the Intel MIC cards effectively.
Parallel speedups greater than the total number of cores available can be achieved by running more than one OpenMP thread per core, since each KNF core supports four hardware threads.

22 Acknowledgments
Thanks to Intel and NICS for making this research possible.
Special thanks to Rob van der Wijngaart for optimizing the BGK model Boltzmann solver and explaining the optimizations.
Particular thanks to R. Glenn Brook for coordinating this research and providing guidance and support.

23 Contact Ryan C. Hulguin
