Parallelizing Lattice Boltzmann Methods with OpenMP

July 2012

Table of Contents: 1 Introduction 2 3 4

2D flow with a circular obstacle

Infrastructure: Intel Manycore Testing Lab (MTL)
1 shared login node, for workload development.
3 batch nodes for benchmarking.
MPI is not a supported option.
Red Hat Enterprise Linux Server release 5.4, with kernel el5.
Nodes: 4 Intel® Xeon® E7-4860 processors, 10 cores each, 24 MB cache and 2.26 GHz clock speed.
Login node: 256 GB RAM, Hyper-Threading enabled.
Batch nodes: 64 GB RAM, Hyper-Threading disabled.
Compilers: Intel 11.1, gcc (4.1.2 and 4.5.1).
Batch system: PBSPro 10.2.

Thread affinity
Intel: export KMP_AFFINITY={compact|scatter|none}
GNU, compact: export GOMP_CPU_AFFINITY=0-39
GNU, scatter: export GOMP_CPU_AFFINITY=" "

Thread affinity: Compact

Thread affinity: Scatter
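To make the difference between the two placements concrete, the following C sketch builds a scatter-style CPU list of the kind that could be passed to GOMP_CPU_AFFINITY. It is an illustration only: the function name `build_scatter_list` is hypothetical, and it assumes that logical CPUs are numbered consecutively within each socket, which the slides do not state for the MTL nodes.

```c
#include <stdio.h>
#include <string.h>

/* Build a space-separated CPU list that visits the sockets round-robin,
 * so consecutive threads land on different sockets ("scatter").
 * ASSUMPTION (not from the slides): socket s owns the logical CPUs
 * s*cores_per_socket .. s*cores_per_socket + cores_per_socket - 1.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_scatter_list(int sockets, int cores_per_socket,
                       char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (int core = 0; core < cores_per_socket; core++) {
        for (int s = 0; s < sockets; s++) {
            int cpu = s * cores_per_socket + core;
            /* No leading space before the first entry. */
            int n = snprintf(buf + used, buflen - used,
                             used ? " %d" : "%d", cpu);
            if (n < 0 || (size_t)n >= buflen - used)
                return -1;
            used += (size_t)n;
        }
    }
    return 0;
}
```

Under that numbering assumption, a node with 4 sockets of 10 cores would get a list beginning 0 10 20 30 1 11 21 31, whereas the compact setting 0-39 fills one socket before moving to the next. The real CPU numbering should be checked on the target machine (for example in /proc/cpuinfo) before relying on such a list.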

CPU affinity with gcc, small size: 400x200, 1000 iterations. (Plot: time vs. threads for affinity none, scatter and compact.)

CPU affinity with gcc, medium size: 800x400, 1000 iterations. (Plot: time vs. threads for affinity none, scatter and compact.)

CPU affinity with gcc, large size: 1600x800, 1000 iterations. (Plot: time vs. threads for affinity none, scatter and compact.)

Execution time of code compiled with gcc and icc: x400, 1000 iterations. (Plot: time vs. threads for gcc and icc.)

Speedup of code compiled with gcc and icc: x400, 1000 iterations. (Plot: speedup vs. threads for gcc and icc.)
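The speedup plotted here is presumably the usual ratio of single-thread time to p-thread time, S(p) = T(1) / T(p), although the slides do not define it. A minimal C sketch of that calculation, with parallel efficiency E(p) = S(p) / p added for context, could look as follows. The function names are hypothetical and no measured values from the slides are used.

```c
#include <stdio.h>

/* Speedup S(p) = T(1) / T(p): how many times faster the run with
 * p threads is than the single-thread run. */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Parallel efficiency E(p) = S(p) / p: 1.0 means perfect scaling. */
double efficiency(double t_serial, double t_parallel, int threads)
{
    return speedup(t_serial, t_parallel) / (double)threads;
}

/* Print one row of a speedup table for a given thread count. */
void print_scaling_row(int threads, double t_serial, double t_parallel)
{
    printf("%3d threads: speedup %.2f, efficiency %.2f\n",
           threads,
           speedup(t_serial, t_parallel),
           efficiency(t_serial, t_parallel, threads));
}
```

For example, made-up timings of 10.0 s on one thread and 2.5 s on eight threads give a speedup of 4 and an efficiency of 0.5.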

Execution time with a subset of cores restricted via taskset: gcc, 400x200, 1000 iterations. (Plot: time vs. threads for 10, 20, 30 and 40 processors.)

Execution times, different problem sizes. (Plot: time vs. threads for several grid sizes and iteration counts.)

GCC: There seems to be a (scheduling?) bug triggered by our code.
Noticeable impact on performance.
Reduces instability.
The GCC problem is not triggered when using affinity.
Additional analysis is needed, in particular on the effect of different scheduling algorithms.
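One possible way to compare scheduling algorithms without recompiling is OpenMP's schedule(runtime) clause, which takes the loop schedule from the OMP_SCHEDULE environment variable. The C sketch below is not the authors' code and does not reproduce the GCC issue. The function `sweep_checksum` is a hypothetical stand-in for a sweep over the lattice, with an integer checksum so that every schedule returns the same result.

```c
#include <stdio.h>

/* Hypothetical stand-in for one sweep over an nx-by-ny lattice.
 * The body is a placeholder, not a Lattice Boltzmann update; it only
 * gives each row some work and produces a checksum that is the same
 * for every loop schedule. */
long sweep_checksum(int nx, int ny)
{
    long sum = 0;

    /* schedule(runtime) defers the choice of loop schedule to the
     * OMP_SCHEDULE environment variable. Without OpenMP enabled at
     * compile time, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (int j = 0; j < ny; j++) {
        for (int i = 0; i < nx; i++) {
            sum += (long)((i + j) % 9);
        }
    }
    return sum;
}
```

Built with gcc -fopenmp, the same binary can then be timed with OMP_SCHEDULE set to, for instance, "static", "dynamic,4" or "guided", with and without an affinity setting.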

References

Akyil, L., Breshears, C., Corden, M., Fedorova, J., Fischer, P., Gabb, H., Gromova, V., Hoeflinger, J., Hubbard, R., Kukanov, A., O'Leary, K., Ott, D., Palmer, E., Pegushin, A., Petersen, P., Rosenquist, T., Tersteeg, A., Tsymbal, V., Voss, M., Zipplies, T.: Intel Guide for Developing Multithreaded Applications. Intel Corporation (2011)

Montes, M., Sacco, C.: Implementación paralela de métodos de Lattice Boltzmann. In: Primer Congreso sobre Los métodos numéricos en la enseñanza, la ingeniería y las ciencias, UTN - Facultad Regional Haedo (August 2010)

Montes, M., Sacco, C.: Métodos de Lattice Boltzmann en equipos multicore. In: 2do. Congreso Argentino de Ingeniería Aeronáutica, Instituto Universitario Aeronáutico (November 2010)

Succi, S.: The Lattice Boltzmann Equation for Fluid Dynamics and Beyond. Numerical Mathematics and Scientific Computation. Oxford University Press, USA (August 2001)

Wolf-Gladrow, D.A.: Lattice-Gas Cellular Automata and Lattice Boltzmann Models: An Introduction. 1st edn. Springer (March 2000)
