Scalable, Hybrid-Parallel Multiscale Methods using DUNE

Size: px

Start display at page:

Download "Scalable, Hybrid-Parallel Multiscale Methods using DUNE"

Robert McKenzie
6 years ago
Views:

1 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE R. Milk S. Kaulmann M. Ohlberger December 1st 2014

2 Outline MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 2 /28 Abstraction of Multiscale Methods Implementation + Results Conclusion / Outlook

EXA-DUNE: MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 3 /28 "Flexible PDE Solvers Numerical Methods and Applications" 7 groups from 5 german universities Aim: develop new

3 EXA-DUNE: MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 3 /28 "Flexible PDE Solvers Numerical Methods and Applications" 7 groups from 5 german universities Aim: develop new numerical algorithmic and computational techniques to enable exa-scale computing for PDEs on heterogeneous massively parallel architectures DUNE and FEAST (TU Dortmund hardware-oriented numerics)

4 Problem Statement MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 4 /28 Assume we are looking for u ɛ U solving R ɛ [u ɛ ](v) = 0 v U where R ɛ : U U Example: U = H0(Ω) 1 R ɛ [u] = b ɛ [u] I b ɛ [u](v) = A ɛ u v I(v) = fv Ω Ω A ɛ (x) = A ( ) x ɛ : Ω R d d

Scales MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 5 /28 Idea: Make use of possible scale-seperation Macroscopic scale represented by

5 Scales MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 5 /28 Idea: Make use of possible scale-seperation Macroscopic scale represented by coarse-scale space U H U on coarse partition T H of Ω Microscopic scale represented by fine-scale space U h U on fine partition T h of Ω Spaces should be nested U H U h U

6 Abstract Method MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 6 /28 Let π UH : U U H denote a projection onto the coarse space with U f h = {u h U h : π Uh (u h ) = 0} Compute and approximate discrete solution u ɛ µh = u H + u f h U H U f h satisfying R ɛ µ[u H + u f h ](v H ) = 0 v H U H R ɛ µ[u H + u f h ](v f h ) = 0 v f h U f h Concrete choices for U H U h π UH and further localization of fine scale equation yield various multiscale methods.

7 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 7 /28 Concrete example: The MsFEM Choose U H = S 1 0(T H ) as a globally continuous finite element space on T H Let U h = S 1 0(T h ) and U ht = S 1 0(T h T) Define local correctors Q T (φ H ) for all T T H : A ɛ Q T (φ H ) φ h dx = A ɛ φ H φ h dx T Find coarse-scale solution u MsFEM span{φ H + Q T (φ H ) φ H U H }: A ɛ u MsFEM φ H dx = f φ H dx T T T φ h U ht φ H U H

8 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 8 /28 Pure MPI Implementation

9 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 8 /28 Pure MPI Implementation

10 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 8 /28 Pure MPI Implementation

11 Testcase setup MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 9 /28 Standard diffusion problem in Ω = [0 1] 3 with boundary conditions u(x) = 0 on Ω\x 3 {0 1} and Neumann-zero elsewhere. With diffusion A ɛ and source f ɛ given as: )) A ɛ (x 1 x 2 x 3 ) := 1 8π 2 2 ( 2 + cos ( 2π x1 ɛ cos ( ) 2π x1 ɛ f ɛ (x) := (A ɛ (x) v ɛ (x)) v ɛ (x 1 x 2 x 3 ) := sin(2πx 1 ) sin(2πx 2 ) + ɛ 2 cos(2πx 1) cos(2πx 2 ) sin ( 2π x1 ɛ )

12 Testcase setup MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 10 /28

13 Scaling setup MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 11 /28 Discretizations based on DUNE-fem DUNE-spgrid (hexahedrons) on coarse and fine scale 32 3 cubes on coarse scale with 16 3 fine cells each 134e+06 total Hardware: Cheops Cluster Cologne: Dual socket 6-core Xeon Westmere nodes Infiniband interconnect BiCGStab with Jacobi preconditioning on both levels

14 Scaling results MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 12 /28

15 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 13 /28 Wall time distribution

16 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 14 /28 Problem: Coarse solve Very few degrees of freedom per MPI-rank.

17 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 14 /28 Problem: Coarse solve Very few degrees of freedom per MPI-rank. Decrease number of ranks while increasing amount of work per rank!

18 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 14 /28 Problem: Coarse solve Very few degrees of freedom per MPI-rank. Decrease number of ranks while increasing amount of work per rank! Simple for local problems: they re already embarassingly parallel

19 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 15 /28 new implementation - hybrid mpi

20 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 15 /28 new implementation - hybrid mpi

21 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 15 /28 new implementation - hybrid mpi

22 Roadblock MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 16 /28 Discretization framework does not support this mode of thread parallelism.

23 Roadblock MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 16 /28 Discretization framework does not support this mode of thread parallelism. Yes we tried patching it.

24 Roadblock MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 16 /28 Discretization framework does not support this mode of thread parallelism. Yes we tried patching it. Solution? Recast entire code into DUNE-gdt framework.

25 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 17 /28 DUNE-Generic Discretization Toolkit provides common abstraction for spaces mappers etc. from DUNE-fem DUNE-pdelab and DUNE-fem-localfunctions designed to stack cell-local operations to conserve iterations over the grid leverages interchangeable LA backends from DUNE-stuff: DUNE-istl Eigen dense matrix/vector classes from DUNE-common BSD-2 clause licensed available at

26 New Scaling setup MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 18 /28 CG-FEM Discretizations via DUNE-gdt with DUNE-pdelab backend DUNE-spgrid (hexahedrons) on coarse and fine scale 32 3 cubes on coarse scale with 16 3 fine cells each 134e+06 total or 64 3 cubes on coarse scale with 8 3 fine cells each also 134e+06 total Hardware: Palma Cluster Münster: Dual socket 6-core Xeon Westmere nodes Infiniband interconnect coarse system: DUNE-istl BiCGStab with UMFPACK enabled AMG preconditioning but no parmetis UMFACK direct sparse solve for local problems

27 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 19 /28 Speedup hybrid pure MPI: 32x16

28 Efficiency: 32x16 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 20 /28 runtime factor hybrid/pure

29 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 21 /28 Speedup hybrid pure MPI: 64x8

30 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 22 /28 Wall time distribution hybrid MPI 32x cores

31 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 23 /28 Wall time distribution pure MPI 32x cores

32 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 24 /28 Memory 1 reduction: 64x8 1 rank-averaged peak resident size

33 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 25 /28 Memory 2 quotient: hybrid/pure 64x8 2 core-averaged peak resident size

34 Conclusions MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 26 /28

35 Conclusions MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 26 /28 Hybrid MPI-SMP parallelism viable but no panacea

36 Conclusions MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 26 /28 Hybrid MPI-SMP parallelism viable but no panacea Never ever switch discretization frameworks on non-trivial code

37 What s next MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 27 /28 Making it easier to implement different methods

38 What s next MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 27 /28 Making it easier to implement different methods Rigorous performance model further single node/single thread optimization

39 What s next MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 27 /28 Making it easier to implement different methods Rigorous performance model further single node/single thread optimization Scaling tests on bigger machines

40 What s next MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 27 /28 Making it easier to implement different methods Rigorous performance model further single node/single thread optimization Scaling tests on bigger machines Field test heterogenous hardware (Intel MIC)

41 What s next MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 27 /28 Making it easier to implement different methods Rigorous performance model further single node/single thread optimization Scaling tests on bigger machines Field test heterogenous hardware (Intel MIC) Detailed SMP analysis: MPI rank per node or per socket? Do we need different load balancing? SIMD kernels?

42 What s next MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 27 /28 Making it easier to implement different methods Rigorous performance model further single node/single thread optimization Scaling tests on bigger machines Field test heterogenous hardware (Intel MIC) Detailed SMP analysis: MPI rank per node or per socket? Do we need different load balancing? SIMD kernels? Fault tolerance error recovery

43 MÜNSTER Scalable Hybrid-Parallel Multiscale Methods using DUNE 28 /28 Thank you for your attention. Get the code:

Software and Performance Engineering for numerical codes on GPU clusters

Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3