Adaptive Refinement Tree (ART) code. N-Body: Parallelization using OpenMP and MPI

Ruth George
5 years ago

1 Adaptive Refinement Tree (ART) code. N-Body: Parallelization using OpenMP and MPI

2 Family of codes: N-body (OpenMP); N-body (MPI+OpenMP); N-body+hydro+cooling+SF (OpenMP); N-body+hydro+cooling+SF (MPI).

3 History: Particle-Mesh (PM): 1980, Klypin & Shandarin; Adaptive Mesh Refinement (AMR) with irregular mesh: 1996, Khokhlov; N-body ART: 1997, Kravtsov, Klypin, Khokhlov; Hydro OpenMP: 2000, Kravtsov, Khokhlov; Hydro MPI: 2005, Rudd, Kravtsov; Radiative Transfer: Gnedin, Kravtsov.

4 Adaptive Refinement: The mesh is refined where the density exceeds a given threshold. Other quantities (such as jumps in pressure) can be used as additional conditions for refinement. The refinement field defines how refinement is done. Each cell can be split into 8 new cells, each half the size of its parent. This is ideal for tracing anisotropic structures such as filaments. Adjacent cells can differ by no more than one level. The time step decreases by a factor of two with each level.
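The splitting rule on this slide can be illustrated with a minimal Python sketch. It is not the ART implementation: the `Cell` class, the `split` and `refine` helpers, and the choice to let children inherit the parent density as a placeholder are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Cubic cell: corner position, edge length, refinement level, mean density."""
    x: float
    y: float
    z: float
    size: float
    level: int
    density: float = 0.0
    children: list = field(default_factory=list)


def split(cell):
    """Split a cell into 8 children, each with half the parent's edge length."""
    half = cell.size / 2.0
    cell.children = [
        # Placeholder assumption: children start with the parent's density.
        Cell(cell.x + i * half, cell.y + j * half, cell.z + k * half,
             half, cell.level + 1, cell.density)
        for i in (0, 1) for j in (0, 1) for k in (0, 1)
    ]
    return cell.children


def refine(cells, threshold):
    """Split every cell whose density exceeds the threshold; return the new cells."""
    new_cells = []
    for c in cells:
        if c.density > threshold:
            new_cells.extend(split(c))
    return new_cells
```

The sketch covers only the density criterion. It does not enforce the rule that adjacent cells differ by no more than one level, which a real mesh would need to check before accepting a split.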

5-12 Adaptive Refinement (figures): zero-level mesh, followed by first-level, second-level, and third-level adaptive refinement.

13 Smoothing of the refinement field is required to reduce the noise in the mesh structure.


15 Time stepping: The scale of the time step (actually the step in the expansion parameter) is defined by the time step on the zero-level mesh. On each subsequent level of refinement the time step decreases by a factor of two. For a particle moving with a constant speed, the fraction of a cell that it crosses in one time step is independent of the level of refinement at which the particle moves. Courant condition: particles should not move more than a fraction of a cell per step. It is a global (refinement-level independent) condition. In practice, the maximum particle displacement is ... of a cell.
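The level independence of the Courant fraction follows from halving both the cell size and the time step at each level. The sketch below checks it numerically. The function names are illustrative, and the step is treated as a plain time step rather than a step in the expansion parameter.

```python
def cell_size(dx0, level):
    """Cell edge length at a given level: halves with each refinement."""
    return dx0 / 2 ** level


def time_step(dt0, level):
    """Time step at a given level: halves with each refinement."""
    return dt0 / 2 ** level


def courant_fraction(v, dx0, dt0, level):
    """Fraction of a cell crossed in one step by a particle with speed v."""
    return v * time_step(dt0, level) / cell_size(dx0, level)


# The factors of 2**level cancel, so the fraction is the same on every level.
fractions = [courant_fraction(0.3, 1.0, 0.5, level) for level in range(5)]
```

Because the ratio reduces to v * dt0 / dx0, a displacement limit set on the zero-level mesh holds on all refinement levels.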

16 Time stepping: The refinement structure is rebuilt every zero-level time step.
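Since the time step halves with each level, one zero-level step implies 2^L steps on level L. The recursive schedule below is a sketch of that bookkeeping only. The order in which levels are advanced (coarse first, then two finer sub-steps) is an assumption, not something the slides specify.

```python
def advance(level, max_level, counts):
    """Take one step on `level`, then two sub-steps on the next finer level."""
    counts[level] += 1
    if level < max_level:
        advance(level + 1, max_level, counts)
        advance(level + 1, max_level, counts)


def zero_level_step(max_level):
    """Run one zero-level step and return how many steps each level took."""
    counts = [0] * (max_level + 1)
    advance(0, max_level, counts)
    # The refinement structure would be rebuilt here, once per zero-level step.
    return counts
```

With three refinement levels, `zero_level_step(3)` returns `[1, 2, 4, 8]`.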

17 Domain decomposition: Used one way or another in MPI codes. Common choices are rectangular domains and space-filling curves. Load balancing and adaptive domains are issues to handle.

18 ART: MPI. The whole domain of integration is split into a set of non-overlapping parallelepipeds that cover it. Boundaries of the parallelepipeds can move in order to equalize the load. Each domain is handled by one MPI task. Data (coordinates and velocities) of each domain are stored in separate directories.

19 Example of domain decomposition: Nx = 3, Ny = 4. System with 11 degrees of freedom. Boundaries in the y-direction may not be aligned. All boundaries in the x-direction are aligned.
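The count of 11 degrees of freedom is consistent with a slab layout in which the Nx - 1 interior x-boundaries are shared by all domains, while each of the Nx columns has its own Ny - 1 interior y-boundaries: 2 + 3 × 3 = 11. The sketch below assumes that 2D layout. The function names and the cut positions in the test data are hypothetical.

```python
from bisect import bisect_right


def degrees_of_freedom(nx, ny):
    """Movable boundaries: nx-1 shared x-cuts plus ny-1 y-cuts in each column."""
    return (nx - 1) + nx * (ny - 1)


def find_domain(x, y, x_cuts, y_cuts):
    """Return (ix, iy) of the domain containing the point (x, y).

    x_cuts: sorted interior x-boundaries, shared by all columns.
    y_cuts: one sorted list of interior y-boundaries per column.
    """
    ix = bisect_right(x_cuts, x)
    iy = bisect_right(y_cuts[ix], y)
    return ix, iy
```

Looking up the column first and then the row within that column reflects the fact that y-boundaries may differ from column to column.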

20 Simple idea for splitting domains to get equal load: For each domain we have the CPU time spent on the previous time step. Assume that the CPU time is evenly distributed inside a domain. The positions of the boundaries need to change so that each domain has the same CPU time. Figure labels: density of CPU; distribution of total CPU for all domains with the same x-boundaries.

21 Solve the 1D linear problem and then apply it to all directions. Figure: CPU(x), with the levels 1/3, 2/3, and 1 marked.


23 Solve the 1D linear problem and then apply it to all directions. It works fine when the CPU densities of the domains are not too different. Figure: CPU(x), with the levels 1/3, 2/3, and 1 marked.
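One way to read the 1D linear problem: with CPU time assumed uniform inside each old domain, the cumulative CPU(x) is piecewise linear, and the new boundaries are the points where it reaches 1/3, 2/3, ... of the total. The sketch below implements that reading. The `rebalance` function is illustrative and assumes a positive total CPU time.

```python
def rebalance(bounds, cpu):
    """Move interior boundaries so every domain gets an equal share of CPU.

    bounds: n+1 ascending edges of the n old domains.
    cpu:    CPU time measured in each old domain on the previous step.
    Assumes CPU time is spread uniformly inside each old domain.
    """
    n = len(cpu)
    total = sum(cpu)
    # Cumulative CPU at each old edge; linear in between.
    cum = [0.0]
    for c in cpu:
        cum.append(cum[-1] + c)
    new_bounds = [bounds[0]]
    j = 0
    for k in range(1, n):
        target = total * k / n
        # Find the old domain in which the cumulative curve reaches the target.
        while cum[j + 1] < target:
            j += 1
        frac = (target - cum[j]) / cpu[j]
        new_bounds.append(bounds[j] + frac * (bounds[j + 1] - bounds[j]))
    new_bounds.append(bounds[-1])
    return new_bounds
```

For old boundaries `[0, 1, 2, 3]` with CPU times `[3, 2, 1]`, the new interior boundaries fall at 2/3 and 1.5, so that each new domain carries 2 units under the uniform-density assumption.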

24 Fine tuning of boundary adjustment: Case of two boundaries in 1D. Each boundary can have 3 positions: xold-dx, xold, xold+dx. We have 9 possible combinations. Estimate the total CPU for each combination and choose the best. Figure label: density of CPU.
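The nine-combination search can be sketched as a brute-force loop. The slide does not say what "best" means. This sketch assumes it means the smallest maximum per-domain load, and it estimates loads from the uniform CPU density of the old domains. Both function names are hypothetical, and `dx` is assumed small enough that the boundaries do not cross.

```python
from itertools import product


def estimate_loads(trial, old, cpu):
    """Estimate CPU per trial domain from uniform densities in the old domains."""
    density = [c / (old[i + 1] - old[i]) for i, c in enumerate(cpu)]
    loads = []
    for a, b in zip(trial, trial[1:]):
        load = 0.0
        for i, rho in enumerate(density):
            overlap = min(b, old[i + 1]) - max(a, old[i])
            if overlap > 0:
                load += rho * overlap
        loads.append(load)
    return loads


def fine_tune(old, cpu, dx):
    """Try xold-dx, xold, xold+dx for both interior boundaries (9 combinations)."""
    best = None
    for s1, s2 in product((-dx, 0.0, dx), repeat=2):
        trial = [old[0], old[1] + s1, old[2] + s2, old[3]]
        worst = max(estimate_loads(trial, old, cpu))
        # Assumed criterion: keep the combination with the smallest maximum load.
        if best is None or worst < best[0]:
            best = (worst, trial)
    return best[1]
```

Restricting each boundary to three positions keeps the search to nine cheap evaluations per pair of boundaries.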

25 Primary particles inside the buffer zone are sent to the domain. At larger distances each domain creates large particles and sends them to other domains.
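A 1D sketch of this exchange, as seen from a neighbouring domain preparing data for the domain [lo, hi]: particles within the buffer width are passed on unchanged, and more distant ones are merged into large particles. How ART builds large particles is not described on the slide. Binning into coarse cells of size `coarse_size` and using the centre of mass are assumptions, as are all names below. Periodic boundaries are ignored.

```python
from collections import defaultdict


def distance_to_interval(x, lo, hi):
    """Distance from point x to the interval [lo, hi] (0 if inside)."""
    return max(lo - x, 0.0, x - hi)


def prepare_for_domain(particles, lo, hi, buffer_width, coarse_size):
    """Split another domain's particles into primary and large ones.

    particles: list of (x, mass) owned by a different domain.
    Returns (primary, large) to be sent to the domain [lo, hi].
    """
    primary = []
    bins = defaultdict(lambda: [0.0, 0.0])  # bin -> [total mass, sum of mass * x]
    for x, m in particles:
        if distance_to_interval(x, lo, hi) <= buffer_width:
            primary.append((x, m))          # inside the buffer zone: send as is
        else:
            b = bins[int(x // coarse_size)]  # far away: accumulate in a coarse bin
            b[0] += m
            b[1] += m * x
    # One large particle per occupied bin, placed at the bin's centre of mass.
    large = sorted((mx / m, m) for m, mx in bins.values())
    return primary, large
```

The total mass is conserved: every input particle ends up either in `primary` or in exactly one large particle.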


27 OpenMP: Each domain is handled by OpenMP. After making one zero-level step (many steps on high levels): get the CPU timing and redistribute the domain boundaries; large particles are discarded; primary particles that left their domain are sent to their new domain; primary particles that come to the domain are received; particles from the buffer zone are received; large particles are created and sent/received. Ready to go.


30 Load balancing
