Technische Universität München, Fakultät für Informatik. Computational Science and Engineering (Int. Master's Program)


Technische Universität München, Fakultät für Informatik
Computational Science and Engineering (Int. Master's Program)

Parallel Refinement and Coarsening of Recursively Structured Adaptive Triangular Grids

Master's Thesis
Anas Obeidat

1st examiner: Jun.-Prof. Dr. Michael Bader
2nd examiner: Univ.-Prof. Dr. Hans-Joachim Bungartz
Assistant advisor: Csaba Vigh, M.Sc.
Thesis handed in on:

I hereby declare that this thesis is entirely the result of my own work except where otherwise indicated. I have only used the resources given in the list of references.

Date                                        Anas Obeidat

Abstract

This master's thesis is concerned with the parallel implementation of refinement and coarsening schemes for recursively structured adaptive triangular grids. The implementation extends previous work on simulating the propagation of oceanic waves (tsunami simulation), which uses the Sierpinski space-filling curve and a system of stacks to exchange information.

Acknowledgment

First of all I would like to thank Jun.-Prof. Michael Bader for his guidance on this interesting and challenging topic. I also want to acknowledge the cooperation and support I received from Csaba Vigh, M.Sc.: I thank him for his strong support and advice during the last six months, even though he was in the US for the last three of them. I likewise thank my partner in this thesis, Ömer Demirel, for his friendship and collaboration. Finally, I would like to express my respect and appreciation to Univ.-Prof. Hans-Joachim Bungartz and his Scientific Computing chair at TU München for giving me the opportunity to take part in this international master's program, where I obtained deep knowledge and advanced skills in computational science.

Contents

1 Introduction 1
2 Mathematical Modeling
  2.1 Shallow water equation
  2.2 Vector form of SWE
  2.3 Weak form of the Shallow Water Equations
  2.4 Discretization
  2.5 Numerical Flux
3 The Grid
  3.1 Grid Generation
  3.2 Saving the Grid
  3.3 Traversing the Grid
    3.3.1 Sierpinski filling curve
4 Triangle Types and Stack System
  4.1 Triangle Types
    Colouring
  4.2 The Stacks System
    The New Stacks
5 Parallel grid and load redistribution 24

  5.1 Parallel grid
    Implementing the parallel grid
  5.2 Load redistribution
    Diffusion
    All-to-All
6 Test cases and results
  6.1 Dynamic Symmetric Circle
  6.2 Dynamic non-symmetric Circle
  6.3 Moving Target
  6.4 Results
7 Implementation
  Fork and Join
  Initial Traversal
  Empty Traversal
  Adapt_mark Traversal
  Edge_mark Traversal
  Adapt Traversal
  The implementation
Outlook 39
Bibliography 40

1 Introduction

This work is part of the Propagation of Oceanic Waves project (tsunami simulation). The thesis implements a parallel refinement and coarsening framework, the next step after a previous framework that provided refinement and coarsening of a recursively structured adaptive triangular grid; the main focus of my work is to run this framework in parallel. Some extra algorithms and schemes were implemented in the new framework, taking care not to lose the basic ideas of the old one: the recursively structured grid, traversing the grid via the Sierpinski curve, and the stack system.

Chapter 2 covers the mathematical modeling; although I did not contribute to this part, as my focus was on the grid itself, I give a short overview of the model, such as the shallow water equations and their vector form. Chapter 3 discusses the grid: generating, saving, and traversing it. In Chapter 4 I explain the old stack system, the new one, and the triangle types. Implementing the grid in parallel and the schemes used for load redistribution are covered in Chapter 5. Then I present the test cases and results, followed by the implementation, and finally a short outlook on the next steps.

2 Mathematical Modeling

As my work contributes to the simulation of the propagation of oceanic waves (tsunami simulation) and is concerned with the data structures and the grid itself, I only describe the mathematical modeling briefly here; it was presented by Schweiger [12], Böck [7], and finally by Demirel [8]. The modeling is concerned with solving the shallow water equations using a Discontinuous Galerkin Solver (DGS).

2.1 Shallow water equation

The shallow water equations (SWE) are widely used to model the dynamics of incompressible fluids, in our situation water. They are a nonlinear set of hyperbolic partial differential equations (PDEs) expressing mass and momentum conservation laws [8]. The SWE can be used to model flows in the ocean, since the depth scale of the flow is very short compared to its length; the same holds for tsunami waves. The shallow water equations are obtained by depth-integrating the three-dimensional Navier-Stokes equations, as follows [1]:

    \frac{\partial \xi}{\partial t} + \nabla \cdot (H v) = 0    (2.1)

    \frac{\partial v}{\partial t} + (v \cdot \nabla) v + \tau_{bf} v + f_c \, k \times v + g \nabla \xi - \frac{\nu_T}{H} \nabla \cdot (H \nabla v) = \frac{1}{H} F    (2.2)

where

ξ: the vertical deviation from the flat ocean surface.
v: the velocities in x- and y-directions, v = (v_x, v_y).
H: the total height of the wave, modeled as H = ξ + h_b.

F: external forcing terms, such as atmospheric pressure.
τ_bf: the friction coefficient on the bottom of the ocean.
f_c: the fictitious (Coriolis) force.
k: the respective local normal vector.
g: gravitational acceleration.
ν_T: the depth-averaged viscosity of the fluid.

For more details see [8].

2.2 Vector form of SWE

To be able to solve the SWE, a vector form can be derived in several ways, see [1], [2]; the method presented by [10] was used to obtain the following vector form [8]:

    \frac{\partial u}{\partial t} + \operatorname{div} F(u) = 0    (2.3)

2.3 Weak form of the Shallow Water Equations

Since we want to avoid the strong form of the SWE, a Discontinuous Galerkin Solver is used, as it works with the weak form. The derivation of the weak form can be found in [8], [12]; the SWE in weak form read:

    \int_\Omega \frac{\partial \xi_j}{\partial t} \varphi \, dx\,dy + \int_{\partial\Omega} F_j(\xi) \cdot n \, \varphi \, ds - \int_\Omega F_j(\xi) \cdot \nabla\varphi \, dx\,dy = 0, \quad \forall \varphi \in V, \; j = 1:3    (2.4)

where the integer j denotes the jth component of the flux vector F, ∂Ω is the boundary of Ω, and n is the outward normal vector on that boundary [8].

2.4 Discretization

A triangulation scheme was chosen in the previous work presented by Bader [4] to discretize the domain, in order to solve the problem numerically with the DGS; more details about the domain and the triangulation are given in the next chapter. Each triangular element T_k carries a polynomial function p_k which we do not require to be continuous across shared edges. This gives us the test space V that the DGS needs, consisting of piecewise discontinuous polynomials p_k on their respective elements [8], [12]:

    V_m = \{ p(x, y) : T_k \to \mathbb{R} \mid p \text{ is a polynomial of degree } n, \; n \le m, \text{ on } T_k \}

2.5 Numerical Flux

In our domain, physical quantities such as mass and impulse need to be exchanged between neighbouring elements. Because of the discontinuity we cannot evaluate these quantities on the triangle boundaries directly; instead we approximate them on the boundaries using numerical fluxes F on every shared edge. These fluxes are exchanged during the traversals to ensure a correct simulation.
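The thesis does not spell out the flux formula at this point. As an illustration only, the following Python sketch shows a common textbook choice, the Lax-Friedrichs flux, which combines the two one-sided states on a shared edge with a stabilising jump term; this is a generic example, not necessarily the flux used in this work, and all names are illustrative:

```python
def lax_friedrichs_flux(f, u_left, u_right, wave_speed):
    """Generic Lax-Friedrichs numerical flux on a shared edge:
    the average of the two one-sided physical fluxes minus a
    stabilising term proportional to the jump in the states.
    (Illustrative only; the thesis does not fix the flux here.)"""
    return 0.5 * (f(u_left) + f(u_right)) - 0.5 * wave_speed * (u_right - u_left)
```

For a smooth state (u_left == u_right) the jump term vanishes and the numerical flux reduces to the physical flux, which is the consistency property any numerical flux must satisfy.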

3 The Grid

In this chapter we discuss the grid: how to generate, save, and traverse it. The propagation of ocean waves poses several challenges on the grid and data structure side, as we require:

- strong adaptivity;
- a huge number of cells;
- efficiency in storing the grid;
- load balancing and distribution of the grid's cells over the processes;
- acceptable speed when refining and coarsening the grid.

All of these challenges and requirements make us think carefully about how to generate and save the grid.

3.1 Grid Generation

Any successful computational science simulation needs an accurate and fast solution, and a good discretization of the geometry is crucial for a high-quality numerical solution. In our code we adopt a structured grid (Figure 3.1), which allows us to store the computational grid efficiently, as it reduces the amount of grid information that needs to be stored.

Figure 3.1: Structured Grid

A research group [5] has presented a method based on recursively substructured grids. We use this method because it allows us to implement an iterative multigrid solver, which is important for quickly solving the system of equations obtained from the discretization, with a minimal amount of memory. Our grid (the computational domain Ω) is therefore constructed by recursive bisection of a triangular grid: we start from a parent triangle cell, and each cell is recursively subdivided into two children until the desired resolution or level of adaptive refinement is reached (Figure 3.2).

Figure 3.2: Recursive Bisection of Triangular Grid
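The bisection step splits a right isosceles triangle at the midpoint of its hypotenuse into two similar children. The following Python sketch illustrates this; the vertex ordering (the two hypotenuse endpoints first, then the apex) is an assumption for the sketch, not the thesis's actual data layout:

```python
def bisect(triangle):
    """Split a triangle (hyp_a, hyp_b, apex) at the midpoint of its
    hypotenuse into two children.  In each child, the former apex
    becomes a hypotenuse endpoint and the midpoint becomes the apex."""
    a, b, apex = triangle
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    # each child again has the form (hyp_a, hyp_b, apex)
    return (a, apex, mid), (apex, b, mid)
```

Applying `bisect` repeatedly to each child yields exactly the recursive refinement of Figure 3.2.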

The grid is supposed to meet the following requirements:

1. Adaptivity: From the mathematical point of view we want the error to be as small as possible, and the finer the cells, the more accurate the results; from the computer science point of view, however, finer cells mean more memory. To balance both, we refine the mesh only in the places where more activity is expected, and coarsen as much as we can everywhere else.

2. Conformity [5]: No hanging nodes are allowed in the generated adaptive grid. Hanging nodes can appear when only one of the two cells adjacent to a marked edge is bisected. To avoid this, communication is required during the refinement of the adaptive grid: when a cell is chosen to be refined, its adjacent cells usually have to be refined too to keep the grid conforming. This forced refinement may in turn force the refinement of further cells, and so on, which we call a cascade of refinement, as shown in Figure 3.3.

Figure 3.3: Refinement cascade: the requested refinement of the dark-coloured cell (thick line) forces the refinement of four further cells (dashed lines) [5]

3.2 Saving the Grid

From the computer science point of view we try to save as little information about the grid as possible. A recursively structured adaptive grid, characterized by solid neighbour relations and recursive refinement, leaves us with very little information that actually has to be stored.

One way of representing such a grid is a binary tree, which we call the refinement tree (Figure 3.4).

Figure 3.4: Recursively constructed Triangular Grid and its corresponding Binary Tree [6]

We store only the neighbourhood relation on the coarsest level: the nodes of the tree represent triangles, the root being the parent cell (triangle). Refining the parent cell is represented by giving that node two children, and continuing until the desired level yields a binary tree that represents the refined grid (Figure 3.5). With this scheme, the triangles used for the numerical calculation are exactly the leaves of the refinement tree.

Figure 3.5: Step by step refinement of the grid and construction of its corresponding binary tree [16]

Traversing the refinement tree in depth-first order gives a sequential order of the grid cells that is equivalent to the Sierpinski space-filling curve order, which linearizes the grid. We can therefore represent the grid's cells as a bitstream (triangle_stream), needing only one bit per cell (triangle) to encode the refinement information.
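The one-bit-per-cell encoding can be sketched in a few lines. The following Python snippet is illustrative only (the thesis implementation stores the bits in a Fortran triangle stream): refined cells emit 1, leaves emit 0, in depth-first order, and the inverse traversal rebuilds the tree from the bits:

```python
class Cell:
    """A node of the refinement tree; leaves are the numerical cells."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

    @property
    def is_leaf(self):
        return self.left is None and self.right is None

def to_bitstream(cell):
    """Depth-first serialisation: refined cells emit 1, leaves emit 0."""
    if cell.is_leaf:
        return [0]
    return [1] + to_bitstream(cell.left) + to_bitstream(cell.right)

def from_bitstream(bits):
    """Rebuild the refinement tree from its bitstream (inverse traversal)."""
    it = iter(bits)
    def build():
        if next(it) == 0:
            return Cell()
        left = build()   # depth-first: first child subtree,
        right = build()  # then second child subtree
        return Cell(left, right)
    return build()
```

A grid whose first child is refined once more serialises to `[1, 1, 0, 0, 0]`, and `from_bitstream` recovers the identical tree, which is why no neighbour information needs to be stored alongside the stream.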

Here:

- refined cells are labeled 1;
- leaves are labeled 0.

Traversing the refinement tree of the adaptive grid in Figure 3.4 in depth-first order yields the corresponding bitstream.

3.3 Traversing the Grid

Since we use an iterative scheme, we want to run through this grid repeatedly. Doing so simultaneously for all cells would require storing the location of each cell together with its neighbours, which is expensive. Instead we traverse the grid in a cell-oriented way, cell by cell, forward and backward, performing as many adaptive refinement and coarsening traversals as required. The difficulty is then to know the order in which the elements are traversed, and to make sure that every element is visited exactly once. As a solution we use the algorithmic scheme presented by Bader and Zenger [4], which allows us to implement such an iterative solver without storing the neighbour relationships. The algorithm is based on a space-filling curve, and for a triangular grid this is the Sierpinski filling curve.

3.3.1 Sierpinski filling curve

Several space-filling curve techniques such as Peano, Hilbert, and Sierpinski are available, but the first two work on grids subdivided into squares, whereas the Sierpinski curve uses a triangular grid and therefore fits our domain perfectly [11]. Figure 3.6 shows the generation of the Sierpinski curve in one coarse triangle and the levels of the Sierpinski curve as the parent cell is refined.

Figure 3.6: The first six levels of the Sierpinski filling curve [14]

As illustrated in the figure, the curve runs through all the cells (elements), traversing the grid in a linear way. Traversing the refinement tree in depth-first order, which produces the linear bitstream, and traversing the grid using the Sierpinski curve give the same order (Figure 3.7); since we do not need to save the neighbourhood relations, no extra memory is needed.

Figure 3.7: Traversing the grid using the Sierpinski curve and the corresponding refinement tree [14]

4 Triangle Types and Stack System

4.1 Triangle Types

As mentioned by Schweiger [12], the Sierpinski curve traverses the grid parallel to the hypotenuse as it enters a grid cell. This rule divides the grid cells into three types according to how the Sierpinski curve visits them (Figure 4.1):

H: the SFC enters through the hypotenuse and exits through a leg.
K: the SFC enters through a leg and exits through the hypotenuse.
V: the SFC enters through one leg and exits through the other.

Figure 4.1: Grid's triangle types according to how the SFC visits them.

Refining these triangles generates six recursive relations, as shown in Figure 4.2:

H_o -> (V_o, K_o),  H_n -> (V_n, K_o),  V_o -> (H_o, K_o),  V_n -> (H_n, K_n),  K_o -> (H_n, V_o),  K_n -> (H_n, V_n),

where o stands for an old edge, the edge that carries the current flux, and n stands for a new edge, whose flux will be updated.
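The six relations can be written down directly as a lookup table. The following Python sketch (names are illustrative, not taken from the thesis code) expands a cell type recursively into the leaf types of a uniformly refined subtree:

```python
# The six recursive refinement relations of Figure 4.2 as a lookup table.
# Keys are (type, age): type 'H'/'K'/'V', age 'o' (old edge) / 'n' (new edge).
REFINE = {
    ('H', 'o'): [('V', 'o'), ('K', 'o')],
    ('H', 'n'): [('V', 'n'), ('K', 'o')],
    ('V', 'o'): [('H', 'o'), ('K', 'o')],
    ('V', 'n'): [('H', 'n'), ('K', 'n')],
    ('K', 'o'): [('H', 'n'), ('V', 'o')],
    ('K', 'n'): [('H', 'n'), ('V', 'n')],
}

def refine_to_depth(cell, depth):
    """Recursively expand a cell type into the leaf types obtained after
    `depth` uniform bisections, in Sierpinski (depth-first) order."""
    if depth == 0:
        return [cell]
    first, second = REFINE[cell]
    return refine_to_depth(first, depth - 1) + refine_to_depth(second, depth - 1)
```

Because the table is closed under refinement, every cell of an arbitrarily deep grid is again one of the six types, which is what makes the stack-based traversal possible.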

An old edge carries the current flux term; a new edge is one whose flux value will be updated in the next iteration [12].

Figure 4.2: Triangle types, six recursive relations

Colouring

The colouring scheme was introduced in order to know from which stack an edge should be pushed or popped. We use the triangle types to colour the edges (green or red), which tells us exactly to which stack the current element's edge should be pushed or from which it should be popped.

Node colouring

In Schraufstetter's work [14], the triangles' nodes were pushed/popped to the corresponding stack, since a classical finite element method was used, where unknowns are located at nodes in addition to cell- and edge-located unknowns. Node colouring can be seen in Figure 4.3.

Figure 4.3: Node Colouring [14]

As we use a DGS, where the data is situated on the edges (fluxes), and we also need to synchronize the refined edges, an edge colouring scheme was chosen instead.

Triangle's reference colour

Before discussing edge colouring we explain the triangle's reference colour, which is used in the edge colouring as we will see next. The triangle's reference colour was also described by Schraufstetter [14]: we start with four green coarse triangles, to which we assign the reference colour manually (Figure 4.4).

Figure 4.4: Four green coarse triangles

The scheme is simple: when we refine a parent triangle, the two children take the opposite reference colour of the parent, so if the parent's reference colour is green, the children's reference colour will be red (Figure 4.5). Using the triangle types alone to colour the edges would be enough, but in our code we still use the triangle's reference colour to help with that.

Figure 4.5: Triangle colouring

Edge colouring

The edge colouring scheme was introduced by Vigh [15]. We use the triangle's reference colour and the triangle type to colour the edges (Figure 4.6). The rule for colouring the edges is: the hypotenuse takes the opposite colour of the triangle, while the two legs take the triangle's colour.

What we are really interested in, however, is the edge that the Sierpinski curve does not cross; we call it the colour_edge. We nevertheless colour the other edges according to the rule as well. For example:

H_o/n: the colour_edge is the left leg, so it takes the same colour as the triangle.
K_o/n: the colour_edge is the right leg, so it takes the same colour as the triangle.
V_o/n: the colour_edge is the hypotenuse, so it takes the triangle's opposite colour.
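The colouring rule can be condensed into a small function. This Python sketch is illustrative (the dictionary layout and the names are assumptions for the sketch, not the thesis's data structures):

```python
def edge_colours(triangle_colour, triangle_type):
    """Colour the three edges by the rule: the hypotenuse takes the
    opposite colour of the triangle, the two legs take the triangle's
    colour.  Also return which edge is the colour_edge, i.e. the edge
    the Sierpinski curve does not cross (left leg for H, right leg
    for K, hypotenuse for V)."""
    other = 'red' if triangle_colour == 'green' else 'green'
    colours = {'hypotenuse': other,
               'left_leg': triangle_colour,
               'right_leg': triangle_colour}
    colour_edge = {'H': 'left_leg', 'K': 'right_leg', 'V': 'hypotenuse'}[triangle_type]
    return colours, colour_edge
```

For a green V triangle the colour_edge is the hypotenuse and therefore red, matching the rule's third case above.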

Figure 4.6: Edge colouring

In summary, our grid looks like Figure 4.7:

Figure 4.7: The grid with coloured triangles and edges

4.2 The Stacks System

In the previous work, Schweiger [12] and Böck [7] introduced four stack types (Figure 4.8):

Input: stores the elements before the traversal starts.
Output: stores the elements after the traversal ends.
Red: stores the red elements.
Green: stores the green elements.

Figure 4.8: The old stack system [6]

We introduce a new stack system to face the challenge of a fully parallel refinement and coarsening code.

The New Stacks

Building on how Bader [6] and Schweiger [12] classified the edges, we introduce new edge types (Figure 4.9):

Current_edge: the first edge of the element that the Sierpinski curve visits.
Next_edge: the next edge the Sierpinski curve visits.
Coloured_edge: the element's edge that the curve does not cross.
Domain_boundary_edge: it might be any of the previous types; it is an additional characteristic of the element's edge, stating that it is also a domain boundary edge.
Process_boundary_edge: it might be any of the first three types; it is an extra characteristic of the edge, stating that this edge lies between two different processes.

Figure 4.9: Edge Types

As a result of introducing new edge types, new data structures have been introduced too:

- Crossed_edge
- Colour_edge_output
- Colour_edge_input
- Colour_temp_edge
- Process_boundary

In total we have eight stacks, all Last In First Out (LIFO), as described in [15]: each of Colour_edge_input/output, Colour_temp_edge, and Process_boundary is in fact two stacks, a green one and a red one, depending on the colour of the edge, while the Crossed_edge is a linear array. One advantage of this new stack implementation is that no data is left in the colour stacks after finishing one full traversal, since we ensure that all edges end up on the output stacks; however, this implementation is more complicated than the previous one, needed more code, and generates two different output streams.

Crossed_edge

In the initial traversal the Sierpinski curve traverses the grid for the first time, and the crossed_edge array is filled, one by one, with the edges that the curve crosses. At the end of the traversal the crossed_edge array holds the current_edges and next_edges; these edges are then coloured and pushed to the corresponding stacks. The last pushed edge will be the first popped when the next traversal starts, as we go forward and backward.

Colour Stacks

The coloured stacks are:

colour_edge_output: at the end of the traversal, contains all the old coloured edges.
colour_edge_input: at the beginning of the traversal, contains all the new coloured edges.
colour_temp_edge: the non-domain_boundary_edges are pushed here, because they might be used again later.

process_boundary_edge: the edges that are shared between two processes are pushed here.

The algorithm for pushing/popping the colour_edge works the same way for triangles of type H_o/n and K_o/n, where we use the same colour reference as the parent to know to which stack the colour_edge is pushed/popped, while for V_o/n we use the opposite colour reference.

Pushing the colour_edge

Algorithm 4.1 H_o, K_o - Push
  colour = parent_colour_reference
  Push(colour_edge) -> colour_edge_output_stack(colour)

Algorithm 4.2 V_o - Push
  colour = NOT(parent_colour_reference)
  Push(colour_edge) -> colour_edge_output_stack(colour)

In an old triangle some calculations are done and the edge will not be used in the current traversal anymore, so the colour_edge is pushed to its corresponding coloured output stack.

Algorithm 4.3 H_n, K_n - Push
  if colour_edge is domain_boundary_edge then
    colour = parent_colour_reference
    Push(colour_edge) -> colour_edge_output_stack(colour)
  else
    colour = parent_colour_reference
    Push(colour_edge) -> colour_temp_edge_stack(colour)
  end if

Algorithm 4.4 V_n - Push
  if colour_edge is domain_boundary_edge then
    colour = NOT(parent_colour_reference)
    Push(colour_edge) -> colour_edge_output_stack(colour)
  else
    colour = NOT(parent_colour_reference)
    Push(colour_edge) -> colour_temp_edge_stack(colour)
  end if

A new triangle means that we need to check whether the edge is a domain_boundary_edge: if it is, we push it directly to the output stack, because we know that this edge will not be used by any other triangle; if it is not, we push it to the temp_edge stack, as it is shared with the neighbouring cell.

Popping the colour_edge

Algorithm 4.5 H_o, K_o - Pop
  if colour_edge is domain_boundary_edge then
    colour = parent_colour_reference
    colour_edge = Pop(colour_edge_input_stack(colour))
  else
    colour = parent_colour_reference
    colour_edge = Pop(colour_temp_edge_stack(colour))
  end if

Algorithm 4.6 V_o - Pop
  if colour_edge is domain_boundary_edge then
    colour = NOT(parent_colour_reference)
    colour_edge = Pop(colour_edge_input_stack(colour))
  else
    colour = NOT(parent_colour_reference)
    colour_edge = Pop(colour_temp_edge_stack(colour))
  end if

Popping is the opposite case: if the triangle is old and the colour_edge is a domain boundary, we pop it from the colour input stack; if it is not, we take it from the colour temp_edge stack.
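The stack-selection logic of Algorithms 4.1-4.8 can be condensed into one function. The following Python sketch is my reading of the rules (in particular, following the surrounding prose, pops take edges from the input stack; the function and stack names are illustrative, not the thesis's identifiers):

```python
def colour_stack_action(ttype, age, parent_colour, is_domain_boundary, phase):
    """Select the colour and the target stack for the colour_edge.
    `ttype` is 'H'/'K'/'V', `age` is 'o' (old) or 'n' (new),
    `phase` is 'push' or 'pop'.  V-type cells invert the parent's
    reference colour; H and K keep it."""
    colour = (parent_colour if ttype in ('H', 'K')
              else ('red' if parent_colour == 'green' else 'green'))
    if phase == 'push':
        if age == 'o':
            # old cell: the edge is done for this traversal
            stack = 'colour_edge_output'
        else:
            # new cell: boundary edges go straight out,
            # shared edges wait on the temp stack for the neighbour
            stack = 'colour_edge_output' if is_domain_boundary else 'colour_temp_edge'
    else:  # 'pop'
        if age == 'o':
            stack = 'colour_edge_input' if is_domain_boundary else 'colour_temp_edge'
        else:
            # new cell: the edge is used for the first time
            stack = 'colour_edge_input'
    return stack, colour
```

This makes the symmetry visible: the type decides only the colour, while age and the domain-boundary flag decide the stack.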

Algorithm 4.7 H_n, K_n - Pop
  colour = parent_colour_reference
  colour_edge = Pop(colour_edge_input_stack(colour))

Algorithm 4.8 V_n - Pop
  colour = NOT(parent_colour_reference)
  colour_edge = Pop(colour_edge_input_stack(colour))

For new triangles there is no need to check whether the colour_edge is a domain_boundary_edge, because the edge is used for the first time, so we pop it directly from the colour input stack.

Process_Boundary_edge

These stacks, also called the communication stacks, are used to exchange the refine/coarsen information and the fluxes across shared edges. The communication stacks are allocated with an over-estimated number of process_boundary_edges before we start visiting the cells, as follows:

in each process:
  allocate edge MPI structure
  crossed/coloured_edges = (1 + 2^(max_depth - 1)) * num_coarse_triangles
  colour_temp_edges = crossed/coloured_edges / 2
  do i = 0, MPI_size - 1
    allocate 4 Process_boundary_stacks(colour_temp_edges / 16)
  end do

After that we initialize four communication stacks, to which the process_boundary_edges are pushed/popped depending on their colour (green, red) and the current Sierpinski direction (forward, backward), as follows:

initialize process boundary:
  do i = 0, MPI_size - 1
    Process_boundary_stack(Forward, Red)
    Process_boundary_stack(Backward, Red)
    Process_boundary_stack(Forward, Green)
    Process_boundary_stack(Backward, Green)
  end do
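The over-estimation can be expressed as a sizing function. Note that the allocation listing is partly garbled in the source, so the capacity formula below, (1 + 2^(max_depth - 1)) edges per coarse triangle, is an interpretation, and the constants (the halving and the division by 16) are the thesis's heuristics as I read them:

```python
def stack_capacities(max_depth, num_coarse_triangles, mpi_size):
    """Over-estimate the per-process stack capacities before the first
    traversal (interpretation of the thesis's allocation listing)."""
    crossed = (1 + 2 ** (max_depth - 1)) * num_coarse_triangles
    colour_temp = crossed // 2
    boundary = colour_temp // 16  # capacity per neighbouring process
    return {
        'crossed_or_coloured_edges': crossed,
        'colour_temp_edges': colour_temp,
        # one set of boundary stacks per potential MPI neighbour
        'process_boundary_stacks': {rank: boundary for rank in range(mpi_size - 1)},
    }
```

The capacities only need to be upper bounds: the stacks are filled during the traversal, and over-allocation merely trades memory for the guarantee that no stack overflows mid-traversal.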

Initially, we traverse the grid, discover the shared edges, and push them to their corresponding process_boundary_stack. We then use the two functions synchronize/update process_boundary_edges to merge and update the edges with the process_boundary_edges.

5 Parallel grid and load redistribution

5.1 Parallel grid

The computational domain Ω is distributed over several processes, each owning its own part of the domain and using the communication stacks to ensure a correct exchange of the process_boundary_edges during the traversals (Figure 5.1). Adaptivity forces us to keep the process_boundary_edges in the communication stacks up to date, so we need to synchronize and update the communication stacks with the inter-process edges to maintain conformity and the correctness of the flux terms.

Figure 5.1: Process boundary between two processes

Implementing the parallel grid

To implement a parallel adaptive grid, Vigh introduced the process boundary stacks: four stacks to which we push/pop the process_boundary_edges according to their colour and the current traversal direction (forward/backward). The stacks were introduced in the previous chapter:

- Process_boundary_stack(Forward, Red, MPI_neighbour_rank)
- Process_boundary_stack(Backward, Red, MPI_neighbour_rank)
- Process_boundary_stack(Forward, Green, MPI_neighbour_rank)
- Process_boundary_stack(Backward, Green, MPI_neighbour_rank)

These stacks are called communication stacks, and MPI_neighbour_rank is the rank of the process with which the process_boundary_edge is shared. As we traverse the grid, some elements get refined and some get coarsened, so the processes need to communicate and exchange the process_boundary_edges, which carry important data such as the current depth and the refine/coarsen flags of the edge; these data are necessary to maintain the conformity of the grid.

The implementation

After allocating and initializing the communication stacks we split the domain over the processes, so each process knows which part of the grid it is responsible for. Then we start marking the edges to determine which need to be refined and which coarsened; here we check whether an edge belongs to the process's region, and if so we synchronize/update the new data of the edge with its corresponding communication stack, so that the stacks keep track of the edges that need to be refined and coarsened. At the end of the traversal, each process sends/receives the shared elements of its Process_boundary_stack to/from its neighbour; this is done using MPI_SENDRECV [17].
For instance:

send_recv(process_boundary_stack(Red, Forward), process_boundary_stack(Red, Backward))

In Figure 5.2, process one (P1) sends Process_boundary_stack(Red, Forward) and process two (P2) receives into Process_boundary_stack(Red, Backward).
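The pairing of forward and backward stacks in this exchange can be modelled without MPI. The following Python sketch stands in for the MPI_SENDRECV call: plain lists replace the edge stacks, each process's forward stack is delivered into the partner's backward stack, and the function name is illustrative:

```python
def exchange_boundary_stacks(p1_stacks, p2_stacks, colour):
    """Model the MPI_SENDRECV step between two neighbouring processes:
    P1's (colour, forward) stack is received into P2's (colour, backward)
    stack and vice versa.  In the thesis code this is a single
    MPI_SENDRECV on edge structs; here stacks are plain lists."""
    sent_by_p1 = p1_stacks[(colour, 'forward')]
    sent_by_p2 = p2_stacks[(colour, 'forward')]
    p2_stacks[(colour, 'backward')] = list(sent_by_p1)
    p1_stacks[(colour, 'backward')] = list(sent_by_p2)
```

Using one combined send/receive per pair (rather than separate sends and receives) mirrors MPI_SENDRECV's advantage: neither process can deadlock waiting on the other.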

Figure 5.2: Process boundary stack communication

5.2 Load redistribution

Our grid is adaptive: during the traversal we coarsen and refine the elements according to a specific geometry, which affects the size of the element stream as well as the distribution of the elements over the processes, and may lead to bad load balancing. An example of bad load balance can be seen in Figure 5.3, where the grid is deeply refined at one point and coarse elsewhere; distributing this grid over several processes without load balancing leaves one or two processes sharing the point where the grid is deeply refined, so they handle a much larger number of edges/elements than the other processes.

Figure 5.3: A refined grid that leads to bad load balance

Based on the data structures, we implemented two load redistribution schemes: Diffusion and All-to-All.

Diffusion

This scheme redistributes the grid's numerical cells (triangle_num_stream); these cells contain the following quantities:

ξ: the vertical deviation from the flat ocean surface.
v: the velocities in x- and y-directions, v = (v_x, v_y).

The implementation

Each process owns its region of the stream; during adaptation, the region's size may change because we append elements (refine) or delete some (coarsen). At the end of one full traversal, and before starting the next initial one, we redistribute the triangle_num_stream over the processes in a diffusion scheme, where we send/receive 40% of the difference between two direct neighbour processes in the Sierpinski order. This simplifies the load balancing step considerably, because the communication pattern becomes much simpler. To find the best percentage we tried several values and found that 40% works well in our test cases. Figure 5.4 illustrates the diffusion scheme with the triangle_num_stream distributed over four processes.

Figure 5.4: An example of the Diffusion scheme

All-to-All

This scheme redistributes the grid's cells (triangle_stream). All-to-All means that all processes contribute to the result and all processes receive the result. We use MPI_ALLGATHER and MPI_ALLGATHERV, which are MPI collective communication methods; more details about them can be found in [17].

The implementation

During adaptation, the region a process owns changes as a result of refining/coarsening the triangles, and each process marks this modified region. After the adaptation finishes and all of the triangle_stream's elements have been refined or coarsened, we redistribute the triangle_stream as follows:

1. Each process uses MPI_ALLGATHER to send the number of its modified elements to all other processes.

2. MPI_ALLGATHERV then uses this information so that each process can send its own modified region of triangles. MPI_ALLGATHERV concatenates these regions into the new triangle stream and distributes it to all processes.

The redistribution is not performed by one single process; it is implemented by the MPI_ALLGATHERV function and happens in parallel [17].
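The two collective steps can be modelled in plain Python to show what the concatenated stream looks like. This is only a stand-in for the MPI collectives (no communicator, no displacement arrays), with every process's modified region represented as a list:

```python
def allgatherv(modified_regions):
    """Model the two collective calls of the All-to-All scheme:
    MPI_ALLGATHER shares each process's element count, and
    MPI_ALLGATHERV concatenates every modified region into the new
    triangle stream that all processes receive."""
    counts = [len(region) for region in modified_regions]        # MPI_ALLGATHER
    stream = [t for region in modified_regions for t in region]  # MPI_ALLGATHERV
    return counts, stream
```

The counts gathered in the first step play the role of MPI_ALLGATHERV's receive-count array: they tell every process where each region starts inside the concatenated stream.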

6 Test cases and results

Three test cases were used to examine our load redistribution schemes, the maximum number of elements the code can handle without crashing, and of course whether we can run the test case in parallel. The idea of the test cases is to generate a geometry where we refine the elements that lie on the geometry to the maximum depth, while we coarsen the others to the minimum depth. All cases are built into the current code and can be selected via their compilation flags.

6.1 Dynamic Symmetric Circle

The compilation flag is !DEC$ DEFINE CIRCLE_ADAPTATION. A circle is created with center point (0.5, 0.5) and initial radius 0.25; in each iteration the radius is increased by 0.2. As this case is symmetric, it was difficult to see whether the load redistribution schemes work well; however, we could simulate this case on up to 16 processes with min/max depth = 20/30. The following figure shows the Dynamic Symmetric Circle with min/max depth = 16/20, simulated with 7 processes. What is interesting here, for example, is a look at the first two images, where the regions of the purple and blue processes change and become smaller as those two processes receive more elements when the circle passes through their regions.


Figure 6.1: Dynamic Symmetric Circle

6.2 Dynamic non-symmetric Circle

The compilation flag is !DEC$ DEFINE CIRCLE_ADAPTATION_01_CENTER. To avoid the symmetric case, a circle was created with center point (0.1, 0.1) and initial radius 0.25; in each iteration the radius is increased by 0.2. This case was created to examine the load redistribution as the wave enters the domain from the lower left corner and leaves through the upper right one. The following figure shows the Dynamic non-symmetric Circle with max/min depth = 16/20, simulated with 7 processes.

Figure 6.2: Dynamic non-symmetric Circle

6.3 Moving Target

The compilation flag is !DEC$ DEFINE MOVING_TARGET. For the same purpose mentioned in Section 6.2, a point target moves four degrees in each iteration along the circumference of a ghost circle with radius 0.25 and center (0.5, 0.5). I will not show images of this test case, because the moving target, being a single point in the grid, is difficult to see; however, the results of this case were presented successfully in my final presentation.

6.4 Results

Load redistribution

The following table shows the percentage of elements that each process owns:

Test case / step                   Process 1   Process 2   Process 3   Process 4
Dynamic non-symmetric Circle       25%         25%         25%         25%
Initialization                     30.5%       30.5%       19.5%       19.5%
Adapt_Traversal 1                  28%         28%         22%         22%
Redistribute                       31%         31%         19%         19%
Adapt_Traversal 2                  29%         29%         23%         23%
Redistribute                       -           -           -           -
Redistribute after 20 iterations   25%         25%         25%         25%
Adapt_Traversal                    -           19.5%       30.5%       30.5%
Redistribute after iteration 60    25%         24%         25%         25%

Speed up

All the test cases were run with max/min depth = 20/30.

Test case                       # of elements   2 processes   4 processes   8 processes
Dynamic Symmetric Circle        2,400,…         -             -             -
Dynamic non-symmetric Circle    2,097,…         -             -             -
Moving Target                   2,000,…         -             -             -
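The target trajectory of Section 6.3 (four degrees per iteration along a ghost circle of radius 0.25 around (0.5, 0.5)) can be computed as in this short sketch; the function name and signature are illustrative, not the thesis code:

```python
import math

def target_position(iteration, radius=0.25, center=(0.5, 0.5), step_deg=4.0):
    """Position of the moving point target after `iteration` steps of
    four degrees each along the ghost circle's circumference."""
    angle = math.radians(step_deg * iteration)
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))

print(target_position(0))   # → (0.75, 0.5)
# after 90 iterations (360 degrees) the target returns to its start
x, y = target_position(90)
```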

7 Implementation

Before presenting a rough algorithm explaining the whole process, we first need to introduce the Fork and Join scheme; in addition, some keywords need to be introduced.

7.1 Fork and Join

The triangle_stream and the numeric_triangle_stream are allocated with an estimated size which depends on the current depth. This estimate is larger than the required size of the stream, because we do not yet know the number of refined/coarsened elements. So we fork the stream by allocating another triangle_stream; after the adaptation process has finished, we simply join the two streams by copying the stream obtained after the adaptation into the other one, and then we deallocate the old one.

7.2 Initial Traversal

This is the first traversal in our implementation; it is responsible for discovering the edges and assigning a region to each process. When called again, it is responsible for forking/joining the numeric_triangle_stream, and also for redistributing it with the diffusion scheme.

7.3 Empty Traversal

No adaptation happens here; this traversal is used to calculate the old/new flow and to synchronize the interprocess edges with the process_boundary stacks.
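The Fork and Join scheme of Section 7.1 can be sketched in Python as follows; the thesis code is Fortran, and the size estimate and names here are illustrative assumptions:

```python
def estimated_size(depth):
    # Illustrative over-estimate only; the exact formula used in the
    # thesis depends on the current depth and is not reproduced here.
    return 2 ** depth

def adapt_with_fork_join(triangle_stream, adapt, depth):
    """Fork: allocate a second stream large enough for the still-unknown
    number of refined/coarsened elements.  Join: copy the adapted stream
    back and discard the old allocation."""
    forked = []                      # Fork: the newly allocated stream
    for tri in triangle_stream:
        forked.extend(adapt(tri))    # adaptation writes into the fork
    assert len(forked) <= estimated_size(depth)
    triangle_stream[:] = forked      # Join: copy back, old data discarded
    return triangle_stream

# Refining every triangle into two children:
stream = ["t0", "t1"]
adapt_with_fork_join(stream, lambda t: [t + "a", t + "b"], depth=4)
print(stream)   # → ['t0a', 't0b', 't1a', 't1b']
```

The over-allocation is the price paid for not knowing the adapted stream's size in advance; the Join step restores a single, tightly-sized stream for the next traversal.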

7.4 Adapt_mark Traversal

Here we start marking the edges that need to be refined/coarsened, and also synchronize/update the interprocess edges with the process_boundary stacks.

7.5 Edge_mark Traversal

This traversal preserves the conformity of the grid: we mark the additional edges to be refined/coarsened as a consequence of the previous traversal, and then we update the process_boundary stacks.

7.6 Adapt Traversal

Performs the actual adaptation; it is also responsible for the All-to-All load redistribution (redistributing the triangle_stream).
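The Edge_mark traversal of Section 7.5 amounts to a fixpoint iteration: marking an edge may force further edges to be marked, and the traversal repeats until a sweep adds nothing new. A minimal sketch, assuming an illustrative neighbours map (the real dependency structure comes from the grid, not from a dictionary):

```python
def close_marks(marked, neighbours):
    """Propagate refinement marks until the grid would be conforming:
    marking an edge may force its neighbours to be marked as well, so
    iterate until a sweep adds nothing new (the Edge_mark loop)."""
    marked = set(marked)
    while True:
        newly = {n for e in marked for n in neighbours.get(e, ())} - marked
        if not newly:            # 'no more additional edges are marked'
            return marked
        marked |= newly

# Marking e0 forces e1, which in turn forces e2:
print(sorted(close_marks({"e0"}, {"e0": ["e1"], "e1": ["e2"]})))
# → ['e0', 'e1', 'e2']
```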

7.7 The implementation

    Initialize the geometry's test case
    allocate stack_system
    allocate process_boundary stacks
    call Initial Traversal
    call Empty Traversal
    do i = 0 to max_iteration
        call Adapt_mark Traversal
        edge_mark: do
            call Edge_mark Traversal
            if (no more additional edges are marked) exit edge_mark
        end do edge_mark
        call Adapt Traversal:
            Start:
                Fork triangle_stream
                Adapt
                Join triangle_stream
                redistribute triangle_stream
            End
        call Initial Traversal
            Start:
                if (second call)
                    Fork triangle_stream
                    Join triangle_stream
                redistribute numeric_triangle_stream
            End
        call Empty Traversal
    end do
    deallocate stack_system
    deallocate process_boundary stacks

8 Outlook

In the end, I was able to fully accomplish my task of parallel refinement and coarsening of recursively structured adaptive triangular grids. Demirel's [8] code and mine were supposed to be merged to obtain a complete Discontinuous Galerkin solver for the shallow water equation, working in parallel on an adaptive grid; unfortunately Demirel was unable to finish his task, so the next step should be to merge the two codes. Also, the speed-up that we reached with the current code was not as good as we had hoped, because each process traverses the whole grid, accessing parts that do not belong to it. The next essential step is therefore to increase the speed-up, for which several methods are possible. For example, we can coarsen the regions that do not belong to a process as much as possible; in that case the process will not spend much time traversing those regions. Another approach is traversal cutting, which means not allowing a process to access a region that does not belong to it at all.

Bibliography

[1] Aizinger, V. and Dawson, C.: A discontinuous Galerkin method for three-dimensional shallow water equations. J. Sci. Comput., 22(1).
[2] Ambati, V. and Bokhove, O.: Flooding and drying in discontinuous Galerkin discretizations of shallow water equations. In: Wesseling, P., Oñate, E. and Périaux, J. (eds.): ECCOMAS CFD 2006, European Conference on Computational Fluid Dynamics. TU Delft, September 2006.
[3] Bader, M.: Raumfüllende Kurven. Lecture notes, TU München.
[4] Bader, M. and Zenger, C.: Efficient storage and processing of adaptive triangular grids using Sierpinski curves. In: Computational Science - ICCS 2006, Lecture Notes in Computer Science, Springer, 2006.
[5] Bader, M., Schraufstetter, S., Vigh, C. and Behrens, J.: Memory efficient adaptive mesh generation and implementation of multigrid algorithms using Sierpinski curves. International Journal of Computational Science and Engineering, 4, 2008.
[6] Bader, M., Böck, C., Schweiger, J. and Vigh, C.: Dynamically adaptive simulation with minimal memory requirement - solving the shallow water equations using Sierpinski curves. International Journal of Computational Science and Engineering, 4, 2009.
[7] Böck, C.: Discontinuous-Galerkin-Verfahren zum Lösen der Flachwassergleichungen auf adaptiven Dreiecksgittern. Diplomarbeit, TU München, April.
[8] Demirel, Ö.: Parallelisation of a Discontinuous Galerkin Solver for the Shallow Water Equation. Master's thesis, TU München, December.
[9] Radzieowski, C.: Numerische Simulation zeitabhängiger Probleme auf dynamisch-adaptiven Dreiecksgittern. Diplomarbeit, TU München, November.
[10] Remacle, J.-F., Franzao, S., Li, X. and Shephard, M.: An adaptive discretization of shallow water equations based on discontinuous Galerkin methods. International Journal for Numerical Methods in Fluids, 52(8).

[11] Sagan, H.: Space-Filling Curves. Springer, New York, Heidelberg, Berlin.
[12] Schwaiger, J.: Adaptive Discontinuous-Galerkin-Verfahren zum Lösen der Flachwassergleichungen mit verschiedenen Randbedingungen. Diplomarbeit, TU München, September.
[13] Schwanenberg, D., Liem, R. and Köngeter, J.: Discontinuous Galerkin method for the shallow water equations. Hydroinformatics 2000.
[14] Schraufstetter, S.: Speichereffiziente Algorithmen zum Lösen partieller Differentialgleichungen auf adaptiven Dreiecksgittern. Diplomarbeit, TU München, July.
[15] Vigh, C.: Memory-efficient adaptive grid generation using Sierpinski curves. Master's thesis, TU München, January.
[16] Vigh, C.: Lehrstuhltreffen presentation, TU München, April.
[17] MPI: A Message-Passing Interface Standard, Version 2.1. University of Tennessee, June 23.


More information

Verification and Validation of Turbulent Flow around a Clark-Y Airfoil

Verification and Validation of Turbulent Flow around a Clark-Y Airfoil Verification and Validation of Turbulent Flow around a Clark-Y Airfoil 1. Purpose 58:160 Intermediate Mechanics of Fluids CFD LAB 2 By Tao Xing and Fred Stern IIHR-Hydroscience & Engineering The University

More information

Parallel Algorithms: Adaptive Mesh Refinement (AMR) method and its implementation

Parallel Algorithms: Adaptive Mesh Refinement (AMR) method and its implementation Parallel Algorithms: Adaptive Mesh Refinement (AMR) method and its implementation Massimiliano Guarrasi m.guarrasi@cineca.it Super Computing Applications and Innovation Department AMR - Introduction Solving

More information

1 The range query problem

1 The range query problem CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition

More information

Microwell Mixing with Surface Tension

Microwell Mixing with Surface Tension Microwell Mixing with Surface Tension Nick Cox Supervised by Professor Bruce Finlayson University of Washington Department of Chemical Engineering June 6, 2007 Abstract For many applications in the pharmaceutical

More information

Multiphase flow metrology in oil and gas production: Case study of multiphase flow in horizontal tube

Multiphase flow metrology in oil and gas production: Case study of multiphase flow in horizontal tube Multiphase flow metrology in oil and gas production: Case study of multiphase flow in horizontal tube Deliverable 5.1.2 of Work Package WP5 (Creating Impact) Authors: Stanislav Knotek Czech Metrology Institute

More information

Solving Partial Differential Equations on Overlapping Grids

Solving Partial Differential Equations on Overlapping Grids **FULL TITLE** ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION** **NAMES OF EDITORS** Solving Partial Differential Equations on Overlapping Grids William D. Henshaw Centre for Applied Scientific

More information

Realistic Animation of Fluids

Realistic Animation of Fluids Realistic Animation of Fluids p. 1/2 Realistic Animation of Fluids Nick Foster and Dimitri Metaxas Realistic Animation of Fluids p. 2/2 Overview Problem Statement Previous Work Navier-Stokes Equations

More information

Space-filling curves for 2-simplicial meshes created with bisections and reflections

Space-filling curves for 2-simplicial meshes created with bisections and reflections Space-filling curves for 2-simplicial meshes created with bisections and reflections Dr. Joseph M. Maubach Department of Mathematics Eindhoven University of Technology Eindhoven, The Netherlands j.m.l.maubach@tue.nl

More information

Research Article Parallel Adaptive Mesh Refinement Combined with Additive Multigrid for the Efficient Solution of the Poisson Equation

Research Article Parallel Adaptive Mesh Refinement Combined with Additive Multigrid for the Efficient Solution of the Poisson Equation International Scholarly Research Network ISRN Applied Mathematics Volume 2012, Article ID 246491, 24 pages doi:10.5402/2012/246491 Research Article Parallel Adaptive Mesh Refinement Combined with Additive

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction ME 475: Computer-Aided Design of Structures 1-1 CHAPTER 1 Introduction 1.1 Analysis versus Design 1.2 Basic Steps in Analysis 1.3 What is the Finite Element Method? 1.4 Geometrical Representation, Discretization

More information

The Shallow Water Equations and CUDA

The Shallow Water Equations and CUDA The Shallow Water Equations and CUDA Oliver Meister December 17 th 2014 Tutorial Parallel Programming and High Performance Computing, December 17 th 2014 1 Last Tutorial Discretized Heat Equation System

More information

The Shallow Water Equations and CUDA

The Shallow Water Equations and CUDA The Shallow Water Equations and CUDA Alexander Pöppl December 9 th 2015 Tutorial: High Performance Computing - Algorithms and Applications, December 9 th 2015 1 Last Tutorial Discretized Heat Equation

More information

Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of Earth s Mantle

Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of Earth s Mantle ICES Student Forum The University of Texas at Austin, USA November 4, 204 Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of

More information

First Steps - Ball Valve Design

First Steps - Ball Valve Design COSMOSFloWorks 2004 Tutorial 1 First Steps - Ball Valve Design This First Steps tutorial covers the flow of water through a ball valve assembly before and after some design changes. The objective is to

More information

Application of Finite Volume Method for Structural Analysis

Application of Finite Volume Method for Structural Analysis Application of Finite Volume Method for Structural Analysis Saeed-Reza Sabbagh-Yazdi and Milad Bayatlou Associate Professor, Civil Engineering Department of KNToosi University of Technology, PostGraduate

More information

Numerical studies for Flow Around a Sphere regarding different flow regimes caused by various Reynolds numbers

Numerical studies for Flow Around a Sphere regarding different flow regimes caused by various Reynolds numbers Numerical studies for Flow Around a Sphere regarding different flow regimes caused by various Reynolds numbers R. Jendrny, H. Damanik, O. Mierka, S. Turek Institute of Applied Mathematics (LS III), TU

More information

Driven Cavity Example

Driven Cavity Example BMAppendixI.qxd 11/14/12 6:55 PM Page I-1 I CFD Driven Cavity Example I.1 Problem One of the classic benchmarks in CFD is the driven cavity problem. Consider steady, incompressible, viscous flow in a square

More information

Introduction to ANSYS CFX

Introduction to ANSYS CFX Workshop 03 Fluid flow around the NACA0012 Airfoil 16.0 Release Introduction to ANSYS CFX 2015 ANSYS, Inc. March 13, 2015 1 Release 16.0 Workshop Description: The flow simulated is an external aerodynamics

More information

Nonlinear Potential Flow Solver Development in OpenFOAM

Nonlinear Potential Flow Solver Development in OpenFOAM Nonlinear Potential Flow Solver Development in OpenFOAM A. Mehmood Plymouth University, UK April 19,2016 A. Mehmood Table of Contents 1 Motivation 2 Solution Methodology Mathematical Formulation Sequence

More information

Differential Geometry: Circle Packings. [A Circle Packing Algorithm, Collins and Stephenson] [CirclePack, Ken Stephenson]

Differential Geometry: Circle Packings. [A Circle Packing Algorithm, Collins and Stephenson] [CirclePack, Ken Stephenson] Differential Geometry: Circle Packings [A Circle Packing Algorithm, Collins and Stephenson] [CirclePack, Ken Stephenson] Conformal Maps Recall: Given a domain Ω R 2, the map F:Ω R 2 is conformal if it

More information

GEOMETRY MODELING & GRID GENERATION

GEOMETRY MODELING & GRID GENERATION GEOMETRY MODELING & GRID GENERATION Dr.D.Prakash Senior Assistant Professor School of Mechanical Engineering SASTRA University, Thanjavur OBJECTIVE The objectives of this discussion are to relate experiences

More information

BACK AND FORTH ERROR COMPENSATION AND CORRECTION METHODS FOR REMOVING ERRORS INDUCED BY UNEVEN GRADIENTS OF THE LEVEL SET FUNCTION

BACK AND FORTH ERROR COMPENSATION AND CORRECTION METHODS FOR REMOVING ERRORS INDUCED BY UNEVEN GRADIENTS OF THE LEVEL SET FUNCTION BACK AND FORTH ERROR COMPENSATION AND CORRECTION METHODS FOR REMOVING ERRORS INDUCED BY UNEVEN GRADIENTS OF THE LEVEL SET FUNCTION TODD F. DUPONT AND YINGJIE LIU Abstract. We propose a method that significantly

More information

AMR Multi-Moment FVM Scheme

AMR Multi-Moment FVM Scheme Chapter 4 AMR Multi-Moment FVM Scheme 4.1 Berger s AMR scheme An AMR grid with the hierarchy of Berger s AMR scheme proposed in [13] for CFD simulations is given in Fig.4.1 as a simple example for following

More information

On the high order FV schemes for compressible flows

On the high order FV schemes for compressible flows Applied and Computational Mechanics 1 (2007) 453-460 On the high order FV schemes for compressible flows J. Fürst a, a Faculty of Mechanical Engineering, CTU in Prague, Karlovo nám. 13, 121 35 Praha, Czech

More information

Adaptive Mesh Refinement (AMR)

Adaptive Mesh Refinement (AMR) Adaptive Mesh Refinement (AMR) Carsten Burstedde Omar Ghattas, Georg Stadler, Lucas C. Wilcox Institute for Computational Engineering and Sciences (ICES) The University of Texas at Austin Collaboration

More information

The Study of Ship Motions in Regular Waves using a Mesh-Free Numerical Method

The Study of Ship Motions in Regular Waves using a Mesh-Free Numerical Method The Study of Ship Motions in Regular Waves using a Mesh-Free Numerical Method by Bruce Kenneth Cartwright, B. Eng., M. Sc. Submitted in fulfilment of the requirements for the Degree of Master of Philosophy

More information

Tutorial Two Built in Mesh

Tutorial Two Built in Mesh Built in Mesh 4 th edition, Jan. 2018 This offering is not approved or endorsed by ESI Group, ESI-OpenCFD or the OpenFOAM Foundation, the producer of the OpenFOAM software and owner of the OpenFOAM trademark.

More information

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011

More information

Massively Parallel Finite Element Simulations with deal.ii

Massively Parallel Finite Element Simulations with deal.ii Massively Parallel Finite Element Simulations with deal.ii Timo Heister, Texas A&M University 2012-02-16 SIAM PP2012 joint work with: Wolfgang Bangerth, Carsten Burstedde, Thomas Geenen, Martin Kronbichler

More information

Topology Preserving Tetrahedral Decomposition of Trilinear Cell

Topology Preserving Tetrahedral Decomposition of Trilinear Cell Topology Preserving Tetrahedral Decomposition of Trilinear Cell Bong-Soo Sohn Department of Computer Engineering, Kyungpook National University Daegu 702-701, South Korea bongbong@knu.ac.kr http://bh.knu.ac.kr/

More information