Simulating tsunami propagation on parallel computers using a hybrid software framework

1 Simulating tsunami propagation on parallel computers using a hybrid software framework. Xing Cai, Simula Research Laboratory, Norway, and Department of Informatics, University of Oslo. March 12, 2007.

2 Outline: 1 Introduction; 2 A hybrid software framework for parallelization; 3 Desirable simulation setup for future; 4 Performance analysis done at HLRS.

4 The origin of the word tsunami

5 Different types of tsunamis. Tsunamis are large waves formed by rapid mass movements. They can be induced by an undersea earthquake (such as the December 2004 Indian Ocean Tsunami), by an asteroid impact (such as the Mjølnir Impact), or by a landslide (of great importance to the Norwegian fjords).

6 Motivation. Wave propagation simulation is very important for studying tsunamis. It is a computational challenge: the computational domain is huge, and different physics is required in different areas. Parallel computing should reuse existing serial wave codes and should allow different mathematical models and resolutions in different areas. Objective: a framework for parallel hybrid tsunami simulations.

7 Huge computations (example: Indian Ocean). 1 km × 1 km resolution overall: about [...] mesh points. 200 m × 200 m resolution overall: $10^9$ mesh points.
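As a rough consistency check on these sizes, assume (this is an assumption, not a figure from the talk) that both counts refer to uniform meshes over the same area. Refining from 1 km to 200 m then multiplies the number of points by 5 in each horizontal direction, so the 1 km mesh would hold roughly

```latex
\[
N_{1\,\mathrm{km}} \approx \frac{N_{200\,\mathrm{m}}}{5^{2}} = \frac{10^{9}}{25} = 4 \times 10^{7}
\]
```

mesh points. The same factor of 25 is what makes 200 m resolution everywhere so much more expensive than 1 km resolution.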

8 Computational challenge. Example: Indian Ocean. 1 km × 1 km resolution is not sufficient everywhere, while 200 m × 200 m resolution overall is too much. We need smart computing: high resolution only in areas where necessary; a simple mathematical model in vast areas; an advanced mathematical model (due to complicated physics) in small areas. Result: a parallel hybrid tsunami simulator. Desirable resolution requires: number of mesh points [...]; number of time steps: many thousands.

10 Parallelization objectives. Requirement 1: easy parallelization. Serial wave codes are reused during parallelization, and different serial codes collaborate inside a hybrid framework. Requirement 2: efficient use of computational resources. The finite element method (FEM) is used only in areas where unstructured meshes and advanced numerics are needed; the finite difference method (FDM) is used elsewhere.

11 Basic idea: divide and conquer. Domain decomposition: one global solution domain is divided into many subdomains. Each subdomain is a (relatively) independent working unit. Collaboration between the subdomains happens through communication.
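The division into subdomains can be illustrated on a one-dimensional row of mesh points. The sketch below is only an illustration: the names `Range` and `partition_with_overlap` are hypothetical, and a real ocean domain would be split in two dimensions along the mesh.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open index range [begin, end) of the mesh points in one subdomain.
struct Range { int begin; int end; };

// Split n mesh points into p contiguous blocks of near-equal size, then
// widen every block by `overlap` points on each side (clipped to [0, n)).
std::vector<Range> partition_with_overlap(int n, int p, int overlap) {
    assert(n > 0 && p > 0 && p <= n && overlap >= 0);
    std::vector<Range> parts;
    for (int s = 0; s < p; ++s) {
        // Non-overlapping block; the first n % p blocks get one extra point.
        const int begin = s * (n / p) + std::min(s, n % p);
        const int size  = n / p + (s < n % p ? 1 : 0);
        parts.push_back({std::max(0, begin - overlap),
                         std::min(n, begin + size + overlap)});
    }
    return parts;
}
```

Calling `partition_with_overlap(10, 3, 1)` gives the ranges [0, 5), [3, 8) and [6, 10). The points shared by two ranges are where neighboring subdomains would exchange data.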

12 Overall parallelization strategy. Divide a vast ocean domain into many subdomains, $\Omega = \bigcup_{s=1}^{P} \Omega_s$. Use uniform local meshes and FDM on most of the subdomains, and unstructured local meshes and FEM on selected subdomains. Run a global iteration among all subdomains: during each iteration a subdomain independently updates its local solution, and local solutions are exchanged between neighboring subdomains at the end of each iteration. The solution of $L_\Omega(u) = f_\Omega$ is found through the iterates $u^0, u^1, \ldots, u^i$, where $L_{\Omega_s}(u_s^i) = f_{\Omega_s}^i$ for $1 \le s \le P$ and $u^i = \bigcup_{s=1}^{P} u_s^i$.
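The global iteration can be tried out on a small model problem. The sketch below is not the tsunami solver: it assumes the 1D Poisson problem $-u'' = 1$ on $(0,1)$ with $u(0) = u(1) = 0$, discretized by finite differences, and it composes $u^i$ by taking each point from the subdomain that owns it (one common choice, assumed here). The names `solve_local` and `schwarz_poisson` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Solve the tridiagonal system with stencil (-1, 2, -1) by the Thomas algorithm.
std::vector<double> solve_local(std::vector<double> d) {
    const int n = static_cast<int>(d.size());
    assert(n > 0);
    std::vector<double> c(n);
    c[0] = -0.5;
    d[0] *= 0.5;
    for (int i = 1; i < n; ++i) {          // forward elimination
        const double denom = 2.0 + c[i - 1];
        c[i] = -1.0 / denom;
        d[i] = (d[i] + d[i - 1]) / denom;
    }
    for (int i = n - 2; i >= 0; --i)       // back substitution
        d[i] -= c[i] * d[i + 1];
    return d;
}

// Schwarz iteration with n interior points and p overlapping subdomains.
// Returns u at all n + 2 mesh points, including the two boundary points.
std::vector<double> schwarz_poisson(int n, int p, int overlap,
                                    double tol, int max_iter, int* iters) {
    assert(p > 0 && p <= n && overlap >= 0 && max_iter > 0);
    const double h = 1.0 / (n + 1);
    std::vector<double> u(n + 2, 0.0), u_new(n + 2, 0.0);
    int it = 0;
    double change = 1.0;
    while (it < max_iter && change > tol) {
        for (int s = 0; s < p; ++s) {
            // Points owned by subdomain s (non-overlapping blocks).
            const int lo = 1 + s * n / p, hi = (s + 1) * n / p;
            // Overlapping range on which the local problem is solved.
            const int a = std::max(1, lo - overlap);
            const int b = std::min(n, hi + overlap);
            std::vector<double> rhs(b - a + 1, h * h);
            rhs.front() += u[a - 1];       // boundary data from the previous iterate
            rhs.back()  += u[b + 1];
            const std::vector<double> loc = solve_local(rhs);
            for (int i = lo; i <= hi; ++i) u_new[i] = loc[i - a];
        }
        change = 0.0;
        for (int i = 1; i <= n; ++i)
            change = std::max(change, std::fabs(u_new[i] - u[i]));
        u = u_new;                         // exchange: neighbors see the new values
        ++it;
    }
    if (iters) *iters = it;
    return u;
}
```

Every local solve inside one sweep uses only values from the previous iterate, so the subdomains could be updated concurrently. For this model problem the converged values agree with $u(x) = x(1-x)/2$ at the mesh points.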

13 Convergence among subdomains. Schwarz methods work as the numerical foundation, with a small amount of overlap between neighboring subdomains (overlapping domain decomposition). They were originally well known as a parallel numerical strategy for solving large linear systems. We apply domain decomposition (DD) at the software level, not at the linear-algebra level: no global matrices or vectors exist; all are represented by the collection of subdomain matrices and vectors. Neighboring subdomain meshes may be non-matching and/or of different types.

14 A generic library of Schwarz methods. Schwarz methods are a general approach to solving PDEs in parallel, so a generic library can be programmed, and object-oriented programming is well suited to it. Generic components: subdomain solvers and a global administrator. class SubdomainSolver: the generic interface of a subdomain solver, with only declarations of standard functions and no implementation. class Administrator: implementation of generic functions for invoking communication and checking global convergence.
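The split between the two generic components might look as follows. This is a minimal sketch, not the actual library: only the class names `SubdomainSolver` and `Administrator` come from the slide, while every member function (`solveLocal`, `lastChange`, `attach`, `iterate`, `communicate`) and the toy `HalvingSolver` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Generic interface: declarations only, no implementation.
class SubdomainSolver {
public:
    virtual ~SubdomainSolver() = default;
    virtual void solveLocal() = 0;          // one local update
    virtual double lastChange() const = 0;  // size of the last local update
};

// Generic iteration logic, independent of any concrete subdomain solver.
class Administrator {
public:
    virtual ~Administrator() = default;
    void attach(std::unique_ptr<SubdomainSolver> s) { solvers_.push_back(std::move(s)); }

    // Iterates until the largest local change is at most tol.
    // Returns the number of iterations performed.
    int iterate(double tol, int max_iter) {
        assert(!solvers_.empty() && max_iter > 0);
        int it = 0;
        double change = 0.0;
        do {
            for (auto& s : solvers_) s->solveLocal();
            communicate();
            // With one subdomain per process this maximum would become
            // a global reduction across processes.
            change = 0.0;
            for (auto& s : solvers_) change = std::max(change, s->lastChange());
            ++it;
        } while (change > tol && it < max_iter);
        return it;
    }

protected:
    virtual void communicate() {}  // placeholder for the neighbor exchange

private:
    std::vector<std::unique_ptr<SubdomainSolver>> solvers_;
};

// Toy solver for demonstration: each update halves its value.
class HalvingSolver : public SubdomainSolver {
public:
    void solveLocal() override { change_ = 0.5 * value_; value_ -= change_; }
    double lastChange() const override { return change_; }
private:
    double value_ = 1.0;
    double change_ = 0.0;
};
```

With two `HalvingSolver` objects attached, `iterate(1e-3, 100)` stops after 10 iterations, because the change after iteration k is 2^-k. The administrator never needs to know what kind of solver sits behind the interface.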

15 A framework of hybrid tsunami simulators. Objective: a generic framework for creating hybrid parallel tsunami simulators, based on existing serial codes. Starting point: a C++ Boussinesq solver using FEM (class Boussinesq) and a legacy F77 code using FDM (a set of subroutines). Direct parallelization of either code requires too much work. A hybrid software framework: class SubdomainBQFEMSolver : public Boussinesq, public SubdomainSolver; class SubdomainBQFDMSolver : public SubdomainSolver (calling F77 subroutines internally); class HybridBQSolver : public Administrator. Implementation using Diffpack.
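The inheritance pattern can be sketched with stand-ins for the two serial codes. Only the class names are taken from the slide. The bodies of `Boussinesq` and `SubdomainSolver`, the member functions, and the routine `fdm_step_` are hypothetical placeholders, and the "F77 subroutine" is written in C++ here so that the sketch links on its own.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the generic subdomain interface (hypothetical signature).
class SubdomainSolver {
public:
    virtual ~SubdomainSolver() = default;
    virtual void solveLocal() = 0;
};

// Stand-in for the existing serial C++ FEM solver.
class Boussinesq {
public:
    void timeStep() { ++steps_; }
    int steps() const { return steps_; }
private:
    int steps_ = 0;
};

// Stand-in for a legacy F77 subroutine. Many Fortran compilers export a
// lower-case symbol with a trailing underscore and pass arguments by reference.
extern "C" void fdm_step_(double* eta, const int* n) {
    for (int i = 0; i < *n; ++i) eta[i] *= 0.5;
}

// FEM subdomain: inherits the serial solver and the generic interface.
class SubdomainBQFEMSolver : public Boussinesq, public SubdomainSolver {
public:
    void solveLocal() override { timeStep(); }  // reuse the inherited serial code
};

// FDM subdomain: wraps calls to the legacy subroutine.
class SubdomainBQFDMSolver : public SubdomainSolver {
public:
    explicit SubdomainBQFDMSolver(int n) : eta_(n, 1.0) { assert(n > 0); }
    void solveLocal() override {
        const int n = static_cast<int>(eta_.size());
        fdm_step_(eta_.data(), &n);
    }
    const std::vector<double>& elevation() const { return eta_; }
private:
    std::vector<double> eta_;
};
```

Both wrappers can be stored as `SubdomainSolver*` and driven through the same `solveLocal()` call, which is what lets an administrator mix FEM and FDM subdomains without touching either serial code.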

16 Flexibility. Free choice between SubdomainBQFEMSolver and SubdomainBQFDMSolver for each subdomain. Adaptive mesh refinement is allowed for FEM subdomains. Neighboring subdomains may use non-matching local meshes. It is possible to incorporate other serial codes as subdomain solvers.

18 Subdomain preparation. [Figure: the domain divided into subdomains p1, p2, p3 and p4; labels: "New finite element code" and "Finite difference legacy code"]

19 Coarse-mesh simulation of the Indian Ocean Tsunami. [Figure: initial wave elevation after the earthquake]

20 Coarse-mesh simulation snapshot 1. [Figure: after 1.4 hours]

21 Coarse-mesh simulation snapshot 2. [Figure: after 2.8 hours]

23 Motivation for my HPC-Europa visit. HLRS has a vector-CPU based system and extensive experience with performance analysis. Purpose: a fine-grained diagnosis of the tsunami simulator and our parallel PDE library.

24 Observations so far (1). When the computational domain has no points on land, the parallel computation is well balanced. On the SX-8, the main work at each time step goes to the discretization, not to solving the resulting distributed linear system.

25 Observations so far (2). When the computational domain has points on land, the parallel computation is not balanced. Causes of imbalance: imbalance in the distributed discretization (some subdomains have many points on land), and imbalance in the parallel DD solver (some subdomain problems are easier to solve).
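One simple way to quantify such imbalance is the ratio of the largest per-subdomain workload to the mean workload. The sketch below is an illustration only: it assumes that the discretization work of a subdomain is proportional to its number of non-land ("wet") points, which the slides suggest but do not state, and the names `imbalance_factor` and `wet_points` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Ratio of the largest per-subdomain workload to the mean workload.
// 1.0 means perfect balance; larger values mean more waiting time.
double imbalance_factor(const std::vector<long>& work) {
    assert(!work.empty());
    const long total = std::accumulate(work.begin(), work.end(), 0L);
    assert(total > 0);
    const long max_work = *std::max_element(work.begin(), work.end());
    return static_cast<double>(max_work) * work.size() / total;
}

// Number of non-land points in a subdomain, used here as a proxy for its work
// (an assumption of this sketch).
long wet_points(const std::vector<bool>& is_land) {
    return static_cast<long>(std::count(is_land.begin(), is_land.end(), false));
}
```

For workloads {100, 100, 50, 150} the factor is 1.5: all other subdomains wait while the busiest one does 50% more work than the average.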

26 Observations so far (3). The SX compiler does not optimize the discretization phase very well: the code is C++, with many levels of nested for-loops and extensive use of virtual functions.

27 Observations so far (4). Vectorization is enabled for some parts of the code. Example: vector addition x = y + z:

    #pragma cdir nodep
    for (int i = 0; i < length; i++)
        tmp_x[i] = tmp_y[i] + tmp_z[i];

The percentage of vectorized code increased from 6-7% to 13-14% in the solution phase.

28 Observations so far (5). Vectorization does not work for some parts of the code. Example: sparse matrix-vector multiplication y = Ax with compressed row storage, which involves indirect (and random) access of data entries:

    int i, r, rstart, rstop;
    double tmp;
    #pragma cdir nodep
    for (i = 1; i <= nrows; i++) {
        rstart = ad.irow(i);      // first nonzero of row i
        rstop = ad.irow(i+1);     // one past the last nonzero of row i
        tmp = 0.0;
    #pragma cdir novector
        for (r = rstart; r < rstop; r++)
            tmp += entries(r) * x(ad.jcol(r));   // indirect access through jcol
        y(i) += tmp;              // accumulates: y must be zero on entry for y = Ax
    }

Vectorization of the inner for-loop has to be turned off!
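For readers who want to run the kernel outside the original library, here is a self-contained version of the same compressed row storage product. It is a sketch under stated assumptions: it uses 0-based `std::vector` storage rather than the 1-based accessors above, the struct `CrsMatrix` and function `multiply` are hypothetical names, and the compiler directives are left out.

```cpp
#include <cassert>
#include <vector>

// Compressed row storage with 0-based indices.
struct CrsMatrix {
    int nrows;
    std::vector<int> irow;        // nrows + 1 offsets: row i owns [irow[i], irow[i+1])
    std::vector<int> jcol;        // column index of every stored entry
    std::vector<double> entries;  // value of every stored entry
};

// Returns y = A x.
std::vector<double> multiply(const CrsMatrix& A, const std::vector<double>& x) {
    assert(static_cast<int>(A.irow.size()) == A.nrows + 1);
    assert(A.jcol.size() == A.entries.size());
    std::vector<double> y(A.nrows, 0.0);
    for (int i = 0; i < A.nrows; ++i) {
        double tmp = 0.0;
        for (int r = A.irow[i]; r < A.irow[i + 1]; ++r)
            tmp += A.entries[r] * x[A.jcol[r]];  // x is read in the order given by jcol
        y[i] = tmp;
    }
    return y;
}
```

The inner loop reads `x` at positions dictated by `jcol`, so consecutive iterations touch scattered memory locations. That indirect access pattern is the one described on the slide.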

29 Conclusions. Schwarz methods are the numerical foundation for the parallelization. Object-oriented programming enables a hybrid framework of tsunami simulators, with full flexibility in choosing subdomain solvers: different mathematical models, different discretizations, different local meshes, different codes. Some parts of the tsunami simulator have been improved thanks to the analysis done at HLRS. Challenge: performance and load balancing.
