Fault Tolerant Domain Decomposition for Parabolic Problems

Marc Garbey and Hatem Ltaief
Department of Computer Science, University of Houston, Houston, TX 77204, USA
garbey@cs.uh.edu, ltaief@cs.uh.edu

1 Introduction

The objective of this paper is to present numerical schemes for the time integration of parabolic problems that can recover from a failure of the computer system. We construct an algorithmic solution of the problem in the context of domain decomposition and distributed computing. Our model problem is the heat equation:

$$u_t = \Delta u + F(x,t), \quad (x,t) \in \Omega \times (0,T), \qquad u|_{\partial\Omega} = g(x), \qquad u(x,0) = u_0(x). \tag{1}$$

We suppose that the time integration is done by a first-order implicit Euler scheme,

$$\frac{U^{n+1} - U^n}{dt} = \Delta U^{n+1} + F(x, t^{n+1}), \tag{2}$$

and that $\Omega$ is partitioned into $N$ subdomains $\Omega_j$, $j = 1..N$. The computation on these subdomains is distributed among $N$ Processing Units (PUs) or computers. We anticipate that one or several PUs may stall or get disconnected. We complement the distributed architecture of these $N$ PUs with $S$ additional PUs called spare processing units. The problem of designing a Fault Tolerant (FT) code that can survive several PU failures decomposes as follows:

(Pb 1) guarantee that if one or several PUs get disconnected, the code can still be executed;
(Pb 2) provide an algorithm that can restart the time integration from the data that are available in the distributed memory or file system.

While FT is not so critical for a standard application running for a few hours on a medium-scale parallel system, it becomes a real issue for long runs on a large system or a grid computing architecture. In both cases the failure of some computing units becomes almost certain, and parallel input/output is not as efficient as ordinary checkpoint procedures may require. (Pb 1) is solved by middleware such as FT-MPI [1, 2], which ensures that the application keeps running after some processors have failed. On the other hand, the application should be FT as well, because such middleware does not guarantee a correct numerical solution after a failure.

In this paper we focus on the design of numerical algorithms that solve (Pb 2) without using global checkpointing. Global checkpointing does not scale on large parallel systems and is impractical on a grid of computers. Indeed, as the number of nodes and the problem size increase, the cost of checkpointing and recovery increases, while the mean time between failures decreases. The approach we have taken is as follows: spare processors are used to store the subdomain data of the application in their local memory during execution, in local asynchronous mode. In case of failure, a spare processor takes over for the failed processor without the entire system having to roll back to a globally consistent checkpoint.

The numerical problem that we address can be defined as follows. We assume that spare processors have stored copies of all subdomain data $U_j^{n(j)}$, $j = 1..N$, in their local memory. A priori the time steps differ: $n(j) \neq n(k)$ for $j \neq k$. We look for a (parallel) reconstruction process of $U^M$ at a common time step $M \in (\min_j\{n(j)\}, \max_j\{n(j)\})$, from the subdomain data $U_j^{n(j)}$, $j = 1..N$, at different but nearby time steps. The code can then restart from $U^M$. Because the time interval between two asynchronous backups is small, we are looking for a numerical procedure that is completely explicit and does not require the complexity of a standard parameter identification method.
In the next section, we discuss several algorithmic ideas to solve this problem.

2 The Fault Tolerant algorithms

For simplicity of presentation, we restrict ourselves to the one-dimensional heat equation on $\Omega = (0, 1)$, discretized on a regular Cartesian grid:

$$\frac{U_j^{n+1} - U_j^n}{dt} = \frac{U_{j+1}^{n+1} - 2U_j^{n+1} + U_{j-1}^{n+1}}{h^2} + F_j^{n+1}, \tag{3}$$

and we assume that $dt \sim h$. However, most of the ideas presented here should generalize easily to higher space dimensions. We review progressively a few numerical methods to reconstruct a uniform approximation of $U^M$ from the disparate data $U^{n(j)}$ in each subdomain $\Omega_j$.

2.1 Interpolation method

Let $M = \frac{1}{N} \sum_{j=1..N} n(j)$. We look for an approximation of $U^M$ in $\Omega$. We assume that we have at our disposal $U^{n(j)}$ and $U^{m(j)}$ at two time steps $n(j) < m(j)$ in each subdomain $\Omega_j$. Then, if $\|U^{n(j)} - U^{m(j)}\|_{\Omega_j}$ is below some tolerance, we may use a second-order interpolation/extrapolation in time to get an approximation of $U^M$. The numerical error should be of order $((m(j) - n(j))\,dt)^2$. This simple procedure reduces the accuracy of the scheme and introduces small jumps at the interfaces between subdomains. The method is perfectly acceptable when one is not interested in computing transient phenomena accurately. However, it is not numerically efficient in the general situation.

2.2 Forward Time Integration

Let us assume that for each subdomain we have access to $U^{n(j)}$. For simplicity we suppose that $n(j)$ is a monotonically increasing sequence. We further suppose that we have stored in the memory of the spare processors the time history of the artificial boundary conditions $I_j^m$ on $\Omega_j \cap \Omega_{j+1}$ for all previous time steps $n(j) \le m \le n(j+1)$. We can then reconstruct the solution $U^{n(N)}$ with the forward time integration of the original code, as follows. Processor one advances $u_1^n$ in time from time step $n(1)$ to time step $n(2)$ using the boundary conditions $I_1^m$, $n(1) \le m \le n(2)$. Then processors one and two advance $u_1^n$ and $u_2^n$ in parallel from time step $n(2)$ to $n(3)$, using the neighbor's interface conditions or the original interface solver of the numerical scheme. This process is repeated until the global solution $u^{n(N)}$ is obtained. The procedure generalizes easily to the situation of Figure 1, where we do not assume any monotonicity of the sequence $n(j)$.
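
The forward reconstruction of Section 2.2 can be sketched on a small 1D example. This is a minimal illustration under assumptions that are not in the paper: a single global implicit solve stands in for the domain decomposition solver, $F = 0$, $g = 0$, $u_0(x) = \sin(\pi x)$, and the names `implicit_euler_step`, `interface`, `backup`, the interface index `s` and the steps `n1`, `n2` are hypothetical. It replays subdomain 1 from its backup at step `n1` up to step `n2`, using the stored interface values as Dirichlet data, and compares with the failure-free run.

```python
import numpy as np

def implicit_euler_step(u, dt, h, f, left, right):
    """One step of scheme (3) on interior nodes.

    `left` and `right` are the Dirichlet values at the new time level
    on the two end points of the (sub)domain.
    """
    n = u.size
    r = dt / h**2
    A = (1 + 2 * r) * np.eye(n) - r * np.eye(n, k=1) - r * np.eye(n, k=-1)
    rhs = u + dt * f          # new array, safe to modify below
    rhs[0] += r * left
    rhs[-1] += r * right
    return np.linalg.solve(A, rhs)

h = 1.0 / 40
dt = h                         # dt ~ h, as assumed in the text
x = np.linspace(h, 1 - h, 39)  # interior nodes of (0, 1)
s = 19                         # hypothetical interface node between subdomains 1 and 2
n1, n2 = 3, 8                  # backup step of subdomain 1 and target step

# Failure-free run: keep the solution history and the interface values.
u = np.sin(np.pi * x)
history = {0: u.copy()}
interface = {}
for m in range(1, n2 + 1):
    u = implicit_euler_step(u, dt, h, 0.0, 0.0, 0.0)
    history[m] = u.copy()
    interface[m] = u[s]        # artificial boundary condition saved at each step

# Recovery: subdomain 1 restarts from its backup and replays steps n1+1..n2.
backup = history[n1][:s].copy()
u1 = backup
for m in range(n1 + 1, n2 + 1):
    u1 = implicit_euler_step(u1, dt, h, 0.0, 0.0, interface[m])

max_diff = np.max(np.abs(u1 - history[n2][:s]))
print("max difference with the failure-free run:", max_diff)
```

Because the subdomain system contains exactly the rows of the global system for those nodes, the replay agrees with the failure-free run up to round-off, which is the property the forward method relies on.
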
In Figure 1, the thick line represents the data that need to be stored in the spare processors, and the intervals marked with circles are the unknowns of the reconstruction process. The advantage of this method is that we can reuse the same algorithm as in the standard domain decomposition method, restricted to specific subsets of the domain decomposition. We then reproduce exactly the same information $U^M$ as if the process had had no failures. The main drawback of the method is that saving the artificial boundary conditions at each time step may slow down the code execution significantly.

To illustrate this difficulty, we have implemented a 3D benchmark code with a very simple explicit/implicit domain decomposition procedure for (1). We refer to [3] for a more sophisticated method along these lines. We use a Krylov method for the implicit time stepping in each subdomain and impose the boundary values explicitly on the artificial boundaries. The computational domain $\Omega = (0, 1)^3$ is distributed over a two-dimensional grid of $p_x \times p_y$ processors. This two-dimensional grid is extended with an additional row of $p_x$ spare processors. We have implemented a totally asynchronous backup of the subdomains every $K$ time steps. We also have the option to back up, in addition, all two-dimensional artificial interfaces generated during the $K - 1$ consecutive time steps between two backups of all subdomains. The communication between active processors and spare processors uses non-blocking communication, and the backup data are small enough to fit in the main memory of the spare processors.

Fig. 1. Illustration of the reconstruction procedure with the forward method.

Figures 2 and 3 report on the numerical experiment with 36 PUs and 6 spare processors on two different architectures. Each processor has a block of $18 \times 18 \times 98$ grid points. The first system, used in Figure 2, is a Beowulf cluster with dual AMD 32-bit processors and a Gigabit Ethernet network, while the second system, used in Figure 3, is a dual Itanium cluster with a Myrinet network. While the penalty for backing up the subdomains on spare processors is particularly high on the system that uses Gigabit Ethernet, saving the artificial boundary conditions at every time step significantly slows down the application on both architectures. We now discuss a reconstruction method that produces an approximate solution without using the time series of artificial boundary conditions.

Fig. 2. Overhead with dual AMD/Gigabit Ethernet. (Plot of % time used in saving versus backup frequency; curves: "save u(.,t)" and "save u(.,t) and BC".)

Fig. 3. Overhead with dual Itanium/Myrinet network. (Plot of % time used in saving versus backup frequency; curves: "save u(.,t)" and "save u(.,t) and BC".)

2.3 Backward Integration and Space Marching

Let us suppose now that we asynchronously store only the subdomain data, and not the chronology of the artificial interface conditions. To be more specific, we suppose that for each subdomain we have access to the solution at two different time steps $n(i)$, $m(i)$ with $m(i) - n(i) = K \gg 1$. The forward implicit scheme provides an explicit formula when we go backward in time:

$$U_j^n = U_j^{n+1} - dt\,\frac{U_{j+1}^{n+1} - 2U_j^{n+1} + U_{j-1}^{n+1}}{h^2} - dt\,F_j^{n+1}. \tag{4}$$

The existence of the solution is guaranteed by the forward integration in time. Two difficulties arise: first, the numerical procedure is unstable; second, one is restricted to the cone of dependence, as shown in Figure 4.

Fig. 4. Illustration in one space dimension of the problem with the third solution.

In Fourier modes we have $\hat{U}_k^n = \delta_k \hat{U}_k^{n+1}$, with

$$\delta_k \sim -\frac{2}{h}\left(\cos(2\pi k h) - 1\right), \quad |k| \le \frac{N}{2}.$$

The expected error is at most of the order of $\nu / h^K$, where $\nu$ is the machine precision and $K$ the number of time steps. Therefore, the backward time integration is still accurate up to time step $K$ with $\nu / h^K \sim h^2$. To stabilize the scheme, one can use the telegraph equation, which is a perturbation of the heat equation:

$$\epsilon\,\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} + \frac{\partial u}{\partial t} = F(x,t), \quad x \in (0, 1),\ t \in (0, T). \tag{5}$$

The asymptotic convergence can be derived from [4] after time rescaling. The general idea is then to use the previous scheme (4) for a few time steps and to pursue the time integration with the following one:

$$\epsilon\,\frac{U_j^{n+1} - 2U_j^n + U_j^{n-1}}{\widetilde{dt}^{\,2}} - \frac{U_{j+1}^n - 2U_j^n + U_{j-1}^n}{h^2} + \frac{U_j^{n+1} - U_j^n}{\widetilde{dt}} = F_j^n. \tag{6}$$

Let us notice that the time step $\widetilde{dt}$ should satisfy the stability condition $\widetilde{dt} < \epsilon^{1/2} h$. In practice we take $\widetilde{dt} = dt/p$, where $p$ is an integer. The smaller $\epsilon$ is, the more unstable the scheme (6) is and the flatter the cone of dependence, but also the better the asymptotic approximation. We have done a Fourier analysis of the scheme, and Figure 5 shows that there is a best compromise for $\epsilon$ that balances the error coming from the instability of the scheme against the error coming from the perturbation term in the telegraph equation. We have obtained a similar result in our numerical experiments.

Fig. 5. Stability and error analysis with Fourier. (N = 100, h = pi/100, dt = h; curves: heat equation, telegraph equation, asymptotic; the optimum epsilon is marked.)

To construct the solution outside the cone of dependence, we have used a standard procedure for the inverse heat problem, the so-called space marching method [5]. This method may require a regularization of the solution obtained inside the cone, using the convolution product $\rho_\delta * u(x,t)$, where

$$\rho_\delta(t) = \frac{1}{\delta}\exp\left(-\pi\,\frac{t^2}{\delta^2}\right).$$

The following space marching scheme,

$$\frac{U_{j+1}^n - 2U_j^n + U_{j-1}^n}{h^2} = \frac{U_j^{n+1} - U_j^{n-1}}{2\,dt} - F_j^n, \tag{7}$$

is unconditionally stable under a condition on $\delta$. The last time step $U^{n(i)+1}$ to be reconstructed uses the average

$$U^{n(i)+1} = \frac{U^{n(i)} + U^{n(i)+2}}{2}.$$

We have observed that filtering as suggested in [5] is not necessary in our reconstruction process. Figure 6 illustrates the numerical accuracy of the overall reconstruction scheme that combines (4) and (7) for $\Omega = (-\pi, \pi)$, $dt = h = 0.0314$, $K = 7$, and $F$ such that the exact analytical solution is $\cos(q_1 x)\left(\sin(q_2 t) + \frac{1}{2}\cos(q_2 t)\right)$, $q_1 = 2.35$, $q_2 = 1.37$. In this specific example our method gives better results than the interpolation scheme provided that $K \le 7$. For larger $K$ we can use the scheme (6) for time steps below $m(i) - 7$. However, the precision may deteriorate rapidly in time.

3 Conclusion

We have presented the problem of FT algorithms for a parabolic operator. We have reviewed several procedures to reconstruct the solution in each subdomain from a set of subdomain solutions given at disparate time steps. This problem is quite challenging because it is very ill-posed. We found a satisfactory solution by combining explicit reconstruction techniques: a backward integration with some stabilization terms, and space marching. We are currently applying these ideas to multi-dimensional parabolic problems.
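
As a complement to Section 2.3, the explicit backward step (4) and its cone of dependence can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: $F = 0$, homogeneous Dirichlet data, $u_0(x) = \sin(\pi x)$, a small $K = 3$, and hypothetical names (`forward_step`, `backward_step`, the subdomain slice `lo:hi`). Each backward step uses only subdomain data, so it loses one node on each side.

```python
import numpy as np

def forward_step(u, dt, h):
    """Implicit Euler step (3) with F = 0 and zero Dirichlet boundary values."""
    n = u.size
    r = dt / h**2
    A = (1 + 2 * r) * np.eye(n) - r * np.eye(n, k=1) - r * np.eye(n, k=-1)
    return np.linalg.solve(A, u)

def backward_step(u_next, dt, h, f_next=0.0):
    """Explicit backward step (4).

    Only the interior nodes of `u_next` can be updated, so the result is
    two nodes shorter: this is the cone of dependence. `f_next` is a scalar
    or an array defined on those interior nodes.
    """
    lap = (u_next[2:] - 2 * u_next[1:-1] + u_next[:-2]) / h**2
    return u_next[1:-1] - dt * lap - dt * f_next

h = 1.0 / 40
dt = h
x = np.linspace(h, 1 - h, 39)
K = 3                          # number of backward steps (kept small on purpose)
m = 6                          # time step of the available subdomain data

# Reference forward run, kept only to measure the reconstruction error.
history = [np.sin(np.pi * x)]
for _ in range(m):
    history.append(forward_step(history[-1], dt, h))

lo, hi = 10, 30                # hypothetical subdomain: nodes lo..hi-1
recovered = history[m][lo:hi].copy()
for _ in range(K):
    recovered = backward_step(recovered, dt, h)

# After K steps, only nodes lo+K .. hi-K-1 are known at time step m-K.
err = np.max(np.abs(recovered - history[m - K][lo + K:hi - K]))
print("nodes recovered:", recovered.size, "max error:", err)
```

Each backward step can amplify round-off by a factor of up to $1 + 4\,dt/h^2$, so the error grows quickly with $K$; this is why the sketch keeps $K$ small and why the text turns to the stabilized scheme (6) for longer backward intervals.
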

Fig. 6. Numerical accuracy of the overall reconstruction scheme. (L2 norm of the error versus time; curves: forward error, backward error in the cone, global error, error based on interpolation.)

Acknowledgement

Research reported here was supported by Award 0305405 from the National Science Foundation.

References

1. Gabriel, E., Fagg, G., Bukovsky, A., Angskun, T., Dongarra, J., A Fault-Tolerant Communication Library for Grid Environments, 17th Annual ACM International Conference on Supercomputing (ICS'03), International Workshop on Grid Computing and e-Science, San Francisco (2003)
2. Gropp, W., Lusk, E., Fault Tolerance in Message Passing Interface Programs, International Journal of High Performance Computing Applications, 18: 363-372 (2004)
3. Dawson, C., Dupont, T., Explicit/Implicit, Conservative Domain Decomposition Procedures for Parabolic Problems Based on Block-Centered Finite Differences, SIAM J. Numer. Anal., Vol 31, Issue 4, pp 1045-1061 (1994)
4. Eckhaus, W., Garbey, M., Asymptotic analysis on large time scales for singular perturbation problems of hyperbolic type, SIAM J. Math. Anal., Vol 21, No 4, pp 867-883 (1990)
5. Murio, D.A., The Mollification Method and the Numerical Solution of Ill-posed Problems, Wiley, New York (1993)