MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY

Size: px

Start display at page:

Download "MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY"

Geraldine West
5 years ago
Views:

1 MULTIGRID REDUCTION IN TIME FOR NONLINEAR PARABOLIC PROBLEMS: A CASE STUDY R.D. FALGOUT, T.A. MANTEUFFEL, B. O NEILL, AND J.B. SCHRODER Abstract. The need for paraeism in the time dimension is being driven by changes in computer architectures, where performance increases are now provided through greater concurrency, not faster cock speeds. This creates a botteneck for sequentia time marching schemes because they ack paraeism in the time dimension. Mutigrid Reduction in Time (MGRIT) is an iterative procedure that aows for tempora paraeism by utiizing mutigrid reduction techniques and a mutieve hierarchy of coarse time grids. MGRIT has been shown to be effective for inear probems, with speedups of up to 50 times. The goa of this work is the efficient soution of noninear probems with MGRIT, where efficiency is defined as achieving simiar performance when compared to an equivaent inear probem. The benchmark noninear probem is the p-lapacian, where p = 4 corresponds to a we-known noninear diffusion equation, and p = 2 corresponds to the standard inear diffusion operator, our benchmark inear probem. The key difficuty encountered is that the noninear time-step sover becomes progressivey more expensive on coarser time eves, as the time-step size increases. To overcome such difficuties, mutigrid research has historicay targeted an accumuated body of experience regarding how to choose an appropriate sover for a specific probem type. To that end, this paper deveops a ibrary of MGRIT optimizations and modifications, most importanty an aternate initia guess for the noninear time-step sover and deayed spatia coarsening, that wi aow many noninear paraboic probems to be soved with parae scaing behavior comparabe to the corresponding inear probem. Key words. mutigrid, mutigrid-in-time, paraboic probems, noninear, reduction-based mutigrid, pararea, high performance computing AMS subject cassifications. 65F10, 65M22, 65M55 1. Introduction. Previousy, increasing cock speeds aowed for the speedup of sequentia time integration simuations of a fixed size, and aowed simuations to be refined in both space and time without an increase in overa wa-cock time. However, increases in cock speed have stagnated, eading to a sequentia time integration botteneck. Future speedups wi be avaiabe from more concurrency, and hence, speedups for time marching simuations must aso come from increased concurrency. By aowing for paraeism in time, much greater computationa resources can be brought to bear and overa speedups can be achieved. Because of this, interest in parae-in-time methods has grown over the ast decade. Perhaps the most we known parae-in-time agorithm, Pararea [24], is equivaent [14] to a two-eve mutigrid scheme. This work focuses on the mutigrid reduction in time (MGRIT) method [10]. MGRIT is a true mutieve agorithm and has optima parae communication behavior, as opposed to a two-eve scheme, where the size of the coarse-eve imits concurrency. Work on parae-in-time methods actuay goes back at east 50 years [33] and incudes a variety of approaches. Work regarding direct methods incudes [32, 36, 27, Submitted to the editors 6/29/2016. Center for Appied Scientific Computing, Lawrence Livermore Nationa Laboratory, P.O. Box 808, L-561, Livermore,CA This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore Nationa Laboratory under Contract DE-AC52-07NA LLNL-JRNL Department of Appied Mathematics, University of Coorado at Bouder, Bouder, Coorado. This work was performed under the auspices of the U.S. Department of Energy under grant numbers (SC) DE-FC02-03ER25574 and (NNSA) DE-NA and Lawrence Livermore Nationa Laboratory under contract B

2 6, 15]. There are iterative approaches, as we, based on mutipe shooting, domain decomposition, waveform reaxation, and mutigrid, incuding [22, 16, 25, 1, 17, 18, 38, 5, 39, 37, 20, 19, 24, 7, 31, 9, 40, 10]. For a gente introduction to this history, pease see the review paper [12]. This work focuses on mutigrid approaches (and MGRIT in particuar) because of mutigrid s optima agorithmic scaing for both parae communication and number of operations. An additiona attraction of MGRIT is its non-intrusive nature, where the user empoys an existing sequentia time-stepping routine within the context of the MGRIT impementation. This work uses XBraid [41], an open source impementation of MGRIT deveoped at Lawrence Livermore Nationa Laboratory (LLNL). MGRIT soves, in parae, a genera first-order, ordinary differentia equation (ODE) and corresponding time discretization: (1.1) u t = f(u, t), u(0) = u 0, t [0, T ], (1.2) u(t + δt) = Φ(u(t), u(t + δt)) + g(t + δt), where Φ is a noninear operator that represents the chosen time-stepping routine and g is a time dependent function that incorporates a the soution independent terms. In the inear case, the appication of Φ is either a matrix vector mutipication, e.g. forward Euer, or a spatia sove, e.g. backward Euer. Sequentia time marching schemes are optima in that they move from time t = 0 to t = T using the fewest possibe appications of Φ. By appying Φ iterativey, in comparaby expensive but highy parae mutigrid cyces, MGRIT sacrifices additiona computation for tempora concurrency. Both methods are optima [10], i.e. O(N) where N is the tota number of time-steps, but the constant for MGRIT is higher. This creates a crossover point wherein the added concurrency overcomes the extra computationa work. Beyond this crossover point MGRIT provides a speedup over sequentia methods. Appication of MGRIT to inear paraboic probems was studied in [10]. Figure 1 shows a strong scaing study of MGRIT for inear diffusion on the machine Vucan, an IBM BG/Q machine at LLNL. The probem size was (257) (space time). Three data sets are presented, a standard sequentia time-stepping run (square markers), a time-ony parae run of MGRIT (circe markers), and a space-time parae run of MGRIT (triange markers). The space-time parae runs used an 8 8 processor grid in space, with a additiona processors added in time. Both MGRIT curves represent the use of tempora and spatia coarsening, so that the ratio of δt/h 2 is fixed on coarse time-grids, where h is the spatia mesh width. The maximum speedup achieved by the bue curve (triange markers) is approximatey 50, and the crossover point where MGRIT provides a speedup is at about 128 processors in time. Achieving a simiar efficiency for noninear paraboic probems is the chief goa of this paper. When considering the performance of MGRIT, the appication of Φ is the dominant process. For a inear probem with impicit time-stepping, each appication of Φ equates to soving one inear system. When an optima spatia sover such as cassica mutigrid in space [28, 4, 34, 21] is used, the work required for a time-step evauation is independent of the time-step size. However, for noninear probems, each appication of Φ invoves an iterative noninear sove, and when using a common method such as Newton s method, convergence can depend strongy on the initia approximation even when inear mutigrid is used as the inner sover. In particuar, the previous time-step 2

Fig. 1: Time to sove 2D inear diffusion on a (128) 2 16385 space-time grid using sequentia time-stepping and two different processor decompositions of MGRIT. [10] is farther away on coarser grids.

3 Fig. 1: Time to sove 2D inear diffusion on a (128) space-time grid using sequentia time-stepping and two different processor decompositions of MGRIT. [10] is farther away on coarser grids. To expore the effects of introducing a noninearity that exhibits this key difficuty, a mode noninear paraboic probem, known as the p-lapacian, is considered: (1.3) u t (x, t) ( u(x, t) p 2 u(x, t)) = b(x, t), x Ω, t [0, T ], subject to the foowing Neumann boundary and initia conditions: (1.4) (1.5) u(x, t) p 2 u(x, t) n = g(x, t), x Ω, t (0, T ], u(x, 0) = u 0 (x), x Ω. The p-lapacian for p = 4 is we-known as a means of modeing soi erosion and transport [2] and has aso found uses in image processing (denoising, segmentation and inpainting) and machine earning (see [8] for an overview and [23] for an introduction). In this paper, the mode noninear probem corresponds to p = 4, whie the comparabe inear probem corresponds to p = 2, which is the standard diffusion operator. Our study wi consist of investigating parae-in-time for Equation (1.3) in the context of XBraid and MGRIT. We wi baance user concerns such as overa timeto-soution, memory use, and non-intrusiveness. To begin the study, we consider a naive appication of MGRIT to (1.3), where a arge increase in the cost of a noninear sove (for Newton s method) on the coarser tempora grids is observed. This is caused by the reativey arge time-steps (compared to the finest grid) and the associated poor initia guess to the Newton sover on the coarse eves. These increases counteract the strength of mutigrid, where speedup is achieved by using cheap coarse grid probems to acceerate convergence on the fine grid. Therefore, our strategy is to minimize the cost of each noninear sove, which is measured with a cost estimate. The desired resut is to achieve simiar efficiencies for the noninear and inear versions of (1.3), whie aso taking into account common user concerns, such as non-intrusiveness and storage costs. Utimatey, this paper s goa is to present a detaied, experimentay backed, ibrary of optimizations and modifications that can be used to efficienty impement MGRIT for a arge range of noninear paraboic probems. Thus, we have carefuy chosen our mode noninear paraboic probem and we note that mutigrid research has historicay targeted such an accumuated body of experience regarding how to choose an appropriate sover for a specific probem type. 3

4 To reduce the average cost of each Φ evauation, an improved initia guess for each time-step is investigated. For sequentia time-stepping, a commony used approach is to take the previous time-step as the initia guess. However, in the parae-intime setting this approach is fawed because on coarse eves, where the time-step size is arge, this is a poor initia guess. The recent parae-in-time works of [35, 26, 30] consider another option based on the iterative nature of many parae-intime agorithms, where information from previous parae-in-time iterations is used as the initia guess. This increases storage requirements, but incurs no additiona computationa cost. For MGRIT, this entais using as the initia guess the soution from the previous evauation of Φ at each time point and on each MGRIT eve. With this initia guess, as MGRIT converges, the cost of a Newton sove on each eve goes to zero 1. However, this aso creates a tension between memory usage and computationa efficiency because MGRIT need not store the soution from the previous Φ evauation at a time points. Using the soution from the previous Φ evauation as the initia guess whenever it is avaiabe aows a user with memory constraints to reap the benefits of this improved initia guess, whie imiting per-processor storage costs. This reduced storage resut and the importance of improved initia guesses for MGRIT are both important contributions of this work. Foowing this, a spatia coarsening strategy is pursued to imit δt/h 2 on coarse time grids. The goa is to reduce the number of Newton iterations required on coarse eves by reducing the spatia variabiity of the noninearity whie aso improving the condition number of the inner inear probems. Recognizing this specific importance of spatia coarsening when controing the cost of noninear time-step sovers on coarse time grids is an important contribution of this work. In addition, the smaer probem sizes drasticay reduce coarse grid compute times and coarse grid storage costs. Two additiona strategies incude avoiding unnecessary work on the first MGRIT cyce and optimizing the number of eves in the MGRIT hierarchy. Further speedups can be obtained by oosening the Newton sover toerance during the first three MGRIT iterations on a eves, where the approximate soution is sti poor. Overa, the most effective strategies are spatia coarsening and the improved initia guess, but the other strategies combined have a simiary significant impact on runtime. Together, these strategies produce an MGRIT agorithm for noninear probems that has an efficiency simiar to that found for a corresponding inear probem. In Section 2, the genera MGRIT framework is discussed. In Section 3, some impementation detais are given. In Section 4, our strategy for improving the performance of MGRIT is proposed and justified. In Section 5, this strategy is impemented. In sections 7.1 and 7.2, weak and strong scaing resuts are presented. 2. MGRIT overview. First, a brief overview of the MGRIT agorithm for time independent inear probems is presented. The noninear, time dependent, extension foows in Section 2.1. Define a uniform tempora grid with time-step δt and nodes t j, j = 0,..., N t (non-uniform grids can easiy be accommodated). Further, define a coarse tempora grid with time-step T = mδt and nodes T j = j T, j = 0, 1,..., N t /m, for some coarsening factor, m. This is depicted in Figure 2. In bock 1 Assuming the Newton sover measures noninear convergence with a fixed absoute toerance. 4

5 T 0 T 1 t 0 t 1 t 2 t 3... tm T = mδt δt t Nt Fig. 2: Fine- and coarse-grid tempora meshes. Fine-grid points (back) are present on ony the fine-grid, whereas coarse-grid points (red) are on both the fine- and coarsegrid. trianguar form, the time-stepping probem (1.2) is I u 0 Φ u 1 (2.1) Au = I Φ I. u Nt = g 0 g 1. g Nt = g. Sequentia time marching is a forward bock sove of this system. MGRIT soves this system iterativey, in parae, using a coarse-grid correction scheme based on mutigrid reduction. Both are O(N) methods, but MGRIT is highy concurrent. Mutigrid reduction strategies are a variation of cycic reduction methods and, as such, successivey eiminate unknowns in the system. If the fine points are eiminated, the system becomes: I u,0 Φ m I u,1 (2.2) A u =.... = Rg = g. Φ m I u,nt/m Idea restriction, R, and interpoation, P, are defined as in [10], so that the system (2.2) can be formed in a mutigrid fashion. Let, I Φ m 1... Φ I (2.3a) R =..., Φ m 1... Φ I (2.3b) I Φ T... Φ m 1,T P T =.... I Φ T... Φ m 1,T The interpoation injects at coarse points before extending those vaues to fine points, i.e., it is injection from the coarse- to fine-grid foowed by F-reaxation (defined beow). With this, the idea coarse-grid operator is A = RAP. 2 This is referred to as idea because the soution of (2.2) yieds an exact soution at the coarse points. This 2 We note that RAP = R I AP, where R I is injection at the coarse points. Thus for efficiency, injection is aways used to map to the coarse-eve (ike Pararea). The exception is the spatia coarsening option where spatia restriction and interpoation functions, R x() and P x(), are used to coarsen in space as we as time. 5

6 Φ Φ Φ Φ Φ Φ g g g g g g F-reaxation Φ Φ g g C-reaxation Fig. 3: F- and C-reaxation for coarsening by factor of 4 coarse grid is essentiay a compressed version of the probem. If this is foowed by interpoation, then the exact soution is aso avaiabe at fine points. The imitation of this exact reduction method is that the coarse-grid probem is, in genera, as expensive to sove as the origina fine-grid probem (because of the Φ m evauations). Mutigrid reduction methods address this by approximating A with B, where (2.4) B = I Φ I... Φ I, and Φ is an approximate coarse-grid time-step operator. One obvious choice for defining Φ is to re-discretize the probem on the coarse grid so that a coarse-grid time-step is roughy as expensive as a fine-grid time-step. This paper makes that choice. For instance, with backward Euer, one simpy uses a arger time-step size. Convergence of MGRIT is governed by the approximation, A B, and this choice of using a re-discretization of Φ with T = mδt has proved effective [10, 11]. It is important to note that, whie the definition of this agorithm reies upon Φ and Φ, the internas of these functions need not be known. This is the non-intrusive aspect of MGRIT. The user defines the time-step operator and can wrap existing codes to work within the MGRIT framework. The coarse grid is used to compute an error correction based on the residua equation (see Agorithm 1). Reaxation, a oca fine-grid process, is used to resove fine-scae behavior. Figure 3 shows the actions of F- and C-reaxation on a tempora grid with m = 4. F-reaxation propagates the soution forward in time from each coarse point to the neighboring F -points. Overa, reaxation is highy parae. Each interva of F -points can be updated independenty during F-reaxation. Every C-point update (during C-reaxation) is simiary independent MGRIT agorithm for noninear probems. The inear MGRIT agorithm [10] is easiy extended to the noninear setting using fu approximation storage (FAS), a noninear mutigrid scheme [3]. The FAS description of MGRIT first appeared in [11]. Note that the F-reaxation two-grid variant of noninear MGRIT is equivaent to the Pararea agorithm [14]. The noninear MGRIT agorithm is presented in Agorithm 1 as a two-eve method, but can be used in a mutieve setting by recursivey appying the agorithm at Step 4. Idea interpoation (2.3) is carried out in two steps (7 and 8) for simpicity in impementation. 6

7 F-cyce t, x t, x 4 t, 2 x 4 t, x 16 t, 4 x 16 t, x Fig. 4: Exampe mutigrid V-cyce with space-time coarsening that hods δt/h 2 fixed on the eft, and then time-ony coarsening on the right. Prior to running the agorithm, initia vaues for u at the fine grid C-points must be set. In genera, the C-points are initiaized with the best avaiabe estimate of the soution. Aternativey, the initia guess can be obtained using a fu mutigrid methodoogy. In this case, the fine grid C-points are obtained by interpoating a cheap coarse grid soution to the fine grid (see Section 6.1). Agorithm 1 MGRIT(A, u, g) 1: Appy F- or FCF-reaxation to A(u) = g. 2: Inject the fine grid approximation and its residua to the coarse grid: u,i u mi, r,i g mi (A(u)) mi. 3: If Spatia coarsening, then u,i R x (u,i ), r,i R x (r,i ). 4: Sove B (v ) = B (u ) + r. 5: Compute the coarse grid error approximation: e v u. 6: If Spatia coarsening, then e,i P x (e,i ). 7: Correct u at C-points: u mi = u mi + e,i. 8: If converged, then update F -points: appy F-reaxation to A(u) = g. 9: Ese go to step 1. The reader wi note that with exact arithmetic, MGRIT with FCF-reaxation propagates the initia condition two fu coarse grid time intervas (2 T ) each cyce. Thus, MGRIT is equivaent to a sequentia direct sove in N t /(2m) iterations. With F-reaxation ony, the sequentia soution is achieved in N t /m iterations. The speedup comes from the fact that MGRIT converges in O(1) iterations, as shown in [10]. A variety of cycing strategies are avaiabe in mutigrid (e.g., V, W, F). A resuts presented here use the standard V-cyce depicted in Figure 4. This corresponds to Agorithm 1 with the Sove step turned into a singe recursive ca. The recursion ends when a triviay sized grid, of, say, 5 time points, is reached. At this point a sequentia sover is used. On the right in Figure 4, a V-cyce with ony tempora coarsening is shown. Fu spatia coarsening, depicted on the eft, fixes the paraboic ratio, δt/h 2, on a eves, but can degrade the MGRIT convergence rate for the noninear probem considered here. Deayed spatia coarsening, described in Section 5.3, proved to be a more effective strategy. The MGRIT agorithm is impemented in XBraid [41], an open source package deveoped at LLNL. XBraid conforms to MGRIT s non-intrusive phiosophy and requires the user to wrap an existing time-stepping routine, as we as define a few other basic operations ike a state-vector norm and inner-product. The key computationa 7

8 kerne is the time-stepping (i.e., Φ) routine, but a the specifics are opaque to XBraid and done in user code. This aows the user to add tempora paraeism to existing time-stepping codes with minima modifications. For more detais, see [10] and [41] Storage Costs. The reader shoud aso note the storage costs associated with MGRIT. At a minimum, the soution vaues u i must be stored at each C-point in the tempora hierarchy (the first term in the storage cost mode beow). In addition, some number k of auxiiary vectors must be stored at a points on the coarse eves (the second term beow). Let L be the number of tempora eves, p be the number of tempora processors, and s be the cost to store a vector on eve. Then the storage cost of MGRIT (per tempora processor) is as foows: L 1 (2.5) S C (k) i=0 Nt m i+1 p L 1 s i + i=1 Nt k m i s i. p By defaut, MGRIT stores two auxiiary vectors on coarse grids (k = 2): a restricted copy of the fine-grid soution and the right-hand-side for the coarse FAS system. However, in Section 5.2 we show how storing one additiona auxiiary vector, specificay the most recent soution from the corresponding Φ evauation, aows the user to dramaticay reduce the overa runtime. Equation (2.5) wi be used to compute a memory mutipier that gives an estimate of the per processor increase in memory usage when compared to sequentia timestepping. The memory mutipier used is defined as S C (k)/s Mode probem impementation. The weak form of equations (1.3)-(1.5) reads: find u V h such that (3.1) u t, v h + u p 2 u, v h = b, v h + g, v h Ω, v h V h, where V h is an appropriate finite eement space. Discretizing in time using a backward Euer method and the tempora mesh in Figure 2 gives (3.2) uk+1 u k, v h + u k+1 p 2 u k+1, v h = b k+1, v h + g k+1, v h Ω, v h V h, δt where k = 0, 1,..., N t 1, and u 0 is the initia condition, given in (1.5), projected onto the finite eement space. Define Ψ(u)(v) and f k (v) to be (3.3) (3.4) Ψ(u)(v) = u, v + δt u p 2 u, v, f k (v) = u k + δt b k+1, v + δt g k+1, v Ω. Then, the fina noninear weak form is: find u k+1 V h such that (3.5) Ψ(u k+1 )(v h ) = f k (v h ), v h V h, k = 0, 1, 2,..., N t 1. Each time-step corresponds to the soution of this noninear system (i.e., the inversion of Ψ). The Fréchet derivative of Ψ(u)(v), Ψ (u)(v)[w], is (3.6) (3.7) Ψ Ψ(u + aw)(v) Ψ(u)(v) (u)(v)[w] = im, a 0 a = w, v + δt [ u p 2 + (p 2)( u)( u) T ] w, v. 8

9 Hence, Newtons method for (3.5) is (3.8) u j+1 k+1 = uj k+1 δuj, where the subscripts on u are time-steps, the superscripts on u are the Newton iterations, and δu j is the unique eement of V h such that (3.9) Ψ (u j k+1 )(vh )[δu j ] = Ψ(u j k+1 )(vh ) f k (v h ), for every v h V h Numerica parameters. A tests were competed with T = 4 seconds and Ω = [0, 2] 2 on a reguar grid. The forcing function, b(x, t), was chosen such that the exact soution was u(x, y) = sin(κx) sin(κy) sin(τt), where κ = π and τ = (2 + 1/6)π. Uness otherwise stated, the p-lapacian was used with p = 4. The spatia discretization was computed using standard bi-inear quadriatera eements and MFEM [29], a parae finite eement code. The numerica testing parameters used throughout the paper (uness otherwise mentioned) were as foows. The Newton toerance was fixed at The spatia sover for each Newton iteration was BoomerAMG from hypre b [21]. The BoomerAMG parameters were: HMIS coarsening (coarsen-type 10), one eve of aggressive coarsening, symmetric L1 Gauss-Seide (reax-type 8), extended cassica modified interpoation (interp-type 6), and interpoation truncation equa to 4 nonzeros per row. The machine used for a numerica tests was Vucan, an IBM BG/Q machine at LLNL. Except for the scaing studies, the test probem size was a (64) space-time grid on the domain [0, 2] 2 [0, 4] using 4 processors in space and 128 processors in time. V-cyces and FCF-reaxation were empoyed in every test with a fixed stopping criteria of 10 9 /( δt h). This aowed the same toerance, reative to the fine-grid resoution, to be used in a cases. Note that this is an overy tight toerance with respect to discretization error, set in arge part because of the desire to investigate the agorithm s asymptotic convergence properties. The tempora coarsening factor was m = Estabishing baseines. The goa of this paper is to deveop experience with genera strategies that one can use to obtain an MGRIT agorithm for noninear probems that is as efficient as those previousy seen for inear probems. To that end, an efficiency metric for computationa cost is now presented. Two numerica baseines, through which a improvements wi be measured, are aso given. The MGRIT cost metric used here reies on c (j), the average cost of a Newton sove 3 on grid eve during MGRIT iteration j. This is a sensibe choice because the key computationa kerne for noninear probems is the Newton sove. Section 4.1 provides a detaied discussion of this metric and why it was chosen. Two baseine tests were competed, the first being the sequentia baseine. This baseine determined the average cost of a Newton sove when using a sequentia sover, on a variety of space-time grids. An efficient impementation of MGRIT for noninear 3 By Newton sove, we mean the whoe cost to take a noninear time step with Φ using the Newton sover, incuding its mutipe inner Newton iterations. 9

10 probems wi match (or improve on) these cost estimates. The second baseine test is caed the naive MGRIT baseine. For this baseine, MGRIT was appied to the mode noninear probem in an out of the box fashion. Typicay, time-stepping routines have a hard-coded initia guess equa to the previous time-step and do not impement spatia coarsening. Hence, the naive MGRIT baseine mimics this. Note that this approach did, given enough tempora processors, provide a sma speedup over the sequentia routine (see Section 7.2); however, these speedups were much ess than those seen for inear probems Cost Metric. We define the chosen cost metric, c (j), as the cost of a Newton sove (Φ appication) on grid eve during MGRIT iteration j, averaged over a time-steps on that eve, during that MGRIT iteration, i.e., (4.1) c (j) = a (j) w, where a (j) is the average number of Newton iterations required to take a noninear time-step on eve and MGRIT iteration j, and w is the number of spatia unknowns on eve. Thus, c (j) is the cost estimate of a singe Φ appication. As motivation for this choice, et us examine a simpe cost mode for the entire MGRIT agorithm. Restriction from eve to eve + 1 has the same cost as a C-reaxation on eve. Likewise, interpoation from eve to eve 1 has the same cost as an F-reaxation on eve. Hence, beyond the coarsest grid when using FCFreaxation, each MGRIT eve carries out 3 F-reaxations and 2 C-reaxations, for a tota of (2 + (m 1)/m)N t evauations of Φ, where for simpicity N t, the number of time-steps on eve, is assumed to be an exact mutipe of m. Next, consider an MGRIT V-cyce where eve 0 is the finest and eve L is the coarsest, and ν is the number of V-cyces required for MGRIT to converge to within the residua toerance. Then, the tota cost of the MGRIT agorithm is ( ) ν L 1 (4.2) C(L, ν) N tl c (k) L + c (k) (2 + (m 1)/m)N t. k=1 =1 Ceary, equation (4.2) impies that minimizing the average cost of a Newton sove on each eve and MGRIT iteration wi directy resut in a reduction in the overa cost of the MGRIT cyce, and hence, c (j) can be used to measure the efficiency of the agorithm. Section 5 examines the two most important strategies for minimizing c (j), an improved initia guess for Newton and spatia coarsening. Whie the metric shown here is c (j), another parae performance issue is the variance in the cost of a Newton sove over the tempora domain. On the coarser grids, where a processor might own a singe time-step, synchronization effects impy that an F- or C-reaxation cannot compete unti the sowest processor finishes. This in turn impies that the difference between the maximum and minimum cost per Newton sove woud indicate synchronization probems. We therefore note that c (j) aso tracks this spread between maximum and minimum for this probem, but since this fact is probem dependent, we draw the reader s attention to it Sequentia time-stepping baseine. We now estabish the sequentia time-stepping baseine. Tabe 1 shows estimates of the cost of a Newton sove on a of the space-time grids present in the space-time grid hierarchy when MGRIT is appied to the mode probem outined in Section 3.1. These estimates, cacuated 10

11 δt h a w c 1/1024 1/ e3 1/256 1/ e4 1/64 1/ e4 1/16 1/ e3 1/4 1/ e3 1/1 1/ e2 (a) Deayed spatia coarsening δt h a w c 1/1024 1/ e3 1/256 1/ e4 1/64 1/ e4 1/16 1/ e4 1/4 1/ e4 1/1 1/ e4 (b) No spatia Coarsening Tabe 1: Baseine Newton sover costs for sequentia time-stepping. using Equation (4.1), are the average cost of a Newton sove when using a sequentia sover. Tabe 1a depicts baseine estimates for MGRIT using deayed spatia coarsening, whie Tabe 1b provides baseine estimates for MGRIT with no spatia coarsening. An efficient impementation of MGRIT wi match (or improve on) these cost estimates across a eves. Tabe 1 indicates that the average cost of a Newton iteration is highy dependent on the space-time grid, with there being a considerabe advantage to coarsening simutaneousy in space and time. This is because spatia coarsening reduces both a (j) and w. The decrease in a (j) is expained by considering a standard backward Euer time-step, (4.3) ( I δt ) h 2 G (u k+1 ) = f(u k ), where G is a noninear diffusion operator. As δt/h 2 increases, the noninear operator moves away from the identity, becoming more expensive to sove. Coarsening in space and time bounds this ratio, making the noninear sove from Equation (4.3) cheaper on the coarse grid. A detaied investigation into using spatia coarsening is given in Section 5.3. The other important strategy expored here to reduce the cost of Newton is to improve the initia guess (see Section 5.2). This importance is aso evident in Tabe 1, where a continues to increase with δt, even when spatia coarsening is used. This is because δt is increasing, thus making the initia guess to Newton (the previous time-step) progressivey worse. On the coarse eves, where δt is arge, this is ceary a poor approximation to the soution. In Section 5.2, an aternate initia guess for Newton, where the initia guess is the soution obtained during a previous evauation of Φ at the current time-point, and on the current MGRIT eve, is pursued Naive MGRIT baseine. Next, the so caed naive MGRIT baseine is presented. The naive impementation is an unmodified, out of the box type impementation that wraps a sequentia time-stepping routine without any MGRIT optimizations. In particuar, the previous time-step is used as the initia guess and spatia coarsening is not impemented. Tabe 2 depicts the cost estimates, c (j), for this setting inside an MGRIT cyce. Each δt (coumn) vaue corresponds to a tempora eve, whie the rows represent different XBraid iterations. Due to increasing Newton iteration counts, the cost of a coarse grid Newton sove is, in some paces, 6 times more expensive than a fine grid sove. This can ead to 11

12 Iteration δt = 1/1024 1/256 1/64 1/16 1/ e4 2.10e4 2.20e4 3.16e4 5.86e4 4.92e e4 1.24e4 1.76e4 3.52e4 5.98e4 5.41e e3 1.19e4 1.70e4 3.35e4 6.06e4 5.73e e3 1.17e4 1.72e4 3.38e4 5.94e4 5.73e e3 1.17e4 1.72e4 3.39e4 5.94e4 5.73e e3 1.17e4 1.72e4 3.39e4 5.98e4 5.73e4 Tabe 2: Cost estimates, c (j), for the naive MGRIT baseine (sover 0). The cost estimates are given in raw, unscaed form across each tempora eve and MGRIT iteration. Iteration δt = 1/1024 1/256 1/64 1/16 1/4 1 Tota e5 4.20e4 4.39e4 6.32e4 1.17e5 9.83e4 5.38e e4 2.48e4 3.51e4 7.05e4 1.20e5 1.08e5 4.44e e4 2.38e4 3.41e4 6.71e4 1.21e5 1.15e5 4.31e5.... Tabe 3: For processes active on each tempora eve, estimated average cost to carry out FCF-reaxation, based on Tabe an inefficient method in parae. Consider the case where each MPI process owns m points in time on the finest eve, i.e., one CF-interva. In [10] it was shown that this is an efficient decomposition. On coarser eves, each process then owns at most one point in time. Tabe 3 gives, for the processes active on each tempora eve, the estimated cost of an FCF-reaxation. On the finest grid, these numbers are 2m times the cost estimates in Tabe 2 because FCF-reaxation (for arger m) invoves roughy 2m time-step evauations. Then, on coarser grids, the cost estimate is simpy twice what is in Tabe 2 because each processor owns at most one point, and a F -points are reaxed twice. In this setting, it is immediatey cear how expensive Newton soves on coarse eves can dominate the cost of a V-cyce because the eves must be traversed in order on each processor (i.e., one can sum each row in Tabe 3 for a cost estimate of that V-cyce). Since the dominant cost of a V-cyce is reaxation, this row-sum, given in the fina coumn, is then a cost estimate for each V-cyce. In contrast, for the inear setting (when using an optima spatia sover), the work required woud be independent of time-step size. When targeting an MGRIT efficiency simiar to a inear probem, this wi be a key issue. In concusion, our goa is to present a ibrary of MGRIT optimizations and modifications that can be extended to most noninear paraboic probems. These optimizations and modifications are deveoped with the goa of minimizing the average cost of a Newton sove, focusing on the coarse grids, whie addressing common user concerns of non-intrusiveness and memory usage. Controing any growth in the cost of a Newton sove across tempora eves wi be key to matching the resuts seen with MGRIT for inear probems. By itsef, the naive appication of MGRIT scaes very 12

13 poory when paced aongside the comparabe inear probem (see sections 7.1 and 7.2). 5. Efficient MGRIT for the mode noninear probem. In this section a variety of approaches designed to make MGRIT more efficient for the chosen noninear mode probem (1.3) are investigated. Improvements are measured by comparing to the naive MGRIT baseine of Section Sover ID tabe. Given the number of options considered, and to make discussion easier, each sover is given a numerica ID. Tabe 4 presents each sover considered and its runtime for the chosen test probem size. Aso given is the per processor memory mutipier that gives an estimate of the per processor increase in memory usage (see Section 2.2), when compared to sequentia time-stepping. The mutipier is given for the number of tempora processors p = 128 and p = 4096, which is the argest possibe p for this test probem size. For p = 128, the mutipier is quite arge, but the addition of more resources can make this number reativey sma. Each sover option is discussed in more detai in the foowing subsections. However, a brief description of each sover is presented here for the reader s convenience. Sover 0 refers to the naive MGRIT approach from Section 4.3. The coumn heading Spatia Grids refers to the number of spatia grids used and is the option introduced in Section 5.3. A vaue of 1 for Spatia Grids indicates no spatia coarsening, whie 4 means that the finest spatia grid is coarsened 3 times for a tota of 4 grids. The finest grid is 64 64, so coarsening further in space is not advantageous. The difference between sovers 1 and 2 is that sover 1 deays spatia coarsening so that it begins on the fourth tempora grid. Sover 2 begins spatia coarsening immediatey on the first coarse time grid. Remember, for this probem with 4096 time-steps and m = 4, there are ony 6 tempora grids. Given the bad effects on convergence visibe from no deay in sover 2, uness otherwise mentioned, spatia coarsening is aways deayed. See Section 5.3 for more detais. Sovers 3, 4, 5 and 6 correspond to the improved initia guess introduced in Section 5.2. Here, IIG means that the improved initia guess, specificay the soution from a previous evauation of Φ, is used as the initia guess for each Newton sove. This happens either at every point, or at a the coarse grid points and the fine grid C-points, depending on whether a the points or ony the C-points are stored. Regarding the memory mutipier, the use of IIG requires an additiona vector stored on coarse eves, hence k = 3 (instead of 2) in Equation 2.5. Sovers 7, 8, 9 and 10 correspond to turning on the Skip option introduced in Section 6.1. There, the concept of skipping unnecessary work during the first MGRIT down cyce is introduced. Sovers 11 and 12 correspond to having Cheap first three iters as introduced in Section 6.2. Here, the Newton toerance is reaxed during the first three MGRIT iterations. Sovers 13, 14, 15 and 16 reduce the number of eves in the hierarchy by setting a arger Coarsest grid size as introduced in Section 6.3. For instance with m = 4, using a coarsest grid size of 16 instead of 4 removes the coarsest eve in the hierarchy. This can improve the time-to-soution by avoiding cycing between very sma grids MGRIT with an improved initia guess for Newton s method. Section 4.2 showed that c (j) increases with the time-step size (and hence MGRIT eve) because using the previous time-step as the initia guess for Newton s methods becomes increasingy inaccurate. Thus, this section expores an improved initia guess 13

14 Sover ID Spatia grids IIG Skip Cheap first 3 iters Coarsest grid size Memory mut., p = Runtime per iter Iterations Runtime 0 1 Never No No s s 1 4 Never No No s 7 712s 2 4* Never No No s s 3 1 C points No No s 7 638s 4 4 C points No No s 7 579s 5 1 Aways No No s 7 647s 6 4 Aways No No s 7 591s 7 1 C points Yes No s 7 453s 8 4 C points Yes No s 7 410s 9 1 Aways Yes No s 7 429s 10 4 Aways Yes No s 7 391s 11 1 Aways Yes Yes s 7 395s 12 4 Aways Yes Yes s 7 360s 13 1 Aways Yes Yes s 7 408s 14 4 Aways Yes Yes s 7 354s 15 1 Aways Yes Yes s 8 506s 16 4 Aways Yes Yes s 8 383s Tabe 4: Overa runtimes, storage costs, iteration counts and average time per iteration for the various sover options, with a (64) space-time grid. A * indicates no deay in spatia coarsening. (IIG), which uses as the initia guess for Newton s method at a specific time point and MGRIT eve, the soution from the previous evauation of Φ at that point and eve. As XBraid converges, this vaue becomes an ever improving initia guess. The main drawback of using the IIG is that an extra auxiiary vector must be stored at each point on every coarse grid 4. The extra storage at coarse eves is required because the FAS soution, aready stored there, is not equa to the resut of the previous Φ evauation. This creates a tension between memory usage and computationa efficiency. To imit memory usage, one can aternativey choose to store the soution at the fine grid C-points ony. In this case, the IIG is avaiabe at a time steps except those competed during fine grid F-reaxation, where the previous time-step is used. This approach, referred to as IIG at C-points, is used by sovers 3, 4, 7 and 8. 4 The soution from the previous Φ evauation and the current FAS soution are equivaent on the fine-grid, so the IIG is aready avaiabe at stored fine-grid points. 14

15 Iteration δt = 1/1024 1/256 1/64 1/16 1/ Tabe 5: Reative cost estimates, c (j) /c (j),naive, for MGRIT with the improved initia guess at a points (sover 5), across each tempora eve and MGRIT iteration. The sover setup is otherwise identica to that used for Tabe 2. Cost estimates are scaed entry-by-entry with c (j),naive, the naive MGRIT baseine costs from Tabe 2. Tabe 5 shows c (j) over a eves and iterations, scaed entry-by-entry with the corresponding entry from the naive MGRIT baseine from Tabe 2. For exampe, entry (2,3) in Tabe 5 has been scaed by entry (2,3) from Tabe 2. Vaues ess than one indicate that the cost per Newton sove was reduced by using the improved initia guess, whie vaues greater than one indicate it increased. A arge reduction in c (j) is seen across most tempora grids and iterations. The exception to this is on the fine-grid, where costs increase during eary iterations when using the IIG. This is not surprising. During the first few iterations the IIG is either inaccurate or unavaiabe. 5 A consequence of these increased costs is seen in Tabe 4, where sover 3, which uses the IIG everywhere except the fine grid F-Points, was actuay sighty faster than when the IIG was used at a points. Despite the increased costs on the fine grid, sovers 3 and 5, where the IIG is used as the initia guess to the Newton sover at a points and ony at coarse points (C-points), show arge reductions in overa runtime, a direct consequence of the cost reductions seen in Tabe 5. Moreover, there is no degradation in MGRIT convergence. Sovers 4 and 6, where the IIG is used in conjunction with spatia coarsening, are discussed in Section 5.4. In concusion, these resuts indicate that using the IIG as the initia guess is very beneficia. Using the IIG ony at C-points is an exceent option, in this case reducing the runtime by 44%, with ony a 32% increase in storage costs over the naive approach, when comparing sovers 0 and 3 in Tabe 4. Using the IIG as the initia guess at a points doubed the storage costs of the method, but ony reduced the runtime by 43%. Optionay, a user coud opt to experimentay determine which initia guess to use on each eve and iteration. Such a strategy woud ikey resut in further cost savings, but be highy probem specific. Instead, we wi pursue more genera strategies that, when used in conjunction with the improved initia guess, wi reduce the cost of those first three iterations in Section 6. Either way, the improved initia guess is an effective, non-intrusive strategy that wi provide a arge reduction in runtime for most noninear paraboic impementations of MGRIT. This abiity to effectivey use reduced storage ony at C-points and 5 During the first iteration, at C-points, the user suppied initia guess is returned as the soution from a previous Φ evauation. The user does not define an initia guess at F -points, so the previous time-step must be used. 15

16 the overa importance of improved initia guesses for noninear MGRIT are both important contributions of this work MGRIT with spatia coarsening. Motivated by resuts seen in Section 4.2 and Tabe 1a, spatia coarsening is added to the naive sover from Section 4, as indicated in Agorithm 1. In genera, the user s code defines the separate spatia interpoation and restriction functions P x () and R x (). At first gance spatia coarsening seems intrusive, however, MGRIT semi-coarsens in time and is agnostic to the spatia discretization, thus spatia coarsening is an extra option that can easiy be impemented independenty of the time-integration scheme. Here, the natura finite eement spatia restriction operator (and its transpose) is used to interpoate between reguary refined spatia grids. This operator is provided by MFEM. Spatia interpoation equas the scaed (by 1/4) transpose of restriction so that R x P x I. The choice of spatia interpoation operators is an area of active research; however, it is not surprising that scaing so that RP resembes an obique projection heps MGRIT convergence. A better choice here coud ead to improved resuts beow. The benefit of spatia coarsening is vaidated by the improved timings in Tabe 4. Here, sovers 0, 1 and 2 appy varying eves of spatia coarsening. Beginning spatia coarsening on the first coarse grid (sover 2) eads to a degradation in MGRIT convergence. For practica purposes, this sover is unusabe as the convergence degradation continues for arger probems. This is unfortunate given that this approach has a much smaer runtime per iteration. It is important to note that this degradation is not restricted to noninear probems. For exampe, if we repace the noninear co-efficient of the p-lapacian with the exact, known soution, then the resuting inear probem sti suffers from the same MGRIT degradation due to spatia coarsening. This degradation is sti an area of active research, however recent anaysis suggests that it occurs because the eigenvaues of the time-integration operators (Φ and Φ ) are not bounded away from zero. Our goa is to deveop a spatia coarsening strategy that can be appied to a noninear probems. As such, we omit a detaied, probem specific convergence anaysis and instead provide a simpe strategy that aows spatia coarsening to be used with some noninear probems that exhibit this phenomenon. A simiar strategy was used in [13] to make use of spatia coarsening inside a fu space-time mutigrid method. The strategy, referred to as deayed spatia coarsening, is to imit spatia coarsening to the coarsest time grids. For exampe, sover 1 uses four spatia grids with spatia coarsening deayed unti the fourth tempora eve. Tabe 6 presents a cost anaysis of this strategy, scaing each c (j) with the corresponding cost estimates found using the naive MGRIT baseine in Tabe 2. To highight the cost saving benefits of deayed spatia coarsening, this tabe shows resuts that use the previous time-step as the initia guess for the Newton sove. Section 5.4 discusses the combined effect of spatia coarsening and the improved initia guess, introduced in Section 5.2. Deayed spatia coarsening reduced both the number and cost of Newton iterations on the coarse grids, In fact, on the coarsest grid the average cost per Newton Sove is 100 times smaer than the cost to compete the same sove using the naive baseine. Recognizing the specific importance of spatia coarsening when controing the noninear time-step sover iterations on coarse time grids is an important contribution. The benefit of these dramatic cost reductions is seen in Tabe 4. Sover 1, where spatia coarsening is deayed unti the fourth tempora eve, simutaneousy minimized 16

17 Iteration δt = 1/1024 1/256 1/64 1/16 1/ Tabe 6: Reative cost estimates, c (j) /c (j),naive, for MGRIT with spatia coarsening (sover 1), across each tempora eve and MGRIT iteration. Spatia coarsening begins on the fourth time eve, δt = 1/16. The previous time-step is used as the initia guess at a points. The sover setup is otherwise identica to that used for Tabe 2. Cost estimates are scaed entry-by-entry with c (j),naive, the naive MGRIT baseine costs from Tabe 2. Iteration δt = 1/1024 1/256 1/64 1/16 1/ Tabe 7: Reative cost estimates, c (j) /c (j),naive, for MGRIT when using spatia coarsening and the improved initia guess (sover 6), across each tempora eve and MGRIT iteration. The sover setup is otherwise identica to that used for Tabe 2. Cost estimates are scaed entry-by-entry with c (j),naive, the naive MGRIT baseine costs from Tabe 2. Spatia coarsening again begins on the fourth time eve, δt = 1/16. the MGRIT convergence rate and the coarse grid computationa costs, eading to a 37 % reduction in run-time when compared to sover 0. This is the strategy pursued in this paper and it has proven to be robust in our tests Combining the improved initia guess and spatia coarsening. Sections 5.2 and 5.3 introduced an improved initia guess and spatia coarsening. Individuay these improvements reduced the overa runtime by 43% and 37%, respectivey. Tabe 7 gives a cost anaysis of the MGRIT agorithm when these strategies are combined, using 4 eves of spatia coarsening and the IIG at a points. In this tabe each cost estimate is again scaed by the corresponding cost estimate cacuated using the naive MGRIT baseine. Sovers 4 and 6, from Tabe 4, show the effect of using both of these options. When compared to the naive baseine, sover 4, where the IIG is used ony at C- points, gives a 49% reduction in runtime with ony a 5% increase in storage, when compared to sover 0. Overa, deayed spatia coarsening, and spatia coarsening in genera, is an effective strategy that can be appied to most, if not a, noninear paraboic impe- 17

18 mentations of MGRIT. Ceary, no deay in spatia coarsening is best, however, in cases where this causes a arge degradation in the convergence rate, deayed spatia coarsening can sti be used to baance the computationa savings and convergence degradation. Mitigating this degradation in the convergence rate is an area of active research. 6. Further Cost Savings. Whie the improved initia guess and spatia coarsening are, by far, the most effective strategies presented, severa more strategies for reducing the cost of MGRIT are now considered. The effect of these strategies is significanty smaer than that for spatia coarsening and the improved initia guess; thus we wi now normaize the cost anaysis tabes with respect to the just previousy considered sover configuration. For exampe, the next tabe in Section 6.1 is scaed entry-by-entry by the raw cost estimates corresponding to Tabe 7 from Section 5.4. The tabe after that in Section 6.2 is scaed entry-by-entry by the raw cost estimates corresponding to the tabe from Section 6.1, and so on. An entry greater than 1 represents a deterioration reative to the previous sover configuration, whie a vaue ess than 1 represents an improvement Skipping Unnecessary Work. Even with the improvements so far, c (j) i remains arge during the first three MGRIT iterations. Consider iteration 0 and the down-cyce and up-cyce parts of Figure 4. For the mode probem, where no prior knowedge of the soution is avaiabe, it is cear that reaxation during the down cyce of iteration 0 provides no benefit. The first time that goba information is propagated is during the coarse grid sove, and the subsequent up cyce, of iteration 0. Therefore, a reaxation, and in fact work of any kind, during the down-cyce of iteration 0 can be omitted. In this setting, the initia condition on the finest-grid is injected to the coarsest-grid and then seriay propagated. Then, the soution is interpoated back to the finest-grid. Because we have no knowedge of the soution, the previous time step is used as the initia guess throughout this process. This strategy, simiar to those found in fu mutigrid methods, is a non-intrusive modification of the basic MGRIT agorithm. If some a priori knowedge of the soution is avaiabe to initiaize the finest-grid for MGRIT, then this work shoud not be skipped. In a other cases, this is an effective, cheap, approach to determining an initia guess at the fine-grid C-points. Tabe 8 shows the cost anaysis for this approach (sover 8) when it is added to the sover strategy from Section 5.4 (sover 6), where the IIG is used as the initia guess, aong with four eves of spatia coarsening. The cost anaysis is simiar if spatia coarsening is not used. To better discern the improvements, c (j) is scaed by the corresponding raw cost estimates generated when using sover 6. Skipping work can actuay increase the cost of a Newton sove on some eves during the first few iterations. However, the corresponding sovers 7 and 8 in Tabe 4 show that this strategy reduces the overa runtime with no degradation in MGRIT convergence. The reduced runtime is due to the fact that the time-stepping routine is caed far fewer times during iteration 0, despite being more expensive when it is caed. For instance, the denotes the fact that there is no work done on the first eve for iteration 0, by far the most expensive eve. On coarse eves, during iteration 0, ony interpoation (F-reaxation) is performed Cheap initia iterates. The initia three iterates are the most expensive, with c (j) significanty higher on a eves. The soution at this point is sti inaccurate, so another obvious modification is to reduce the Newton toerance during the first 18

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm

A Comparison of a Second-Order versus a Fourth- Order Lapacian Operator in the Mutigrid Agorithm Kaushik Datta (kdatta@cs.berkeey.edu Math Project May 9, 003 Abstract In this paper, the mutigrid agorithm