PARALLEL COMPUTING. Tayfun E. Tezduyar. Ahmed Sameh


FINITE ELEMENT METHODS: 1970's AND BEYOND
L.P. Franca, T.E. Tezduyar and A. Masud (Eds.)
© CIMNE, Barcelona, Spain 2004

PARALLEL COMPUTING

Tayfun E. Tezduyar
Mechanical Engineering
Rice University MS Main St
Houston, TX 77005, USA
tezduyar@rice.edu

Ahmed Sameh
Computer Science
Purdue University
West Lafayette, IN 47907, USA
sameh@cs.purdue.edu

Abstract. We provide an overview of the role of parallel computing in finite element computations, with emphasis on computational fluid mechanics and moving boundaries and interfaces. The class of problems with moving boundaries and interfaces, which includes fluid-object and fluid-structure interactions as well as free-surface and two-fluid flows, offers a number of computational challenges. These challenges need to be addressed in such a way that 3D computation of complex applications can be carried out efficiently on parallel computers. This requirement becomes an important factor in designing various components of the overall solution strategy, such as solution techniques for the discretized equations and mesh update methods for handling the changes in the spatial domain occupied by the fluid. Our overview includes a description of how these types of challenges are addressed.

Key words: Parallel computing, mesh update, preconditioned iterative algorithms.

1 INTRODUCTION

Within the past decade parallel computing has had a major and indispensable role in 3D finite element computation of complex engineering problems. Together with advanced algorithms capable of efficiently simulating complex, real-world problems, advanced parallel computing platforms have brought simulation and modeling to a new level. Also within the past decade, we witnessed many changes in parallel computing platforms, from Thinking Machines CM-5 to CRAY T3E, from IBM SP to PC-clusters. Currently the future of supercomputing in the US is being assessed, and there are calls for revitalizing high performance computing in the country.
We provide additional comments on this subject in Section 5. Parallel computations can be carried out on shared or distributed memory systems, or a combination of the two. Parallel implementations on shared memory systems, which are relatively simpler, are based on the processors using the same

central memory. Parallel implementations on distributed memory systems require more elaborate work, but within the past decade many interfaces have been developed to make the task easier for computational engineers. For example, the user-friendly Connection Machine software libraries developed by Johnsson and his associates [1-3] at Thinking Machines encouraged several finite element researchers to implement their formulations on the Connection Machine systems. Some of the earlier and landmark contributions in parallel finite element computations were made by Hughes and his coworkers [4-6]. Examples of other earlier contributions in finite element computations are those by Tezduyar [7-9], Farhat, Shephard [13], and their associates.

It is not our intention here to provide an exhaustive review of the parallel finite element computation activities of the past decade, but to give examples of how we have addressed some of the computational challenges in 3D finite element computation of complex engineering problems on parallel computing platforms. We focus on flow problems with moving boundaries and interfaces, such as fluid-object and fluid-structure interactions and free-surface and two-fluid flows. Specifically we describe how the mesh is updated with sufficient parallel efficiency as the spatial domain occupied by the fluid changes its shape during the computation of this class of problems. We also focus on how we can employ effective iterative schemes for the solution of large, coupled equation systems generated by the spatial and temporal discretizations. In particular, we present advanced, special preconditioners that are ideally suited for implementation on high-end parallel architectures.

In the computation of moving boundaries and interfaces, flow conditions determine whether we should use an interface-tracking or interface-capturing technique or a combination of the two.
In an interface-tracking technique the interface computation requires meshes that track the interfaces. The mesh needs to be updated as the interface evolves. In an interface-capturing technique the interface computation does not require that the mesh be updated as the interface evolves. Instead, the interface is captured within the resolution of the finite element mesh covering the area where the interface is.

The interface-tracking and interface-capturing techniques we have developed over the years (see ) are based on stabilized formulations. The stabilized methods are the streamline-upwind/Petrov-Galerkin (SUPG) and pressure-stabilizing/Petrov-Galerkin (PSPG) [14,23] formulations. An earlier version of the pressure-stabilizing formulation for Stokes flows was reported in [24]. These stabilized formulations prevent numerical instabilities in solving problems with high Reynolds or Mach numbers and shocks or thin boundary layers, as well as when using equal-order interpolation functions for velocity and pressure. These stabilized formulations also substantially improve the convergence rate in iterative solution of the large, coupled nonlinear equation system that needs to be solved at every time step of a flow computation. Such nonlinear systems are typically solved with the Newton-Raphson method, which involves, in each iteration step, solution of a large linear system of equations. It was pointed out in [8] that using a good stabilized method makes a substantial difference in the speed of convergence of iterative schemes for solving those inner linear systems. Preconditioned iterative schemes that perform well even in the absence of such stabilizations have also been developed [25,26]. We will discuss those briefly in later sections.

In our interface-tracking techniques the core method is the Deforming-Spatial-Domain/Stabilized Space-Time (DSD/SST) formulation, where the finite element formulation of the problem is written over its space-time domain. As the spatial domain occupied by the fluid changes its shape in time, the mesh is updated with an automatic mesh moving method [7,17], where the motion of the nodes is governed by the equations of elasticity, and full or partial remeshing (i.e., generating a new set of elements, and sometimes also a new set of nodes) as needed. The mesh update issues, including their relevance to parallel efficiency, are discussed in more detail in Section 2. We also note that the stabilized space-time formulations were used earlier by other researchers to solve problems with fixed spatial domains (see for example [27]).

In our interface-capturing techniques (see [17,18]) the core method consists of the stabilized formulations of the Navier-Stokes equations and an advection equation governing the time-evolution of an interface function marking the interface location. For comparable levels of spatial discretization, interface-capturing methods yield a less accurate representation of the interface. By accurate representation of the interface we mean not only accurately locating the interface, but also accurately resolving the boundary or interface layers that might be present near the interface. However, because the interface-capturing techniques do not require the mesh to be updated as the interface evolves, they can be used as practical alternatives when a free-surface or two-fluid interface might be too complex or unsteady to track while keeping the frequency of remeshing at an acceptable level. The accuracy of an interface-capturing technique in representing an interface can be increased in a number of ways, one of which is the Enhanced-Discretization Interface-Capturing Technique (EDICT).
[17,18,28] In the EDICT, a second-level, more refined mesh is built over regions of the first-level mesh which contain the interface. The finite element function spaces are based on a combination of first- and second-level meshes. How the second-level mesh is constructed and updated, and how the computational and parallel efficiency are maintained, are discussed in more detail in Section 2.

The large coupled, nonlinear equation systems that need to be solved in each time step in a time-accurate computation (or every pseudo-time step in a steady-state computation) are solved using iterative methods [7,17,29-34]. The main components of these iterative methods are: a) formation of the residual vector of the linear equation system (that needs to be solved in each iteration of a Newton-Raphson sequence used in solving the coupled, nonlinear equations); b) designing effective preconditioners for the Jacobians in the Newton-Raphson iterations; and c) updating the solution vector in an optimal way. More details on the iterative solution techniques are discussed in Sections 3 and 4. In Section 5 we provide brief comments on parallel computing platforms. We present some numerical examples in Section 6, and end with concluding remarks in Section 7.

2 MESH UPDATE METHODS

In interface-tracking techniques, how the mesh is updated depends on several factors, such as the complexity of the interface and overall geometry, how unsteady the interface is, and how the starting mesh was generated. In general, the mesh update could have two components: moving the mesh for as long as it is possible, and full or partial remeshing (i.e., generating a new set of elements, and sometimes also a new set of nodes) when the element distortion becomes too high. In mesh moving strategies, the only rule the mesh motion needs to follow is that at the interface the

normal velocity of the mesh has to match the normal velocity of the fluid. Beyond that, the mesh can be moved in any way desired, with the main objective being to reduce the frequency of remeshing. In most 3D applications, remeshing requires calling an automatic, unstructured-mesh generator, and therefore reducing the cost of automatic mesh generation becomes a major incentive for reducing the frequency of remeshing. Since the parallel efficiency of most automatic mesh generators is substantially lower than that of most flow solvers, one has to strive to reduce the frequency of remeshing. For example, remeshing every ten time steps would sufficiently reduce the influence of remeshing in terms of its added cost and lack of parallel efficiency. In most of the complex flow problems we computed in the past, the frequency of remeshing was far less. In our current parallel computations on a PC-cluster, typically we perform remeshing on one of the nodes, which, with its 2 GigaBytes of memory, is powerful enough to generate very large meshes. If remeshing consists not only of (full or partial) regeneration of the element connectivities but also involves (full or partial) node regeneration, we need to project the solution from the old mesh to the new one. This involves a search process, which can be carried out in parallel. Still, the computational cost involved in this search process, as well as the projection errors introduced by remeshing, adds more incentives for reducing the frequency of remeshing.

In the automatic mesh moving technique introduced in [7], the motion of the internal nodes is determined by solving the equations of elasticity. As a boundary condition, the motion of the nodes at the interfaces is specified to match the normal velocity of the fluid at the interface. Similar mesh moving techniques were used earlier by other researchers (see for example [35]).
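The elasticity-based mesh motion described above can be illustrated in one dimension. The following is only a minimal sketch under stated assumptions, not the technique of [7]: it assumes a 1D bar of two-node elements, treats each element as a linear spring, holds the first node fixed, prescribes the displacement of the last node (standing in for the interface), and stiffens smaller elements through a hypothetical exponent `chi` introduced here for illustration. The function name `move_mesh_1d` is likewise an assumption.

```python
def move_mesh_1d(x, interface_disp, chi=1.0):
    """Move a 1D mesh: node 0 is fixed, the last node moves by interface_disp.

    Interior node displacements satisfy the equilibrium of a chain of springs.
    Element stiffness is (1/h) * (h_ref/h)**chi, so chi > 0 stiffens small
    elements more than large ones (chi is an illustrative assumption).
    """
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    h_ref = max(h)
    k = [(1.0 / he) * (h_ref / he) ** chi for he in h]

    m = n - 2                       # number of interior (unknown) nodes
    lower = [0.0] * m
    diag = [0.0] * m
    upper = [0.0] * m
    rhs = [0.0] * m
    for j in range(m):
        i = j + 1                   # global index of this interior node
        diag[j] = k[i - 1] + k[i]
        if j > 0:
            lower[j] = -k[i - 1]
        if j < m - 1:
            upper[j] = -k[i]
    # Prescribed displacement of the last node enters the right-hand side.
    rhs[m - 1] += k[n - 2] * interface_disp

    # Thomas algorithm for the tridiagonal system.
    for j in range(1, m):
        w = lower[j] / diag[j - 1]
        diag[j] -= w * upper[j - 1]
        rhs[j] -= w * rhs[j - 1]
    d = [0.0] * m
    d[m - 1] = rhs[m - 1] / diag[m - 1]
    for j in range(m - 2, -1, -1):
        d[j] = (rhs[j] - upper[j] * d[j + 1]) / diag[j]

    disp = [0.0] + d + [interface_disp]
    return [xi + di for xi, di in zip(x, disp)]


# Mesh refined toward the moving end; compress it by 0.05.
x = [0.0, 0.4, 0.7, 0.9, 1.0]
plain = move_mesh_1d(x, -0.05, chi=0.0)   # uniform strain
stiff = move_mesh_1d(x, -0.05, chi=2.0)   # small elements nearly rigid
```

With `chi=0.0` every element absorbs the same strain, while with `chi=2.0` the smallest element next to the moving end keeps almost all of its original size and the larger elements absorb the motion.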
In [7] the mesh deformation is dealt with selectively based on the sizes of the elements, and based on the deformation modes in terms of shape and volume changes. Mesh moving techniques with comparable features were later introduced in [36]. In the technique introduced in [7], selective treatment of the mesh deformation based on shape and volume changes is attained by adjusting the relative values of the Lamé constants of the elasticity equations. The objective would be to stiffen the mesh against shape changes more than we stiffen it against volume changes. Selective treatment based on element sizes, on the other hand, is attained by altering the way we account for the Jacobian of the transformation from the element domain to the physical domain. In this case, the objective is to stiffen the smaller elements, which are typically placed near solid surfaces, more than the larger ones. We developed a number of additional mesh moving features around this core method for the purpose of reducing the frequency of remeshing. We refer the interested readers to [8,18,37,38].

In the EDICT, a subset of the elements in the first-level mesh are identified as those at or near the interface. The second-level mesh is constructed by patching together the more refined meshes generated over each element in this subset. The interpolation functions for velocity, pressure and interface function will all have two components each: one from the first-level mesh and the other from the second-level mesh. We re-define the subset of the first-level mesh over which we build the second-level mesh not every time step but with sufficient frequency to keep the interface enveloped. We need to avoid this envelope being too wide or too narrow. If the envelope is too wide, although we would keep the interface enveloped for a large number of time steps without reconstructing the second-level mesh, we would also

have at a given time enhanced discretization mostly where it is not needed. This would pointlessly increase the computational cost for the flow solver. On the other hand, if the envelope is too narrow, then the frequency of reconstructing the second-level mesh would increase. This would in turn increase the computational overhead involved in redefining the location of the second-level mesh and generating it. This would also adversely affect the parallel efficiency of the computations. Our target would be to maintain the frequency of reconstructing the second-level mesh at about every ten time steps. This would keep the computational cost for the flow solver at a reasonable level and at the same time would sufficiently reduce the effect of mesh reconstruction in terms of its added cost and harm to parallel efficiency.

3 ITERATIVE SOLUTION METHODS

The finite element formulations mentioned in the earlier sections fall into two categories: a space-time formulation with moving meshes or a semi-discrete formulation with non-moving meshes. Full discretizations of these formulations lead to coupled, nonlinear equation systems that need to be solved in each time step of the simulation. Whether we are using a space-time formulation or a semi-discrete formulation, we can represent the system of equations that needs to be solved as follows:

$$N(d_{n+1}) = F. \tag{1}$$

Here $d_{n+1}$ is the vector of nodal unknowns. In a semi-discrete formulation, this vector contains the unknowns associated with marching from time level $n$ to $n+1$. In a space-time formulation, it contains the unknowns associated with the finite element formulation written for the space-time slab between time levels $n$ and $n+1$ (see ). We solve Eq. (1) with the Newton-Raphson method:

$$\left.\frac{\partial N}{\partial d}\right|_{d^{i}_{n+1}} \left(\Delta d^{i}_{n+1}\right) = F - N\left(d^{i}_{n+1}\right), \tag{2}$$

where $i$ is the step counter for the Newton-Raphson sequence, and $\Delta d^{i}_{n+1}$ is the increment computed for $d^{i}_{n+1}$. The large sparse linear system in Eq. (2), which needs to be solved in each Newton-Raphson iteration, may be written as

$$Ax = b. \tag{3}$$

In the class of computations we typically carry out, this linear system is too large to be solved via a direct sparse solver. Therefore, we resort to solving it via a preconditioned iterative scheme. In each iteration, the most time-consuming steps are: (a) computing the residual of the linear system at hand,

$$r = b - Ax, \tag{4}$$

and (b) solving a linear system of the form

$$Pz = r, \tag{5}$$

where $P$ is the chosen preconditioner for the linear system in Eq. (3).
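The two steps in Eqs. (4) and (5) can be seen in a few lines of code. The sketch below is illustrative only and rests on assumptions not taken from the text: it uses a dense list-of-lists matrix, takes $P$ as the diagonal of $A$, and applies the plain update $x \leftarrow x + z$ (a preconditioned Richardson iteration) in place of a Krylov update such as GMRES. The function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def preconditioned_richardson(A, b, tol=1e-10, max_iter=500):
    """Iterate x <- x + z with r = b - A x (Eq. 4) and P z = r (Eq. 5).

    Assumption for this sketch: P = diag(A), so solving P z = r is a
    component-wise division. Converges for, e.g., strictly diagonally
    dominant A; it is not a general-purpose solver.
    """
    n = len(b)
    p_diag = [A[i][i] for i in range(n)]
    x = [0.0] * n
    for it in range(max_iter):
        Ax = matvec(A, x)
        r = [bi - axi for bi, axi in zip(b, Ax)]        # Eq. (4)
        if max(abs(ri) for ri in r) < tol:
            return x, it
        z = [ri / pi for ri, pi in zip(r, p_diag)]      # Eq. (5)
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]           # exact solution is [1, 1, 1]
x, iterations = preconditioned_richardson(A, b)
```

Each pass costs one residual evaluation and one preconditioner solve, which is why those two operations dominate the run time of the iterative solver.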

Computing the residual can be achieved in different ways. The computation can be based on: (i) a sparse-matrix storage of A, (ii) storing just element-level matrices (element-matrix-based), or (iii) storing just the element-level vectors (element-vector-based). This last strategy is also called the matrix-free technique. The preconditioner P is chosen such that it is much simpler to solve systems of the form in Eq. (5) than systems in Eq. (3). Moreover, it should be noted that the closer the inverse of P is to the inverse of A, the fewer the required iterations (to achieve an acceptable solution in each Newton-Raphson iteration) are and the more effective the preconditioner is. Updating the solution vector x is also an important step in the iterative solver. Several update methods are available, and the GMRES [39] scheme is among the more popular ones.

In constructing iterative flow solvers, we concentrated on choosing good preconditioners P and computing effectively the residual r. We adopted computational techniques that assure efficient implementation on parallel computing platforms. For example, the parallel-ready methods we use for the residual computations include those that are element-matrix-based [40], element-vector-based [40], and sparse-matrix-based [32]. The element-vector-based methods were successfully used also by other researchers in the context of parallel computations (see for example [6,41]). The element-vector-based computations can be based on numerical or analytical definitions of the directional derivatives involved (see [17,18]). Provided that they can be used without a major difficulty, we prefer the analytical definitions. A mixed analytical/numerical element-vector-based (AEVB/NEVB) computation technique was introduced in [17,18] for fluid-structure interaction problems in which analytical element-vector-based computations are exceedingly difficult for the coupling matrix blocks.
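The difference between numerical and analytical element-vector-based products can be shown on a toy problem. The sketch below is an assumption-laden illustration, not the formulation of [17,18]: it uses a hypothetical 1D model with two-node elements of length `H`, an element residual made of a diffusion term plus a lumped cubic term, and function names chosen here. No global matrix is ever stored; the Jacobian-vector product is accumulated from element-level vectors.

```python
H = 0.25  # element length of a hypothetical uniform 1D mesh


def residual(d):
    """Assemble the nonlinear vector N(d) from element-level vectors."""
    out = [0.0] * len(d)
    for e in range(len(d) - 1):
        da, db = d[e], d[e + 1]
        out[e] += (da - db) / H + 0.5 * H * da ** 3
        out[e + 1] += (db - da) / H + 0.5 * H * db ** 3
    return out


def jv_analytical(d, v):
    """Directional derivative of N at d along v, from analytical
    element-level expressions (derivative of each element vector)."""
    out = [0.0] * len(d)
    for e in range(len(d) - 1):
        da, db = d[e], d[e + 1]
        va, vb = v[e], v[e + 1]
        out[e] += (va - vb) / H + 0.5 * H * 3.0 * da ** 2 * va
        out[e + 1] += (vb - va) / H + 0.5 * H * 3.0 * db ** 2 * vb
    return out


def jv_numerical(d, v, eps=1e-7):
    """Directional derivative by a one-sided finite difference of N."""
    base = residual(d)
    pert = residual([di + eps * vi for di, vi in zip(d, v)])
    return [(p - b0) / eps for p, b0 in zip(pert, base)]


d = [0.0, 0.3, -0.2, 0.5, 1.0]
v = [1.0, -2.0, 0.5, 0.0, 3.0]
diff = max(abs(a - n) for a, n in zip(jv_analytical(d, v), jv_numerical(d, v)))
```

The numerical version needs only the residual routine, which makes it attractive when the analytical derivative of a block is hard to write down; the analytical version avoids the truncation and round-off error tied to the choice of `eps`.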
We have developed some advanced preconditioners such as the Clustered-Element-by-Element (CEBE) preconditioner [30] and the mixed CEBE and Cluster Companion (CC) preconditioner [30]. We have also successfully implemented the CEBE preconditioner in conjunction with an ILU approximation [32]. However, our typical parallel computations are based on diagonal and nodal-block-diagonal preconditioners. These are very simple preconditioners, but are also very simple to implement on parallel platforms. More on our parallel implementations can be found in .

4 SPECIAL PRECONDITIONING TECHNIQUES

Numerical simulations of fluid-particle interactions were extensively investigated by Tezduyar and his associates [14-16,31,33,42-45] with the methods described in earlier sections. In this section we only illustrate an advanced preconditioning method for the linear systems that might arise in such simulations. We describe this method in the context of an Arbitrary Lagrangian-Eulerian (ALE) technique, instead of the DSD/SST formulation. A structurally symmetric matrix representation of the resulting Jacobians has been proposed in [46]. These nonsingular linear systems are of the form

$$\begin{pmatrix} A & B & E_1 \\ B^T & C & E_2 \\ F_1 & F_2 & G \end{pmatrix} \begin{pmatrix} u \\ p \\ U \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix}, \tag{6}$$

where the first $2 \times 2$ block is nonsingular, in which $A$ and $C$ are nonsymmetric and nonsingular of order $n$ and $m < n$, respectively, and $B$ is of maximal column rank $m$. The matrix $G$ is nonsingular of order $3q$ for 2D problems, where $q$ is the number of

particles. Here, $u$ denotes the fluid velocity at the interior finite element nodes, $p$ denotes the pressure, and $U$ represents the particle velocities and rotations. If we do not use a stabilized formulation then the matrix $C$ vanishes. In this case Eq. (6) becomes a bordered nonsymmetric saddle-point problem. Solving such saddle-point problems via iterative schemes is challenging. We outline in this section a very effective nested preconditioned iterative scheme, which is also quite suitable for implementation on parallel computers (see [26]), for handling the saddle-point problem given in Eq. (6). From the discretization of the Navier-Stokes equations, and the choice of the time-step size, it can be seen that the symmetric part of $A$, i.e. $A_s = (A + A^T)/2$, is diagonally dominant and positive definite, and that $A$ is positive stable. The preconditioned nested scheme in [26] is given by the following outer-inner iterations:

$$\begin{aligned} \text{outer: solve } & \begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}, \\ \text{inner: solve } & Mz = r, \ \text{where } M = \begin{pmatrix} A_s & B \\ B^T & 0 \end{pmatrix}. \end{aligned} \tag{7}$$

The system in the outer iterations is solved with GMRES(k), while the system in the inner iterations is solved with a preconditioned Richardson scheme. There are two keys to the effectiveness of the above scheme. First is that for most of the computation $A_s$ is more dominant than the skew-symmetric part of $A$, $A_{ss} = (A - A^T)/2$. Second, in the preconditioned Richardson's iteration, we only need to solve two simple linear systems of the form

$$\hat{A} v = a, \tag{8}$$

$$(B^T \hat{A}^{-1} B) w = b. \tag{9}$$

Here $\hat{A}^{-1}$ is a diagonal matrix that approximates $A_s^{-1}$ and is chosen so as to minimize the Frobenius norm of $(I - A_s \hat{A}^{-1})$. Such minimization can be performed with perfect parallelism, dealing simultaneously with all columns of $A_s$. Moreover, we have shown in [26] that $\rho(I - A_s \hat{A}^{-1}) < 1$, where $\rho(S)$ is the spectral radius of a square matrix $S$. Using such $\hat{A}$, solving Eq. (8) becomes trivial, and Eq. (9) needs to be solved via the conjugate gradient algorithm without preconditioning and with only a relaxed convergence criterion (reducing the original residual by a factor of $10^{-3}$). This nested scheme is quite suitable for matrix-free formulation as one only needs an effective routine for matrix-vector products. The algorithm has also proven to be more effective than other preconditioned schemes in which the above inner iterations are solved with GMRES. Our scheme is more effective than using GMRES with

$$M = \begin{pmatrix} I & B \\ B^T & 0 \end{pmatrix} \tag{10}$$

as suggested in [47] or

$$M = \begin{pmatrix} A & 0 \\ 0 & B^T A^{-1} B \end{pmatrix} \tag{11}$$

as suggested in [48]. Moreover, our scheme in [26] proved to be superior to all ILU preconditioning varieties of GMRES, including its most recent version, ARMS [49].
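The diagonal matrix that minimizes the Frobenius norm of $(I - A_s \hat{A}^{-1})$ has a closed form, because the norm splits into independent column problems: for column $j$ with entries $a_{ij}$, the minimizing diagonal entry is $a_{jj} / \sum_i a_{ij}^2$. The sketch below illustrates this on a small dense matrix; the function names and the test matrix are assumptions made here, and the comparison against the plain choice $1/a_{jj}$ is included only to show that the column-wise minimizer does no worse.

```python
def symmetric_part(A):
    """A_s = (A + A^T) / 2 for a dense list-of-lists matrix."""
    n = len(A)
    return [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]


def diagonal_approx_inverse(As):
    """Diagonal entries d_j minimizing ||e_j - d_j * As[:, j]||_2.

    Each column is treated independently, so all columns could be
    processed simultaneously on a parallel machine.
    """
    n = len(As)
    return [As[j][j] / sum(As[i][j] ** 2 for i in range(n)) for j in range(n)]


def frobenius_defect(As, d):
    """Frobenius norm of I - As * diag(d)."""
    n = len(As)
    total = 0.0
    for i in range(n):
        for j in range(n):
            entry = (1.0 if i == j else 0.0) - As[i][j] * d[j]
            total += entry * entry
    return total ** 0.5


A = [[4.0, 1.0, 0.0],
     [-3.0, 5.0, 2.0],
     [0.0, -1.0, 6.0]]
As = symmetric_part(A)
d_opt = diagonal_approx_inverse(As)
d_jacobi = [1.0 / As[j][j] for j in range(len(As))]
defect_opt = frobenius_defect(As, d_opt)
defect_jacobi = frobenius_defect(As, d_jacobi)
```

With such a diagonal $\hat{A}^{-1}$, Eq. (8) reduces to a component-wise scaling and the operator in Eq. (9) can be applied with two products by $B$ and $B^T$ and one scaling.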

5 PARALLEL COMPUTING PLATFORMS

High performance computing is vital not only for solving important problems in science and engineering disciplines, but also for advancing these disciplines. Vector processors, introduced by Seymour Cray in the late 1970s, were the supercomputers of the day for conducting full-scale simulations in academic, government, and industry research laboratories. Parallel computers of that decade were mainly experimental, with the best known being the Illiac-IV, designed by the late Daniel Slotnick at the University of Illinois, built by the Burroughs Corporation, and installed at NASA Ames. In the early 1990s, however, the most powerful supercomputers were the Cray C90 and T90 parallel architectures introduced by Cray Research, in which each CPU was a powerful vector processor. However, with the increase in speed of uniprocessors, and with the slow pace of adopting more advanced simulation models, many organizations and research laboratories found it much more economical to use clusters of PCs with various commercially available interconnection networks. Such architectures, including PC-cluster-like architectures introduced by HP and IBM, for example, were also encouraged by the introduction of the Top 500 List. This list is based on theoretical peak (in Gflops) and computational rate achieved in solving a dense linear system using Gaussian elimination (the LINPACK Benchmark). While this benchmark gives an indication of the suitability of the parallel architecture for dense matrix computation, it provides the user with very little idea about the architecture's performance on real-life applications in science and engineering. In fact, experience in industry and government laboratories on such PC-cluster architectures has shown that most real applications achieve sustained performance that cannot exceed 10% of the performance achieved on the LINPACK benchmark.
The introduction of the Japanese Earth Simulator, in which the architecture combines parallelism and vector processing, alerted the US to the fact that the direction of high performance computing in the country needed more attention. The Earth Simulator is not only ranked first in the Top 500 list, but is also capable of a sustained performance at 30% to 50% of its peak performance on many science and engineering applications. Ironically, the Earth Simulator architecture and system software are similar to some of the high performance computing projects discontinued in the US in the early 1990s. This is by no means a disapproval of PC-cluster architectures. These are economical and quite suitable for code development, but they are certainly not the parallel architectures that can meet the high-end computing-power needs of research and development activities. These concerns have recently prompted US agencies to sponsor studies and workshops that address the future of supercomputing and the need for revitalization of high-end computing in the US.

6 NUMERICAL EXAMPLES

6.1 Parachute soft-landing dynamics

Soft landing with the aid of a retraction device reduces the landing impact for payloads delivered with parachutes. In this example, for a T-10 parachute, a pneumatic muscle actuator (PMA) placed between the suspension lines and the payload serves as the retraction device by causing rapid contraction just before landing.

The DSD/SST formulation is used with the automatic mesh moving method. The motion and deformation of the parachute are governed by the membrane and cable equations, which are solved together with the fluid mechanics equations. A block-iterative coupling method is used for solving the two systems simultaneously. The blocks of linear equation systems are solved iteratively with the GMRES update method. The computation is performed on a 41-node PC cluster in the Department of Mechanical Engineering and Materials Science at Rice University. This system has two 1.7GHz Pentium IV processors on each node sharing 2 GigaBytes of memory. Figure 1 shows the payload trajectory. For more on soft-landing simulations and the methods used, see .

Figure 1: Parachute soft-landing dynamics. Payload trajectory for soft landing of a T-10 parachute during and immediately after retraction. The straight line is the trajectory that the payload would have had without the retraction. The parachutes displayed illustrate the deformations of the canopy and the cables. The length scale used in displaying the parachutes is not the same as it is for the trajectory graph.

6.2 A performance evaluation test for the Balance Scheme

If the saddle-point problem discussed in Section 4, i.e.,

$$\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}, \tag{12}$$

is reordered to minimize its bandwidth via the reverse Cuthill-McKee scheme, for example, we obtain a narrow banded system that is sparse within the band. Naturally, this system is indefinite. It is also rather difficult to solve via a Krylov subspace method without an effective preconditioning strategy. As an alternative to the iterative strategy in Section 4, we describe here the Balance Scheme for iterative solution of these narrow banded systems. This scheme is also quite suitable for

parallel computers (see [25] and related work in [51] and [52]). We show that it is superior to GMRES with ILUT preconditioning and also superior to classical direct methods for solving banded systems. Let the banded system, of order $N$ and bandwidth $\nu$, resulting after reordering Eq. (12) be given by $Ku = h$. To illustrate the main idea in this scheme (see [25]) we partition the banded matrix of coefficients $K$ as two coupled linear systems:

$$\begin{pmatrix} A_1 & B_1 & \\ & C_2 & A_2 \end{pmatrix} \begin{pmatrix} u_1 \\ \mu \\ u_2 \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}. \tag{13}$$

Here $A_1$ and $A_2$ are each of order $(N/2) \times \left(\frac{N - \nu}{2}\right)$, and $B_1$ and $C_2$ are each of order $(N/2) \times \nu$. Thus, we have two underdetermined systems:

$$E_1^T \begin{pmatrix} u_1 \\ \mu \end{pmatrix} = h_1, \qquad E_2^T \begin{pmatrix} \hat{\mu} \\ u_2 \end{pmatrix} = h_2, \tag{14}$$

where $E_1^T = (A_1, B_1)$ and $E_2^T = (C_2, A_2)$. The solutions to these two systems are given as

$$\begin{pmatrix} u_1 \\ \mu \end{pmatrix} = p_1 + (I - P_1) z_1, \qquad \begin{pmatrix} \hat{\mu} \\ u_2 \end{pmatrix} = p_2 + (I - P_2) z_2. \tag{15}$$

Here $p_i = E_i (E_i^T E_i)^{-1} h_i$, with $i = 1, 2$, are the particular solutions of the systems, and the orthogonal projectors $P_i$ are given as $E_i (E_i^T E_i)^{-1} E_i^T$. The particular solutions are obtained using the conjugate gradient algorithm preconditioned via the incomplete Cholesky factorization of $(E_i^T E_i)$. The vectors $z_i$ are arbitrary but need to be determined to assure that $\mu = \hat{\mu}$. Imposing this condition yields the underdetermined reduced system $Mz = q$ of order $\nu \times (N + \nu)$, where $q$ is trivially obtained from the vectors $p_i$, and $M$ consists of rows of the projectors $(I - P_i)$. This reduced system can be cast in the form

$$M M^T w = q, \tag{16}$$

where $M M^T$ is the sum of sections of $(I - P_i)$. Note that $M$ is not formed explicitly; rather, the system in Eq. (16) is solved via the conjugate gradient algorithm. Here one only needs to perform the multiplication of $M M^T$ by a vector. This amounts to performing multiplications of $(I - P_i)$ by a vector $c$, i.e., $d = (I - P_i) c$. This in turn reduces to computing the residual of the least-squares problem $\min_d \| c - E_i d \|_2$, again, using the conjugate gradient algorithm.
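The two-partition construction in Eqs. (13) to (16) can be traced on a small dense example. The sketch below is only an illustration under several assumptions that are not from the text: matrices are dense lists of lists, every conjugate gradient solve is replaced by a small Gaussian elimination routine, $M M^T$ is formed explicitly column by column (the text stresses that it is not formed in practice), and `beta` denotes the half-bandwidth so that the shared block $\mu$ has $2\,\texttt{beta}$ entries. All function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for CG)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def min_norm_solution(Et, h):
    """p = E (E^T E)^{-1} h for the underdetermined system E^T v = h."""
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in Et] for ri in Et]
    return matvec(transpose(Et), solve(gram, h))


def null_project(Et, c):
    """(I - P) c with P = E (E^T E)^{-1} E^T."""
    return [ci - pi for ci, pi in zip(c, min_norm_solution(Et, matvec(Et, c)))]


def balance_solve(K, h, beta):
    """Solve K u = h for banded K (half-bandwidth beta) with two partitions."""
    n = len(h)
    m = n // 2
    s = 2 * beta                                   # size of the shared block mu
    Et1 = [row[:m + beta] for row in K[:m]]        # (A1, B1)
    Et2 = [row[m - beta:] for row in K[m:]]        # (C2, A2)
    p1 = min_norm_solution(Et1, h[:m])
    p2 = min_norm_solution(Et2, h[m:])
    n1, n2 = len(p1), len(p2)

    def lift1(w):                                  # place w in the mu slots of block 1
        return [0.0] * (n1 - s) + list(w)

    def lift2(w):                                  # place w in the mu slots of block 2
        return list(w) + [0.0] * (n2 - s)

    # Columns of M M^T: sum of the mu-sections of (I - P1) and (I - P2).
    cols = []
    for j in range(s):
        e = [1.0 if i == j else 0.0 for i in range(s)]
        g1 = null_project(Et1, lift1(e))[n1 - s:]
        g2 = null_project(Et2, lift2(e))[:s]
        cols.append([a + b for a, b in zip(g1, g2)])
    q = [b - a for a, b in zip(p1[n1 - s:], p2[:s])]
    w = solve(transpose(cols), q)                  # Eq. (16)

    v1 = [p + c for p, c in zip(p1, null_project(Et1, lift1(w)))]
    v2 = [p - c for p, c in zip(p2, null_project(Et2, lift2(w)))]
    return v1[:n1 - s] + v2                        # u1 followed by (mu, u2)


# Small nonsymmetric tridiagonal test system (half-bandwidth 1).
N = 8
K = [[0.0] * N for _ in range(N)]
for i in range(N):
    K[i][i] = 4.0
    if i > 0:
        K[i][i - 1] = -1.0
    if i < N - 1:
        K[i][i + 1] = 2.0
u_true = [float(i + 1) for i in range(N)]
h = matvec(K, u_true)
u = balance_solve(K, h, beta=1)
```

The two calls to `min_norm_solution` for `p1` and `p2` are independent of each other, which is the source of the partition-level parallelism; only the small system in `w` couples the partitions.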
Once the reduced system is solved for $w$, the solution vector is obtained in a straightforward way. This algorithm can be generalized to multiple partitions $\gamma > 2$, generating $\gamma$ independent linear least-squares problems that can be solved in parallel for the particular solutions $p_i$, $i = 1, 2, \ldots, \gamma$. Naturally there is also the opportunity for parallelism in handling each least-squares problem, thus creating multi-level parallelism. For solving indefinite linear systems this algorithm proved to be superior to GMRES with ILUT preconditioning. This is illustrated via the system resulting from the finite element discretization of the partial differential equation u + α u + βu = f in Ω, u = 0 on Γ.

Using an unstructured mesh to discretize the problem domain Ω, we obtain a sparse linear system of equations that can be reordered into a narrow banded system that is sparse within the band. For certain values of α and β the resulting linear system becomes strongly indefinite. Table 1 shows that, even for a system of order as small as 3101, our algorithm (with γ = 8 partitions) proves to be more robust than GMRES(20) with ILUT preconditioning (allowing a fill-in of up to 15 elements per row and a drop tolerance of $10^{-4}$). Moreover, since the reordered system results in a very narrow band, we also compare the Balance Scheme with the classical banded system solver in the linear algebra package ScaLAPACK [53]. For this purpose, we chose a banded Toeplitz system that is sparse within the band, with the nonzero diagonals at the following distances from the main diagonal: $\{b_l, 1, 1, b_u\}$. The values assigned to these diagonals are: 1, 1, 1, 1. Table 2 shows, on an SGI-Origin 2000, the time consumed by the Balance Scheme (with γ = 16) and the resulting speed improvements over ScaLAPACK. We note that ScaLAPACK reaches its maximum speed when the number of processors is two.

Table 1: Balance Scheme versus GMRES. Columns: α, β, then iter and $\|r\|_2$ for Balance, and iter and $\|r\|_2$ for GMRES. Three GMRES entries read "failure".

Table 2: Time for the Balance Scheme (in seconds) and speed improvement over ScaLAPACK on 2 processors, by problem (N, bandwidth) and number of processors. For the problem listed as "N = 16," the speed improvements include 1 and 5; for the problem listed as "N = 32," they include 2 and 12.

7 CONCLUDING REMARKS

We provided a brief outline of the role of parallel computing in finite element computations. We focused on giving examples of how we address some of the computational challenges involved in parallel, 3D finite element computation of complex engineering problems. To that end, we provided our overview in the context of computational fluid mechanics, with emphasis on moving boundaries and interfaces.
This is a complex but important class of problems that includes fluid-object and fluid-structure interactions and free-surface and two-fluid flows. In developing the methods needed to address the computational challenges involved in this class of problems, we made sure that 3D computations with these methods could be carried out efficiently on parallel computers. For example, in designing the mesh update methods needed to accommodate the changes in the spatial domain occupied by the fluid, we made sure that parallel efficiency was not compromised. Similarly, maintaining parallel efficiency was one of the key principles we observed in developing the iterative methods needed to solve the large, coupled equation systems. To demonstrate how the methods developed can be applied to complex engineering problems, we presented a numerical simulation involving aerodynamics and fluid-structure interactions. We also presented a performance evaluation test for a preconditioned iterative method for banded systems.

ACKNOWLEDGMENTS

The first author was supported by the US Army Natick Soldier Center (Contract No. DAAD16-03-C-0051), NSF (Grant No. EIA ), and NASA Johnson Space Center (Grant No. NAG9-1435). The second author was supported by NSF (Grant No. CCR and ERC ).

REFERENCES

[1] S.L. Johnsson and K.K. Mathur, Experience With the Conjugate Gradient Method for Stress Analysis on a Data Parallel Supercomputer, International Journal for Numerical Methods in Engineering, 27 (1989)
[2] K.K. Mathur and S.L. Johnsson, The finite element method on a data parallel computing system, International Journal of High Speed Computing, 1 (1989)
[3] S.L. Johnsson and K.K. Mathur, Data Structures and Algorithms for the Finite Element Method on a Data Parallel Supercomputer, International Journal for Numerical Methods in Engineering, 29 (1990)
[4] Z. Johan, K.K. Mathur, S.L. Johnsson, and T.J.R.
Hughes, An efficient communications strategy for finite element methods on the Connection Machine CM-5 system, Computer Methods in Applied Mechanics and Engineering, 113 (1994)
[5] Z. Johan, K.K. Mathur, S.L. Johnsson, and T.J.R. Hughes, Scalability of finite element applications on distributed-memory parallel computers, Computer Methods in Applied Mechanics and Engineering, 119 (1994)
[6] Z. Johan, K.K. Mathur, S.L. Johnsson, and T.J.R. Hughes, A case study in parallel computation: Viscous flow around an Onera M6 wing, International Journal for Numerical Methods in Fluids, 21 (1995)
[7] T.E. Tezduyar, M. Behr, S. Mittal, and A.A. Johnson, Computation of unsteady incompressible flows with the finite element methods - space-time formulations, iterative strategies and massively parallel implementations, in New Methods in Transient Analysis, PVP-Vol.246/AMD-Vol.143, ASME, New York, (1992)
[8] T. Tezduyar, S. Aliabadi, M. Behr, A. Johnson, and S. Mittal, Parallel finite-element computation of 3D flows, Computer, 26 (1993)
[9] T.E. Tezduyar, S.K. Aliabadi, M. Behr, and S. Mittal, Massively parallel finite element simulation of compressible and incompressible flows, Computer Methods in Applied Mechanics and Engineering, 119 (1994)
[10] C. Farhat, L. Fezoui, and S. Lanteri, Two-dimensional viscous flow computations on the CM-2: Unstructured meshes, upwind schemes and massively parallel computations, Computer Methods in Applied Mechanics and Engineering, 102 (1993)
[11] C. Farhat and S. Lanteri, Simulation of compressible viscous flows on a variety of MPPs: Computational algorithms for unstructured dynamic meshes and performance results, Computer Methods in Applied Mechanics and Engineering, 119 (1994)
[12] C. Farhat, M. Lesoinne, and N. Maman, Mixed explicit/implicit time integration of coupled aeroelastic problems: Three-field formulation, geometric conservation and distributed solution, International Journal for Numerical Methods in Fluids, 21 (1995)
[13] C. Özturan, H.L. de Cougny, M.S. Shephard, and J.E. Flaherty, Parallel adaptive mesh refinement and redistribution on distributed memory computers, Computer Methods in Applied Mechanics and Engineering, 119 (1994)
[14] T.E. Tezduyar, Stabilized finite element formulations for incompressible flow computations, Advances in Applied Mechanics, 28 (1992)
[15] T.E. Tezduyar, M. Behr, and J. Liou, A new strategy for finite element computations involving moving boundaries and interfaces - the deforming-spatial-domain/space-time procedure: I. The concept and the preliminary numerical tests, Computer Methods in Applied Mechanics and Engineering, 94 (1992)
[16] T.E. Tezduyar, M. Behr, S. Mittal, and J. Liou, A new strategy for finite element computations involving moving boundaries and interfaces - the deforming-spatial-domain/space-time procedure: II. Computation of free-surface flows, two-liquid flows, and flows with drifting cylinders, Computer Methods in Applied Mechanics and Engineering, 94 (1992)
[17] T.E. Tezduyar, Finite element methods for flow problems with moving boundaries and interfaces, Archives of Computational Methods in Engineering, 8 (2001)

[18] T.E. Tezduyar, Finite element methods for fluid dynamics with moving boundaries and interfaces, in E. Stein, R. De Borst, and T.J.R. Hughes, editors, Encyclopedia of Computational Mechanics, Volume 3: Fluids, Chapter 17, John Wiley & Sons,
[19] T.J.R. Hughes and A.N. Brooks, A multi-dimensional upwind scheme with no crosswind diffusion, in T.J.R. Hughes, editor, Finite Element Methods for Convection Dominated Flows, AMD-Vol.34, 19-35, ASME, New York,
[20] A.N. Brooks and T.J.R. Hughes, Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations, Computer Methods in Applied Mechanics and Engineering, 32 (1982)
[21] T.E. Tezduyar and T.J.R. Hughes, Finite element formulations for convection dominated flows with particular emphasis on the compressible Euler equations, in Proceedings of AIAA 21st Aerospace Sciences Meeting, AIAA Paper , Reno, Nevada, (1983).
[22] T.J.R. Hughes and T.E. Tezduyar, Finite element methods for first-order hyperbolic systems with particular emphasis on the compressible Euler equations, Computer Methods in Applied Mechanics and Engineering, 45 (1984)
[23] T.E. Tezduyar, S. Mittal, S.E. Ray, and R. Shih, Incompressible flow computations with stabilized bilinear and linear equal-order-interpolation velocity-pressure elements, Computer Methods in Applied Mechanics and Engineering, 95 (1992)
[24] T.J.R. Hughes, L.P. Franca, and M. Balestra, A new finite element formulation for computational fluid dynamics: V. Circumventing the Babuška-Brezzi condition: A stable Petrov-Galerkin formulation of the Stokes problem accommodating equal-order interpolations, Computer Methods in Applied Mechanics and Engineering, 59 (1986)
[25] G. Golub, A. Sameh, and V. Sarin, A parallel balance scheme for banded linear systems, Num. Lin. Alg. Appl., 8 (2001)
[26] A. Baggag and A. Sameh, A nested iterative scheme for indefinite linear systems in particulate flows, to appear in Computer Methods in Applied Mechanics and Engineering.
[27] T.J.R. Hughes and G.M. Hulbert, Space-time finite element methods for elastodynamics: formulations and error estimates, Computer Methods in Applied Mechanics and Engineering, 66 (1988)
[28] T. Tezduyar, S. Aliabadi, and M. Behr, Enhanced-Discretization Interface-Capturing Technique (EDICT) for computation of unsteady flows with interfaces, Computer Methods in Applied Mechanics and Engineering, 155 (1998)

[29] T.E. Tezduyar and J. Liou, Grouped element-by-element iteration schemes for incompressible flow computations, Computer Physics Communications, 53 (1989)
[30] T.E. Tezduyar, M. Behr, S.K. Aliabadi, S. Mittal, and S.E. Ray, A new mixed preconditioning method for finite element computations, Computer Methods in Applied Mechanics and Engineering, 99 (1992)
[31] T. Tezduyar, S. Aliabadi, M. Behr, A. Johnson, V. Kalro, and M. Litke, Flow simulation and high performance computing, Computational Mechanics, 18 (1996)
[32] V. Kalro and T. Tezduyar, Parallel iterative computational methods for 3D finite element flow simulations, Computer Assisted Mechanics and Engineering Sciences, 5 (1998)
[33] T.E. Tezduyar, CFD methods for three-dimensional computation of complex flow problems, Journal of Wind Engineering and Industrial Aerodynamics, 81 (1999)
[34] T. Tezduyar and Y. Osawa, Methods for parallel computation of complex flow problems, Parallel Computing, 25 (1999)
[35] D.R. Lynch, Wakes in liquid-liquid systems, Journal of Computational Physics, 47 (1982)
[36] A. Masud and T.J.R. Hughes, A space-time Galerkin/least-squares finite element formulation of the Navier-Stokes equations for moving domain problems, Computer Methods in Applied Mechanics and Engineering, 146 (1997)
[37] K. Stein, T. Tezduyar, and R. Benney, Mesh moving techniques for fluid-structure interactions with large displacements, Journal of Applied Mechanics, 70 (2003)
[38] T. Tezduyar, Finite element interface-tracking and interface-capturing techniques for flows with moving boundaries and interfaces, in Proceedings of the ASME Symposium on Fluid-Physics and Heat Transfer for Macro- and Micro-Scale Gas-Liquid and Phase-Change Flows (CD-ROM), ASME Paper IMECE2001/HTD-24206, ASME, New York, New York, (2001).
[39] Y. Saad and M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986)
[40] T. Tezduyar, S. Aliabadi, M. Behr, A. Johnson, V. Kalro, and M. Litke, High performance computing techniques for flow simulations, in M. Papadrakakis, editor, Solving Large-Scale Problems in Mechanics: Parallel Solution Methods in Computational Mechanics, Chapter 10, , John Wiley & Sons, 1997.

[41] Z. Johan, T.J.R. Hughes, and F. Shakib, A globally convergent matrix-free algorithm for implicit time-marching schemes arising in finite element analysis in fluids, Computer Methods in Applied Mechanics and Engineering, 87 (1991)
[42] A.A. Johnson and T.E. Tezduyar, Simulation of multiple spheres falling in a liquid-filled tube, Computer Methods in Applied Mechanics and Engineering, 134 (1996)
[43] A.A. Johnson and T.E. Tezduyar, 3D simulation of fluid-particle interactions with the number of particles reaching 100, Computer Methods in Applied Mechanics and Engineering, 145 (1997)
[44] A.A. Johnson and T.E. Tezduyar, Advanced mesh generation and update methods for 3D flow simulations, Computational Mechanics, 23 (1999)
[45] A. Johnson and T. Tezduyar, Methods for 3D computation of fluid-object interactions in spatially-periodic flows, Computer Methods in Applied Mechanics and Engineering, 190 (2001)
[46] M. Knepley, V. Sarin, and A. Sameh, Parallel simulation of particulate flows, in Lecture Notes in Computer Science, number 1457, , Springer,
[47] I. Perugia and V. Simoncini, Block-diagonal and indefinite symmetric preconditioners for mixed finite element formulations, Num. Lin. Alg. Appl., 7 (2000)
[48] M. Murphy, G. Golub, and A. Wathen, A note on preconditioning for indefinite linear systems, SIAM J. Sci. Comput., 21 (2000)
[49] Y. Saad and B. Suchomel, ARMS: an algebraic recursive multi-level solver for general sparse linear systems, Num. Lin. Alg. Appl., 9 (2000)
[50] K. Stein, T. Tezduyar, S. Sathe, M. Senga, C. Ozcan, T. Soltys, V. Kumar, R. Benney, and R. Charles, Simulation of parachute dynamics during control line input operations, in Proceedings of the 17th AIAA Aerodynamic Decelerator Systems Technology Conference, AIAA Paper , Monterey, California, (2003).
[51] A. Sameh and V. Sarin, Parallel algorithms for indefinite linear systems, Parallel Computing, 28 (2002)
[52] A. Sameh and V. Sarin, Hybrid parallel linear system solvers, Int. J. of Comp. Fluid Dynamics, 12 (1999)
[53] L. Blackford et al., ScaLAPACK Users' Guide, SIAM, Philadelphia, 1997.
