IMPLEMENTATION OF IMPLICIT FINITE ELEMENT METHODS FOR INCOMPRESSIBLE FLOWS ON THE CM-5


Computer Methods in Applied Mechanics and Engineering (1994)

J.G. Kennedy
Thinking Machines Corporation, 245 First Street, Cambridge, MA 02142, USA

M. Behr, V. Kalro, and T.E. Tezduyar
AEM/AHPCRC Supercomputer Institute, University of Minnesota, 1200 Washington Avenue South, Minneapolis, MN 55415, USA

March 20, 1994
Revised: March 27, 1994

Abstract

A parallel implementation of an implicit finite element formulation for incompressible fluids on a distributed-memory massively parallel computer is presented. The dominant issue that distinguishes the implementation of finite element problems on distributed-memory computers from that on traditional shared-memory scalar or vector computers is the distribution of data (and hence workload) to the processors and the nonuniform memory hierarchy associated with the processors, particularly the nonuniform costs of on-processor and off-processor memory references. Accessing data stored in a remote processor requires computing resources an order of magnitude greater than accessing data stored locally. This distribution of data motivates alternatives to the traditional algorithms and data structures designed for shared-memory computers. Data structures, as well as data decomposition and data communication algorithms designed for distributed-memory computers, are presented in the context of high-level language constructs from High Performance Fortran. The discussion relies primarily on abstract features of the hardware and software environment and should be applicable, in principle, to a variety of distributed-memory systems. The actual implementation is carried out on a Connection Machine CM-5 system with high-performance communication functions.

1. Introduction

Distributed-memory, massively parallel computers are emerging as significant competitors to traditional vector supercomputers in large-scale computational fluid dynamics. Fluid problems are particularly well suited to these computers because they involve large, regular data sets, and parallelization occurs over large, uniform data structures. The continuing trend in fluid simulations toward significantly larger data sets, together with the need for shorter solution times, leads naturally to distributed-memory massively parallel computers. Parallel computers offer the potential for both higher sustained computational performance and substantially larger memory capacities than traditional vector computers.

This study focuses on a finite element formulation for the problem of an incompressible, viscous fluid. Although finite element methods offer highly parallel data structures, they pose a challenge for distributed-memory parallel machines because of the irregular data communication patterns that arise with unstructured meshes. The earliest implementations of finite element methods on parallel computers relied on a message-passing programming model coupled with domain decomposition constructs [1-3]. Domain decomposition subdivides the physical domain into $N_p$ subdomains, one for each of the $N_p$ processors, each subdomain typically consisting of a spatially contiguous set of elements. The subdomains then communicate data only through elements on the subdomain boundaries. More recently, data parallel implementations of finite element methods have emerged [4-9]. Johan et al. [10] coupled traditional notions of domain decomposition with a data parallel finite element implementation. A variety of methods have been used to construct the element subdomains so as to maintain favorable load balancing and low network communication requirements. The recursive spectral bisection (RSB) algorithm due to Pothen et al. [11] and Simon [12] provides a systematic and robust methodology for domain partitioning. Johan et al. [10] provided the first parallel implementation of RSB for unstructured meshes. Behr et al. [9] provide parallel implementation constructs for two incompressible flow formulations (with velocity-pressure and stress-velocity-pressure as primary variables), along with two-dimensional flow simulations.
These parallel implementation constructs are extended here to include a detailed discussion of the GMRES solver (including a comparison between a traditional matrix-based algorithm and a matrix-free algorithm), domain decomposition, and three-dimensional flow simulations. In addition, the issues associated with the coupling between domain decomposition and gather-scatter communication performance are discussed. The domain decomposition strategy is based on the parallel implementation of RSB provided in Johan et al. [10].

The paper is organized as follows. An abstraction of the hardware and software characteristics of distributed-memory massively parallel computers is provided in Section 2. A statement of the finite element problem is provided in Section 3. The parallel implementation of this problem is presented in Section 4. Data decomposition and associated communication issues are discussed in Section 5. Numerical simulations are presented in Section 6. Finally, conclusions are provided in Section 7.

2. Parallel Computer Model

The finite element method discussed here is implemented on a Connection Machine CM-5 system in the data parallel language Connection Machine Fortran (CMF). The discussion relies primarily on general features of both the CM-5 and CMF. The CM-5 is a distributed-memory, massively parallel computer system. Like the emerging High Performance Fortran (HPF) standard, CMF is a language based on Fortran 90 with additional data layout compiler directives. In principle, the discussion applies to other distributed-memory, massively parallel computer systems using other programming models and languages. The programming language features discussed here are directly available in both CMF and HPF. The primary constructs used are data distribution (data layout) directives, which distribute Fortran array elements to the memories of the distributed processors². The attributes :SERIAL and :PARALLEL are used here to denote serial (in local processor memory) and parallel (across processor memories) array dimensions. For example, consider the arrays A, B, and C on an $N_p$-processor machine with cyclic data layout:

    REAL A(Np), B(3*Np), C(5,Np)
    CMPLR$ LAYOUT A(:PARALLEL), B(:PARALLEL)
    CMPLR$ LAYOUT C(:SERIAL, :PARALLEL)

Array A has a single parallel dimension whose number of entries matches the number of processors; cyclic layout of the parallel dimension places a single entry of A in each processor. Array B, on the other hand, has three times as many entries as there are processors. Cyclic layout of B places B(1:Np) one per processor; similarly, B(Np+1:2*Np) and B(2*Np+1:3*Np) are distributed one per processor, so that each processor is assigned three entries of B. Array C has both a serial and a parallel axis. The parallel axis of C is distributed identically to that of A. The serial axis of C is laid out such that, for the k-th parallel entry of C, a serial vector of length 5 is placed in the processor associated with that k-th parallel entry. Further discussion of these constructs may be found in [13, 14]. If the directive CMPLR$ LAYOUT C(:SERIAL, :PARALLEL) is taken to imply cyclic distribution along parallel axes, the equivalent HPF directive is !HPF$ DISTRIBUTE C(*, CYCLIC). Currently, CMF supports only block layout (described in [13]).
If CMPLR$ LAYOUT C(:SERIAL, :PARALLEL) is instead taken to imply block distribution, the equivalent HPF directive is !HPF$ DISTRIBUTE C(*, BLOCK), whereas the equivalent CMF directive is CMF$ LAYOUT C(:SERIAL, :NEWS). For either block or cyclic distribution, parallel array operations may be invoked using simple Fortran 90 array syntax. For example, the statement A(:) = A(:) + 3/C(4,:), where ":" denotes "for all entries of the axis", performs the assignment in parallel for all parallel entries of A and C simultaneously.

² In the case of the CM-5, the term processor is used to denote a single vector unit; there are four vector units per CM-5 processing node. On parallel architectures whose processing nodes contain only one vector or superscalar processor, the term processor is unambiguous.
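The cyclic and block layouts described above differ only in which processor owns each entry of a parallel axis. As an illustration, the following Python sketch maps the 1-based entries of a parallel axis to 0-based processor numbers. The function names are hypothetical, and the block rule assumes HPF-style blocks of ceil(n/Np) consecutive entries; it is a model of the idea, not of any particular CMF compiler's mapping.

```python
import math

def owner_cyclic(i, n_proc):
    """0-based processor holding 1-based entry i of a CYCLIC parallel axis."""
    return (i - 1) % n_proc

def owner_block(i, n, n_proc):
    """0-based processor holding 1-based entry i of an n-entry BLOCK parallel
    axis, assuming HPF-style blocks of ceil(n / n_proc) consecutive entries."""
    return (i - 1) // math.ceil(n / n_proc)

def owner_serial_parallel(j, k, n_proc):
    """Owner of C(j, k) with layout (:SERIAL, :PARALLEL): depends only on k."""
    return owner_cyclic(k, n_proc)

def layout(n, n_proc, owner):
    """For each processor, the 1-based entries of the axis stored in its memory."""
    held = [[] for _ in range(n_proc)]
    for i in range(1, n + 1):
        held[owner(i)].append(i)
    return held

n_proc = 4
n = 3 * n_proc                      # the array B(3*Np) from the text
cyclic = layout(n, n_proc, lambda i: owner_cyclic(i, n_proc))
block = layout(n, n_proc, lambda i: owner_block(i, n, n_proc))
print(cyclic)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
print(block)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```

With Np = 4, both layouts give each processor three entries of B, but the cyclic layout interleaves them while the block layout keeps them contiguous.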

3. Finite Element Formulation

Here we consider the isothermal transient response of an incompressible fluid. The initial/boundary-value problem is stated in Box 1, where $\mathbf{u}$ is the velocity, $p$ the pressure, $\rho$ the density, $\boldsymbol{\sigma}$ the Cauchy stress, $\mathbf{f}$ the body force, and $\mathbf{g}$ and $\mathbf{h}$ the Dirichlet and Neumann boundary condition values, respectively, enforced on the subsets $(\Gamma_t)_g$ and $(\Gamma_t)_h$ of the boundary $\Gamma_t$ of the possibly evolving domain $\Omega_t$. For a fixed domain, the subscript $t$ denoting time is dropped. The stress response is assumed to be Newtonian, characterized by the fluid viscosity $\mu$.

1. Momentum balance on $\Omega_t$:
$$\rho\left(\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} - \mathbf{f}\right) - \nabla\cdot\boldsymbol{\sigma} = \mathbf{0}$$
2. Mass balance (incompressible) on $\Omega_t$:
$$\nabla\cdot\mathbf{u} = 0$$
3. Initial and boundary conditions:
$$\mathbf{u} = \mathbf{g}\ \text{on}\ (\Gamma_t)_g, \qquad \mathbf{n}\cdot\boldsymbol{\sigma} = \mathbf{h}\ \text{on}\ (\Gamma_t)_h, \qquad \mathbf{u}(\mathbf{x},0) = \mathbf{u}_0\ \text{on}\ \Omega_0$$
4. Stress response (Newtonian):
$$\boldsymbol{\sigma} = -p\mathbf{I} + \mathbf{T}, \qquad \mathbf{T} = 2\mu\,\boldsymbol{\varepsilon}(\mathbf{u}), \qquad \boldsymbol{\varepsilon}(\mathbf{u}) = \frac{1}{2}\left(\nabla\mathbf{u} + (\nabla\mathbf{u})^T\right)$$

Box 1: Initial/Boundary-Value Problem.

A stabilized, space-time, velocity-pressure formulation is summarized in Box 2. Here, $(\cdot,\cdot)_{Q_n}$, $(\cdot,\cdot)_{Q_n^e}$ and $(\cdot,\cdot)_{\Omega_n}$ denote $L^2$ inner products over the space-time slab $Q_n$, a single space-time slab element $Q_n^e$, and the spatial domain $\Omega_n$, respectively. The surface $P_n$ is traced by $\Gamma_t$ as $t$ traverses the time interval associated with slab $n$, and $(P_n)_h$ is the subset of $P_n$ corresponding to $(\Gamma_t)_h$. The notations $(\cdot)_n^+$ and $(\cdot)_n^-$ denote the values of a variable at level $n$ as it is approached from above and from below, respectively. $Q^h$ and $V^h$ are suitable spaces for the pressure and velocity functions, and $\tau_{MOM}$ and $\tau_{CONT}$ are stabilization parameters. Further details on this formulation, although not central to the discussion here, may be found in Tezduyar et al. [15, 16].
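The Newtonian stress response in item 4 of Box 1 is a pointwise formula. A minimal Python sketch evaluates it, assuming the velocity gradient is supplied as a nested list with grad_u[i][j] = ∂u_i/∂x_j (an index convention chosen for illustration, not taken from the paper), and checks it on a simple shear flow.

```python
def newtonian_stress(grad_u, p, mu):
    """sigma = -p I + 2 mu eps(u), eps(u) = (grad_u + grad_u^T) / 2 (Box 1, item 4).

    grad_u[i][j] holds du_i/dx_j (an assumed index convention)."""
    n = len(grad_u)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            eps_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])   # strain rate
            sigma[i][j] = 2.0 * mu * eps_ij - (p if i == j else 0.0)
    return sigma

# Simple shear u = (gamma * y, 0, 0): divergence-free, so the viscous part is traceless.
gamma, mu, p = 2.0, 0.01, 1.5
grad_u = [[0.0, gamma, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
sigma = newtonian_stress(grad_u, p, mu)
print(sigma[0][1])                              # shear stress mu * gamma
print(sum(sigma[i][i] for i in range(3)) / 3)   # mean normal stress equals -p
```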
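The space-time formulation is used in the next section because of its notational simplicity, but the parallel implementation issues are the same for a semi-discrete formulation, in which the jump term (Box 2, item 2, term 4 on the right-hand side) is dropped, the integration takes place over the spatial domain only, and the time derivatives are replaced by appropriate expansions.

1. Finite element form:
$$B(p^h,\mathbf{u}^h;q^h,\mathbf{v}^h) = F(q^h,\mathbf{v}^h) \qquad \forall\,(q^h,\mathbf{v}^h) \in Q^h \times V^h$$
2. $B(p^h,\mathbf{u}^h;q^h,\mathbf{v}^h)$:
$$\begin{aligned}
B(p^h,\mathbf{u}^h;q^h,\mathbf{v}^h) ={}& \left(\frac{\partial\mathbf{u}^h}{\partial t} + \mathbf{u}^h\cdot\nabla\mathbf{u}^h,\ \rho\mathbf{v}^h\right)_{Q_n} + \left(\boldsymbol{\sigma}(p^h,\mathbf{u}^h),\ \boldsymbol{\varepsilon}(\mathbf{v}^h)\right)_{Q_n} + \left(\rho\nabla\cdot\mathbf{u}^h,\ q^h\right)_{Q_n} \\
&+ \left((\mathbf{u}^h)_n^+ - (\mathbf{u}^h)_n^-,\ \rho(\mathbf{v}^h)_n^+\right)_{\Omega_n} \\
&+ \sum_{e=1}^{(n_{el})_n}\left(\rho\left(\frac{\partial\mathbf{u}^h}{\partial t} + \mathbf{u}^h\cdot\nabla\mathbf{u}^h\right) - \nabla\cdot\boldsymbol{\sigma}(p^h,\mathbf{u}^h),\ \frac{\tau_{MOM}}{\rho}\left[\rho\left(\frac{\partial\mathbf{v}^h}{\partial t} + \mathbf{u}^h\cdot\nabla\mathbf{v}^h\right) - \nabla\cdot\boldsymbol{\sigma}(q^h,\mathbf{v}^h)\right]\right)_{Q_n^e} \\
&+ \sum_{e=1}^{(n_{el})_n}\left(\tau_{CONT}\nabla\cdot\mathbf{u}^h,\ \rho\nabla\cdot\mathbf{v}^h\right)_{Q_n^e}
\end{aligned}$$
3. $F(q^h,\mathbf{v}^h)$:
$$F(q^h,\mathbf{v}^h) = \left(\mathbf{f},\ \rho\mathbf{v}^h\right)_{Q_n} + \left(\mathbf{h},\ \mathbf{v}^h\right)_{(P_n)_h} + \sum_{e=1}^{(n_{el})_n}\left(\mathbf{f},\ \tau_{MOM}\left[\rho\left(\frac{\partial\mathbf{v}^h}{\partial t} + \mathbf{u}^h\cdot\nabla\mathbf{v}^h\right) - \nabla\cdot\boldsymbol{\sigma}(q^h,\mathbf{v}^h)\right]\right)_{Q_n^e}$$

Box 2: Stabilized u-p Space-Time Finite Element Formulation.

A Galerkin/least-squares problem such as the one in Box 2 leads to a nonlinear coupled system of equations:
$$N(\mathbf{d}_n) = \mathbf{F}, \tag{1}$$
where $\mathbf{d}_n$ is the vector of unknowns associated with marching from time step $n-1$ to $n$ in a semi-discrete formulation, or with time slab $n$ in a space-time formulation. For the nonlinear system (1), each Newton-Raphson iteration requires the solution of a linear equation system:
$$\left.\frac{\partial N}{\partial \mathbf{d}}\right|_{\mathbf{d}_n^k} \Delta\mathbf{d}_n^k = \mathbf{F} - N(\mathbf{d}_n^k), \tag{2}$$
$$\mathbf{A}_n^k\,\mathbf{x}_n^k = \mathbf{R}_n^k, \tag{3}$$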

where $k$ is the nonlinear iteration counter, $\mathbf{A}_n^k = \partial N/\partial\mathbf{d}\,|_{\mathbf{d}_n^k}$ is the nonsymmetric Jacobian operator, $\mathbf{x}_n^k = \Delta\mathbf{d}_n^k$ is the vector of increments of the unknowns, and $\mathbf{R}_n^k = \mathbf{F} - N(\mathbf{d}_n^k)$ is the vector of residuals. When discussing the solution of the linear equation system (3), the sub- and superscripts identifying the time step and nonlinear iteration are dropped, since only one such system is solved at a given time. An outline of the implicit solution of the finite element problem is shown in Box 3.

1. Preprocessing and initial conditions
2. PARTITION data to processors
3. a. Time step loop ($n_{start} = 0$)
   b. Nonlinear iteration loop ($k_{start} = 0$)
4. GATHER nodal $\mathbf{x}_n^k$ to elements
5. FORM element matrices and residuals $\mathbf{A}_n^{e,k}$, $\mathbf{R}_n^{e,k}$
6. SCATTER $\mathbf{R}_n^{e,k}$ to assembled $\mathbf{R}_n^k$
7. SOLVE $\mathbf{A}_n^k\mathbf{x}_n^k = \mathbf{R}_n^k$
8. a. End $k$ loop ($k \leftarrow k+1$, go to 3b)
   b. End $n$ loop ($n \leftarrow n+1$, go to 3a)
9. Postprocessing and visualization

Box 3: Outline of Finite Element Solution.

4. Parallel Implementation

Here, the global programming model described in Section 2 is used to implement the finite element method. The key features of the finite element implementation on a distributed-memory massively parallel computer are (1) constructing data structures that avoid unneeded communication of data between processors, (2) mapping the data associated with these data structures to the processors in a manner that efficiently exploits data locality, (3) using efficient gather and scatter algorithms that distinguish on-processor from off-processor data transfers, and (4) maintaining favorable load balancing and scaling properties.
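The nonlinear iteration loop of Box 3, together with equations (1)-(3), is a standard Newton-Raphson update. The following Python sketch shows that loop on a hypothetical two-unknown model problem. A 2×2 direct solve stands in for the GMRES SOLVE step, and the PARTITION, GATHER and SCATTER steps are omitted because the sketch runs on a single "processor"; the model problem and helper names are illustrative assumptions.

```python
def solve_2x2(A, b):
    """Cramer's rule; stands in for the SOLVE step (GMRES in the paper)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def newton_raphson(N, jacobian, F, d, tol=1e-12, max_iter=20):
    """Nonlinear iteration loop of Box 3 for N(d) = F (equations (1)-(3)).

    Each pass forms R = F - N(d) and A = dN/dd, solves A x = R, updates d."""
    for k in range(max_iter):
        R = [Fi - Ni for Fi, Ni in zip(F, N(d))]   # residual R_n^k
        if max(abs(r) for r in R) < tol:
            return d, k
        A = jacobian(d)                              # Jacobian A_n^k
        x = solve_2x2(A, R)                          # increment x_n^k
        d = [di + xi for di, xi in zip(d, x)]        # d_n^{k+1} = d_n^k + x_n^k
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical model problem: N(d) = (d0^2 + d1, d0 + d1^2), F = (3, 5).
N = lambda d: [d[0] ** 2 + d[1], d[0] + d[1] ** 2]
jac = lambda d: [[2.0 * d[0], 1.0], [1.0, 2.0 * d[1]]]
d, iters = newton_raphson(N, jac, [3.0, 5.0], [1.0, 1.0])
print(d, iters)  # converges to approximately [1.0, 2.0]
```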
Two naturally parallel data structures emerge from the finite element problem: the first is associated with the FORM phase (an element-ordered data structure, corresponding to Step 5 in Box 3), and the second with the SOLVE phase (a node-ordered data structure, corresponding to Step 7 in Box 3). Using the serial and parallel layout constructs described in Section 2, the element-level residual vector $\mathbf{R}^e$ and its global counterpart $\mathbf{R}$ are represented in these two data structures as shown in Box 4, where $n_{dof}$ is the number of degrees of freedom per node, $n_{en}$ the number of nodes per element, $n_{nodes}$ the number of global nodes, and $n_{el}$ the number of elements. The idea is to construct a parallel array axis of length $n_{el}$ for the FORM data structure and of length $n_{nodes}$ for the SOLVE data structure. The information associated with each element in a FORM array is then stored along the serial dimension(s) of the array. Similarly, the information associated with each node in a SOLVE array is stored along the serial dimension(s).
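A minimal Python sketch of the two layouts follows, assuming BLOCK distribution of each parallel axis and 0-based indices (the paper's arrays are 1-based Fortran). Only the outermost (parallel) index decides placement, so all serial entries of an element or node share a processor; references that cross processors arise only where an element's connectivity points to a node stored elsewhere, which is where the GATHER and SCATTER communication comes from. The mesh and helper names are hypothetical.

```python
import math

def block_owner(i, n, n_proc):
    """0-based processor holding 0-based entry i of an n-long BLOCK parallel axis."""
    return i // math.ceil(n / n_proc)

def make_form_and_solve(n_el, n_en, n_dof, n_nodes):
    """FORM array R_e[i_el][i_en][i_dof] and SOLVE array R[i_node][i_dof].

    The outermost (parallel) index alone decides placement; the inner
    (serial) indices stay in the same processor's memory."""
    R_e = [[[0.0] * n_dof for _ in range(n_en)] for _ in range(n_el)]
    R = [[0.0] * n_dof for _ in range(n_nodes)]
    return R_e, R

def off_processor_refs(iconn, n_el, n_nodes, n_proc):
    """Count element-to-node references whose node lives on another processor."""
    count = 0
    for i_el, nodes in enumerate(iconn):
        p_el = block_owner(i_el, n_el, n_proc)
        count += sum(block_owner(a, n_nodes, n_proc) != p_el for a in nodes)
    return count

# Hypothetical 1D mesh: 8 two-node elements, 9 nodes, 2 processors.
n_el, n_en, n_dof, n_nodes, n_proc = 8, 2, 1, 9, 2
iconn = [[e, e + 1] for e in range(n_el)]   # nodal connectivity, 0-based
R_e, R = make_form_and_solve(n_el, n_en, n_dof, n_nodes)
print(len(R_e), len(R_e[0]), len(R_e[0][0]), len(R))   # 8 2 1 9
print(off_processor_refs(iconn, n_el, n_nodes, n_proc))  # 1
```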

1. FORM (element based)
       REAL Re(n_dof, n_en, n_el)
       CMPLR$ LAYOUT Re(:SERIAL, :SERIAL, :PARALLEL)
2. SOLVE (node based)
       REAL R(n_dof, n_nodes)
       CMPLR$ LAYOUT R(:SERIAL, :PARALLEL)
3. Communication
       Re(n_dof, n_en, n_el)  <-- gather / scatter -->  R(n_dof, n_nodes)

Box 4: FORM and SOLVE Data Structures.

These FORM and SOLVE data structures exhibit natural parallelism: they enable the FORM step and a number of phases of the SOLVE step in Box 3 to take place in parallel without communication between processors. With these two data structures, communication between processors within the time step loop arises predominantly from transfers between the FORM and SOLVE data structures, that is, in the GATHER and SCATTER steps. Pseudocode evaluating the boxed terms of Box 2 for the FORM phase is shown in Box 5. Note that repeated indices imply summation, that $j_\sigma$ is an index over stress tensor components, and that $i_{sd}$ identifies the space dimension. Here, integration over the space-time slab elements $Q_n^e$ is taken as the usual sum over quadrature points:
$$\int_{Q_n^e} \chi(:)\,dQ = \sum_{l=1}^{n_{int}} [\chi(:)]_l\, J_l(:)\, w_l, \tag{4}$$
where $n_{int}$ is the number of integration points, $J_l$ is the determinant of the Jacobian of the finite element mapping, and $w_l$ is the weight. Here, and in the Box 5 pseudocode, ":" means "for all elements $i_{el} = 1:n_{el}$" in a FORM-based array. By definition of the CMPLR$ LAYOUT constructs, for a given element $i_{el}$ the element-level vector $R^e(1:n_{dof}, 1:n_{en}, i_{el})$ of $n_{dof} \times n_{en}$ components resides in the memory of processor $p(i_{el})$, where $p(i_{el})$ is a mapping provided by the compiler. Furthermore, an element-level vector $v^e(1:m, i_{el})$, for any $m$ along a serial dimension and $i_{el}$ along a parallel dimension with the same extent $1:n_{el}$ as in $R^e$, resides in the memory of the same processor $p(i_{el})$.
Consequently, $R^e(1:n_{dof}, 1:n_{en}, i_{el})$ and $v^e(1:m, i_{el})$ reside in the same (virtual) processor for each $i_{el} \in [1, n_{el}]$. With this in mind, it is evident from Box 5 that no inter-processor communication occurs in the FORM phase. The SOLVE phase, on the other hand, does require communication. A summary of the GMRES algorithm used in the SOLVE phase is shown in Box 6. All quantities in the SOLVE phase are stored in the SOLVE data structure, with the exception of the element-level Jacobian matrices $\mathbf{a}^e$ (and, as a result, the two element-level vectors required to interact with $\mathbf{a}^e$), which are stored at the element level for performance reasons.

1. $B_{\sigma\varepsilon}$ formation:
$$B_{\sigma\varepsilon}(p^h,\mathbf{u}^h;q^h,\mathbf{v}^h) := \int_{Q_n} \boldsymbol{\sigma}(p^h,\mathbf{u}^h) : \boldsymbol{\varepsilon}(\mathbf{v}^h)\,dQ$$
$B_{\sigma\varepsilon}$ consists of element-level contributions:
$$B^e_{\sigma\varepsilon}(:) = \int_{Q_n^e} \sigma(j_\sigma, :)\,\varepsilon(j_\sigma, :)\,dQ$$
2. $B_t$ formation:
$$B_t(p^h,\mathbf{u}^h;q^h,\mathbf{v}^h) := \int_{Q_n} \rho\,\frac{\partial\mathbf{u}^h}{\partial t}\cdot\mathbf{v}^h\,dQ$$
$B_t$ consists of element-level contributions:
$$B^e_t(:) = \int_{Q_n^e} \rho(:)\,\frac{\partial u(i_{sd}, :)}{\partial t}\,v(i_{sd}, :)\,dQ$$

Box 5: Pseudo Code: FORM Phase.

From the perspective of a parallel implementation, the SOLVE phase consists primarily of dot products ($\alpha = \mathbf{p}\cdot\mathbf{q}$), SAXPY operations ($\mathbf{p} = \mathbf{p} + \alpha\mathbf{q}$), matrix-vector products ($\mathbf{q} = \mathbf{A}\mathbf{p}$), and a preconditioning step. Here, only diagonal preconditioning is considered, so the preconditioning step requires only inexpensive on-processor operations, with a computing cost not significantly beyond that of a dot product or a SAXPY operation. Pseudocode for these steps of the SOLVE phase is shown in Box 7. The dominant computational portion of the GMRES algorithm is the matrix-vector product. Item 3 in Box 7 highlights a matrix-vector product scheme ($\mathbf{q} = \mathbf{A}\mathbf{p}$) that consists of three steps: (1) a gather of $\mathbf{p}$ to $\mathbf{p}^e$ at the element level, (2) an on-processor matrix-vector product ($\mathbf{q}^e = \mathbf{a}^e\mathbf{p}^e$) involving no inter-processor communication, and (3) a scatter of $\mathbf{q}^e$ to $\mathbf{q}$ at the global assembled level. In the gather and scatter steps, iconn(1:n_en, 1:n_el) is the nodal connectivity array. This matrix-vector product scheme was originally proposed by Johnsson and Mathur [17] and shows favorable performance characteristics on Connection Machine systems. Note that communication in the SOLVE phase occurs in the global sums within dot products and in the gather/scatter steps of the matrix-vector product, the latter being the dominant communication steps. Expressed with the High Performance Fortran FORALL construct, the gather step takes the form

    FORALL (i_dof = 1:n_dof, i_en = 1:n_en, i_el = 1:n_el) &
       ve(i_dof, i_en, i_el) = v(i_dof, iconn(i_en, i_el))                  (5)

A scatter, on the other hand, must account for collisions of data at the destination and hence takes the form

    DO i_dof = 1, n_dof
       FORALL (i_node = 1:n_nodes) &
          v(i_dof, i_node) = v(i_dof, i_node) + &
             SUM(ve(i_dof, 1:n_en, :), MASK = iconn(1:n_en, :) .EQ. i_node)
    END DO                                                                  (6)
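The gather of equation (5), the collision-summing scatter of equation (6), and the three-step element-by-element matrix-vector product of Box 7 can be sketched serially in Python as follows. The nested-list layout, the 0-based iconn, and the (i_en, i_dof) ordering of rows and columns inside each element matrix are assumptions of this sketch; on the CM-5 these loops are parallel array operations and the gather/scatter are library calls.

```python
def gather(v, iconn, n_dof):
    """Equation (5): ve[i_el][i_en][i_dof] = v[iconn[i_el][i_en]][i_dof]."""
    return [[[v[node][d] for d in range(n_dof)] for node in elem] for elem in iconn]

def scatter(ve, iconn, n_nodes, n_dof):
    """Equation (6): add element values into nodes; collisions are summed."""
    v = [[0.0] * n_dof for _ in range(n_nodes)]
    for elem, vals in zip(iconn, ve):
        for node, val in zip(elem, vals):
            for d in range(n_dof):
                v[node][d] += val[d]
    return v

def ebe_matvec(a_e, p, iconn, n_nodes, n_dof):
    """Box 7, item 3: gather p, multiply by each element matrix, scatter q.

    a_e[i_el] is an (n_en*n_dof) x (n_en*n_dof) matrix whose rows and
    columns are ordered (i_en, i_dof); this ordering is an assumption."""
    p_e = gather(p, iconn, n_dof)                     # (1) gather
    q_e = []
    for a, pe in zip(a_e, p_e):                       # (2) local multiply
        flat = [x for node_vals in pe for x in node_vals]
        prod = [sum(a[r][c] * flat[c] for c in range(len(flat)))
                for r in range(len(flat))]
        q_e.append([prod[i * n_dof:(i + 1) * n_dof] for i in range(len(pe))])
    return scatter(q_e, iconn, n_nodes, n_dof)        # (3) scatter with collisions

# Hypothetical 1D mesh of 3 two-node elements with element matrix [[1, -1], [-1, 1]].
iconn = [[0, 1], [1, 2], [2, 3]]
a_e = [[[1.0, -1.0], [-1.0, 1.0]]] * 3
p = [[0.0], [1.0], [2.0], [3.0]]
q = ebe_matvec(a_e, p, iconn, n_nodes=4, n_dof=1)
print(q)  # [[-1.0], [0.0], [0.0], [1.0]]
print(scatter(gather([[1.0]] * 4, iconn, 1), iconn, 4, 1))  # node valences
```

Scattering a gathered vector of ones returns how many elements share each node, which is a quick check that collisions at the destination are being summed rather than overwritten.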

    DO l = 1, n_outer                         ! GMRES outer iterations
       r_0 := R - A x_0                        ! compute initial residual
       β := ||r_0||_2                          ! compute initial residual norm
       v_1 := r_0 / β                          ! define first Krylov vector
       DO j = 1, m                             ! GMRES inner iterations
          z_j := M_j^(-1) v_j                  ! preconditioning step
          w := A z_j                           ! matrix-vector product
          DO i = 1, j                          ! Gram-Schmidt orthogonalization
             h_(i,j) := (w, v_i)
             w := w - h_(i,j) v_i
          END DO
          h_(j+1,j) := ||w||_2
          v_(j+1) := w / h_(j+1,j)             ! define next Krylov vector
       END DO
       H := {h_(i,j)}                          ! define reduced system matrix
       y := argmin_ŷ ||β e_1 - H ŷ||_2         ! solve reduced system
       x := x_0 + Σ_(i=1..m) y_i z_i           ! form approximate solution
       IF (||β e_1 - H y||_2 ≤ ε) EXIT         ! convergence check
       x_0 := x                                ! restart
    END DO

Box 6: GMRES Algorithm Summary.

1. Dot product, α = p · q:
       α = SUM(p(i_dof, :) * q(i_dof, :))
2. SAXPY, p = p + α q:
       p(i_dof, :) = p(i_dof, :) + α * q(i_dof, :)
3. Matrix-vector multiply, q = A p:
       pe(i_dof, i_en, :) = p(i_dof, iconn(i_en, :))                              ! Gather
       qe(i_dof, i_en, :) = ae(i_dof, i_en, j_dof, j_en, :) * pe(j_dof, j_en, :)  ! Local mult.
       q(i_dof, iconn(i_en, :)) = qe(i_dof, i_en, :)                              ! Scatter, add collisions

Box 7: Pseudo Code: SOLVE Phase.

In the numerical implementation, for performance reasons, the gather/scatter steps are implemented on the CM-5 using high-performance communication algorithms that replace the FORALL statements above with single function calls. The gather and scatter steps are discussed further in the following section.

$$\mathbf{A}\mathbf{x} = \frac{\partial N}{\partial\mathbf{d}}\,\mathbf{x} \approx \frac{N(\mathbf{d} + \varepsilon\mathbf{x}) - N(\mathbf{d})}{\varepsilon}$$
$$\mathbf{R}_\varepsilon = \mathbf{R}(\mathbf{d} + \varepsilon\mathbf{x}) = \mathbf{F} - N(\mathbf{d} + \varepsilon\mathbf{x}), \qquad \mathbf{R} = \mathbf{R}(\mathbf{d}) = \mathbf{F} - N(\mathbf{d})$$
$$\mathbf{A}\mathbf{x} \approx -\frac{\mathbf{R}_\varepsilon - \mathbf{R}}{\varepsilon}$$

Box 8: Matrix-Free Linearization.

An alternative, matrix-free GMRES solution scheme may be based on a matrix-free linearization of the residual, as represented in Box 8. In the matrix-free linearization, which is due to Johan [18], the linear part of the residual, represented as the matrix-vector product $\mathbf{A}\mathbf{x}$, is approximated by $-(\mathbf{R}_\varepsilon - \mathbf{R})/\varepsilon$, where $\mathbf{R}_\varepsilon = \mathbf{R}(\mathbf{d} + \varepsilon\mathbf{x})$, $\mathbf{R} = \mathbf{R}(\mathbf{d})$, and $\varepsilon$ is a suitably small number [18]. As a result, in the parallel implementation the GMRES algorithm differs only in that the matrix-vector product is replaced by this simple difference formula between residuals. In particular, the three-step matrix-vector product above is replaced by the steps (1) gather the current solution vector $\mathbf{d}$ to the element level $\mathbf{d}^e$, (2) FORM the updated element-level residual $\mathbf{R}^e_\varepsilon$ based on $\mathbf{d} + \varepsilon\mathbf{x}$ (without inter-processor communication), and (3) scatter $\mathbf{R}^e_\varepsilon$ to the global assembled level and evaluate the difference formula. That is, from a computational perspective, this scheme differs from the matrix-vector product scheme primarily in that the on-processor matrix-vector product is replaced by the formation of the element-level residual $\mathbf{R}^e_\varepsilon$. Notice that the element-level Jacobian matrices $\mathbf{a}^e$ of Box 7 need not be stored in the matrix-free case, resulting in substantial memory savings, since storage of these matrices dominates the memory requirements of the original formulation.
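To make Boxes 6 and 8 concrete, the following Python sketch implements restarted GMRES with diagonal preconditioning and runs it twice on the same linearized system: once with an explicit Jacobian and once with the matrix-free difference formula of Box 8. The model problem N(d), the value of ε, the use of Givens rotations for the reduced least-squares problem, and taking the preconditioner diagonal from the explicit Jacobian are all assumptions for illustration, not the paper's code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gmres(matvec, R, diag, m=20, n_outer=10, tol=1e-10):
    """Restarted GMRES after Box 6 with a diagonal preconditioner, z_j = v_j / diag.

    matvec(p) may be an explicit product A p or a matrix-free difference formula.
    The reduced least-squares problem is solved with Givens rotations."""
    x = [0.0] * len(R)
    for _ in range(n_outer):                      # GMRES outer iterations
        r0 = [Ri - qi for Ri, qi in zip(R, matvec(x))]
        beta = math.sqrt(dot(r0, r0))
        if beta <= tol:
            return x
        V, Z = [[ri / beta for ri in r0]], []     # Krylov and preconditioned vectors
        H = [[0.0] * m for _ in range(m + 1)]
        cs, sn = [0.0] * m, [0.0] * m
        g = [beta] + [0.0] * m                    # beta * e_1, rotated as we go
        k = 0
        for j in range(m):                        # GMRES inner iterations
            Z.append([vi / di for vi, di in zip(V[j], diag)])   # preconditioning
            w = matvec(Z[j])                      # matrix-vector product
            for i in range(j + 1):                # Gram-Schmidt orthogonalization
                H[i][j] = dot(w, V[i])
                w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            H[j + 1][j] = h_next
            for i in range(j):                    # apply previous Givens rotations
                t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = t
            r = math.hypot(H[j][j], H[j + 1][j])  # new rotation zeroes H[j+1][j]
            cs[j], sn[j] = H[j][j] / r, H[j + 1][j] / r
            H[j][j], H[j + 1][j] = r, 0.0
            g[j + 1], g[j] = -sn[j] * g[j], cs[j] * g[j]
            k = j + 1
            if abs(g[k]) <= tol or h_next == 0.0:
                break
            V.append([wi / h_next for wi in w])   # next Krylov vector
        y = [0.0] * k                             # back-substitution for y
        for i in range(k - 1, -1, -1):
            y[i] = (g[i] - sum(H[i][l] * y[l] for l in range(i + 1, k))) / H[i][i]
        x = [xn + sum(y[i] * Z[i][nn] for i in range(k)) for nn, xn in enumerate(x)]
        if abs(g[k]) <= tol:                      # convergence check, else restart
            return x
    return x

# Hypothetical nonlinear model N(d) = A d + 0.1 d^3 (componentwise) and load F.
A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
N = lambda d: [dot(row, d) + 0.1 * di ** 3 for row, di in zip(A, d)]
F = [6.0, 15.0, 11.0]
d = [0.5, 0.5, 0.5]
R = [Fi - Ni for Fi, Ni in zip(F, N(d))]          # R = F - N(d)
J = [[A[i][j] + (0.3 * d[i] ** 2 if i == j else 0.0) for j in range(3)] for i in range(3)]
diag = [J[i][i] for i in range(3)]

x_matrix = gmres(lambda p: [dot(row, p) for row in J], R, diag)
eps = 1e-7                                        # "suitably small" (assumed value)

def matrix_free(p):
    """Box 8: A p approximated by -(R(d + eps p) - R(d)) / eps."""
    R_eps = [Fi - Ni for Fi, Ni in zip(F, N([di + eps * pi for di, pi in zip(d, p)]))]
    return [-(Re - Ri) / eps for Re, Ri in zip(R_eps, R)]

x_free = gmres(matrix_free, R, diag)
print(max(abs(a - b) for a, b in zip(x_matrix, x_free)))  # small finite-difference error
```

The two solves share every line of GMRES except the operator passed in, which mirrors the point made above: switching to the matrix-free scheme replaces only the matrix-vector product.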
The memory savings of the matrix-free scheme are accompanied by additional on-processor computation, however, since computing the residual $\mathbf{R}^e_\varepsilon$ on each subsequent GMRES iteration typically requires more on-processor work than the on-processor matrix-vector multiply discussed above. A comparison of the matrix-free and original GMRES solvers is provided in Section 6.

5. Data Decomposition and Communication

Partitioning the data associated with the FORM and SOLVE data structures into groups, each group being associated with a single processor of the parallel computer, increases the efficiency of the GATHER and SCATTER steps by attempting to minimize³ the off-processor communication in these steps, taking maximum advantage of data locality. A parallel implementation of the RSB algorithm is used to decompose and distribute the element data (FORM data structure) to the processors, based on a modal analysis of the graph describing the connectivity between elements (the dual connectivity). The RSB algorithm, originating with Pothen et al. [11] and Simon [12], provides a robust, systematic tool for generating efficient data decompositions in parallel. The parallel implementation of the RSB algorithm used here is due to Johan [18] and is available in the Connection Machine Scientific Software Library. The data decomposition generated by the bisection algorithm is exploited in efficient gather and scatter communication algorithms that account for the locality of data residing in a given processor by breaking each communication step (gather or scatter) into two distinct phases: an on-processor communication step (with communication rates on the order of the memory bandwidth) and an off-processor communication step (with communication rates on the order of the network bandwidth). Such a two-step algorithm is natural within a message-passing programming model; its data parallel implementation is more subtle because of the high-level language constructs. The data parallel two-step algorithm used here is due to Johan et al. [10] and exhibits favorable load balancing and scaling properties for large classes of problems.
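The basic step of RSB is spectral bisection: compute the eigenvector of the graph Laplacian associated with its second-smallest eigenvalue (the Fiedler vector) and split the vertices at its median. The following Python sketch is serial and approximates the Fiedler vector by power iteration on a shifted Laplacian; the original RSB work typically uses a Lanczos-type eigensolver, and the parallel CMSSL implementation mentioned above is not reproduced here. The 4 × 2 grid graph stands in for an element dual graph and, like the helper names, is hypothetical.

```python
import math

def laplacian_matvec(adj, x):
    """y = L x for the graph Laplacian L = D - W (unit edge weights)."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(adj))]

def fiedler_vector(adj, iters=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Power iteration on c I - L (c = 2 * max degree bounds the spectrum of L),
    with the constant null vector of L projected out at every step."""
    n = len(adj)
    c = 2.0 * max(1, max(len(a) for a in adj))
    x = [float(i) for i in range(n)]              # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]               # deflate the constant vector
        Lx = laplacian_matvec(adj, x)
        x = [c * xi - lxi for xi, lxi in zip(x, Lx)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x

def spectral_bisection(adj):
    """Split vertices into two equal halves at the median of the Fiedler vector."""
    f = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: f[i])
    half = len(adj) // 2
    return sorted(order[:half]), sorted(order[half:])

def recursive_spectral_bisection(adj, vertices, levels):
    """Bisect recursively `levels` times, giving up to 2**levels parts."""
    if levels == 0 or len(vertices) < 2:
        return [vertices]
    local = {v: i for i, v in enumerate(vertices)}            # induced subgraph
    sub = [[local[u] for u in adj[v] if u in local] for v in vertices]
    left, right = spectral_bisection(sub)
    return (recursive_spectral_bisection(adj, [vertices[i] for i in left], levels - 1)
            + recursive_spectral_bisection(adj, [vertices[i] for i in right], levels - 1))

# Hypothetical 4 x 2 grid graph, vertex v = i * 2 + j.
nx, ny = 4, 2
adj = [[] for _ in range(nx * ny)]
for i in range(nx):
    for j in range(ny):
        v = i * ny + j
        if i + 1 < nx:
            adj[v].append(v + ny); adj[v + ny].append(v)
        if j + 1 < ny:
            adj[v].append(v + 1); adj[v + 1].append(v)
left, right = spectral_bisection(adj)
cut = sum(1 for u in left for w in adj[u] if w in right)
print(left, right, cut)   # the strip is cut across its long direction: 2 edges
print(recursive_spectral_bisection(adj, list(range(nx * ny)), 2))
```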
The performance advantages of the data decomposition and two-step communication strategies depend on the amount of data gathered (or scattered) from the surface elements of one partition to another partition (hereafter called surface data) relative to the amount of data gathered (or scattered) within the interior of a partition (hereafter called volume data). Provided the mesh partitioning algorithm produces well-shaped, contiguous element groups, the ratio of surface data to volume data becomes small as the number of elements per partition becomes large. Hence, the amount of surface data communicated at network bandwidth is small relative to the amount of volume data communicated at memory bandwidth. The relative amounts of surface and volume data in a mesh, and hence the performance improvements available from the mesh partitioning and communication schemes, depend on the mesh geometry. Three-dimensional meshes typically exhibit more favorable surface-to-volume data ratios than two-dimensional meshes and hence experience more pronounced speed-ups from the data decomposition strategies. To illustrate this, it is useful to examine the amount of surface and volume data first in a general finite element mesh and then in simple illustrative meshes. To begin, assume that the global nodes are distributed to the processors in the SOLVE data structure such that (1) nodes internal to an element partition (nodes not on the partition boundary) are assigned to the processor associated with that partition, and (2) nodes on the partition boundary (nodes associated with surface data) are assigned such that two element partitions sharing a set of nodes each receive a random subset of those shared nodes. This is in fact the node distribution used in the mesh partitioning scheme here.
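The node distribution rule just described can be illustrated with the following Python sketch on a hypothetical 4 × 2 quadrilateral mesh split into two partitions. It also contrasts two ways of counting off-processor transfers: one per element-to-node reference, and one per (partition, node) pair, which corresponds to the node-to-node basis of the two-step scheme of Johan et al. [10]. The per-node uniform random choice for shared nodes and all names are assumptions of the sketch.

```python
import random

def distribute_nodes(iconn, elem_part, n_nodes, seed=0):
    """Assign nodes to processors by the rule described above.

    A node touched only by elements of one partition goes to that partition;
    a node shared by several partitions goes to one of them chosen at random
    (a per-node uniform choice, which is an assumption of this sketch)."""
    rng = random.Random(seed)
    touching = [set() for _ in range(n_nodes)]
    for e, nodes in enumerate(iconn):
        for a in nodes:
            touching[a].add(elem_part[e])
    return [next(iter(parts)) if len(parts) == 1 else rng.choice(sorted(parts))
            for parts in touching]

# Hypothetical 4 x 2 quadrilateral mesh (5 x 3 nodes) split into left/right halves.
nex, ney = 4, 2
node = lambda i, j: j * (nex + 1) + i
iconn = [[node(i, j), node(i + 1, j), node(i + 1, j + 1), node(i, j + 1)]
         for j in range(ney) for i in range(nex)]
elem_part = [0 if i < nex // 2 else 1 for j in range(ney) for i in range(nex)]
owner = distribute_nodes(iconn, elem_part, (nex + 1) * (ney + 1))

# Element-to-node counting: every element reference to a non-local node.
elem_to_node = sum(owner[a] != elem_part[e] for e, nodes in enumerate(iconn) for a in nodes)
# Node-to-node counting: one transfer per (partition, node) pair not owned locally.
pairs = {(elem_part[e], a) for e, nodes in enumerate(iconn) for a in nodes}
node_to_node = sum(owner[a] != p for p, a in pairs)
print(elem_to_node, node_to_node)  # 4 3
```

Whichever partition wins each of the three shared nodes, the counts come out the same, because every shared node has exactly one non-owning partition.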
With this in mind, it is reasonable to characterize the amount of data sent off-processor from an element partition in the gather or scatter operation (equations (5) and (6)) as roughly half the data associated with the partition boundary nodes. Consequently, the number of array elements of $v(1:n_{dof}, 1:n_{nodes})$ sent off-processor from a single element partition is roughly $n_{dof}$ times half the number of partition boundary nodes.

³ Obtaining a true minimum is an NP-complete (i.e., intractable) problem.

The two-step scatter described in [10] consists of (1) a scatter of element data within a partition to an intermediate set of (pseudo-global) nodes local to that partition and (2) a scatter of the data associated with this intermediate set of nodes to the global nodes. The first step involves strictly on-processor data motion ($n_{dof}\, n_{en}\, n_{el}^{\mathrm{partition}}$ words per partition, where $n_{el}^{\mathrm{partition}}$ is the number of elements in the partition). The second step involves both on-processor data motion ($n_{dof}\, n_{np}^{V,\mathrm{partition}} + \tfrac{1}{2}\, n_{dof}\, n_{np}^{S,\mathrm{partition}}$ words per partition, where $n_{np}^{V,\mathrm{partition}} := n_{np}^{\mathrm{partition}} - n_{np}^{S,\mathrm{partition}}$) and off-processor data motion ($\tfrac{1}{2}\, n_{dof}\, n_{np}^{S,\mathrm{partition}}$ words per partition)⁴. Here, $n_{np}^{\mathrm{partition}}$ is the number of nodes in the partition and $n_{np}^{S,\mathrm{partition}}$ is the number of nodes on the partition boundary.

Next, consider the four simple meshes shown in Figure 1: (1) a two-dimensional square mesh of quadrilateral elements (4 nodes per element, $N \times N$ elements), (2) a two-dimensional square mesh of triangular elements (3 nodes per element, $2 \times N \times N$ elements), (3) a three-dimensional cubic mesh of brick elements (8 nodes per element, $N \times N \times N$ elements), and (4) a three-dimensional cubic mesh of tetrahedral elements (4 nodes per element, $5 \times N \times N \times N$ elements). We constrain the tetrahedral mesh to be the one generated from the brick mesh by decomposing each brick into 5 tetrahedra containing only the nodes of the brick elements (see Figure 1), and place a similar constraint on the pair of two-dimensional meshes.
We also require that the elements of each mesh be partitioned into identical rectangular (two-dimensional) or rectangular parallelepiped (three-dimensional) partitions on each processor, so that corresponding partitions in each mesh comprise identical nodes.

Figure 1. Meshes for communication bandwidth tests.

If these simple meshes are partitioned into $N_p$ partitions of $n \times n$ quadrilateral elements in the two-dimensional case (each quadrilateral subdivided into 2 triangles for the triangular mesh) and of $n \times n \times n$ hexahedral elements in the three-dimensional case (each hexahedron subdivided into 5 tetrahedra for the tetrahedral mesh), the numbers of array elements associated with on-processor (volume data) and off-processor (surface data) data motion are as shown in Table 1, as functions of $n$. Steps 1 and 2 in Table 1 refer to the steps of the two-step gather/scatter. Noting that $n$ is a linear function of $N$ for each mesh, it is evident from Table 1 that the amount of surface data is $O(N^{d-1})$ whereas the amount of volume data is $O(N^d)$, where $d$ is the dimensionality of the mesh. As the size of the mesh, and hence $N$, increases, the amount of volume data quickly dominates the surface data.

⁴ As is described in Johan et al. [10], the off-processor communication occurs on a node-to-node basis as opposed to an element-to-node basis.

| Mesh | Step 1 On-PN | Step 1 Off-PN | Step 2 On-PN | Step 2 Off-PN |
|---|---|---|---|---|
| Quads | $n_{dof}\,n_{en}\,n^2$ | 0 | $n_{dof}\,((n-1)^2 + 2n)$ | $n_{dof}\,2n$ |
| Triangles | $n_{dof}\,n_{en}\,2n^2$ | 0 | $n_{dof}\,((n-1)^2 + 2n)$ | $n_{dof}\,2n$ |
| Hexahedra | $n_{dof}\,n_{en}\,n^3$ | 0 | $n_{dof}\,((n-1)^3 + 3n^2 + 1)$ | $n_{dof}\,(3n^2 + 1)$ |
| Tetrahedra | $n_{dof}\,n_{en}\,5n^3$ | 0 | $n_{dof}\,((n-1)^3 + 3n^2 + 1)$ | $n_{dof}\,(3n^2 + 1)$ |

Table 1. Communication load for square and cubic partitions ($n_x = n_y = n_z = n$). On-PN and Off-PN denote on-processing-node and off-processing-node data motion.

Ratios of surface data to volume data for meshes with specific values of $N$ are shown in Tables 2 and 3, where $n_{dof}$ is assumed to be 4. Here, however, the partitions are not square as they are in Table 1. Table 2 corresponds to two-dimensional quadrilateral meshes: (1) an $N = 128$ mesh of $16 \times 8$ partitions of $8 \times 16$ elements and (2) an $N = 256$ mesh of $16 \times 8$ partitions of $16 \times 32$ elements. Table 3 corresponds to three-dimensional hexahedral meshes: (1) an $N = 24$ mesh of $8 \times 4 \times 4$ partitions of $3 \times 6 \times 6$ elements and (2) an $N = 32$ mesh of … partitions of … elements. The triangular and tetrahedral meshes are generated from the quadrilateral and hexahedral meshes as described above. Tables 2 and 3 show that the ratio of surface data sent off-processor (at network bandwidth) to volume data moved on-processor (at memory bandwidth) is quite small, even for these moderately sized meshes. The speed-up that a two-step gather or scatter gains from a particular data partitioning strategy on a particular computer system depends on both the memory and the network bandwidths of that system. For the CM-5, the speed-ups for the two-dimensional meshes of Table 2 are shown in Table 4, and those for the three-dimensional meshes of Table 3 are shown in Table 5.
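The Step 2 entries of Table 1 follow from counting the interior and boundary nodes of an n × n (or n × n × n) block of elements, with half of the boundary nodes kept on-processor and half sent off. The following Python sketch, an illustrative check rather than code from the paper, recomputes those entries by brute-force enumeration and prints the Off-PN to On-PN ratio for a few partition sizes.

```python
from itertools import product

def boundary_nodes(n, dim):
    """Number of nodes on the boundary of an n x ... x n block of elements."""
    return sum(1 for idx in product(range(n + 1), repeat=dim)
               if any(k in (0, n) for k in idx))

def table1_step2(n, dim, n_dof):
    """Step 2 words per partition, (on-PN, off-PN), under the half-boundary rule."""
    total = (n + 1) ** dim
    surface = boundary_nodes(n, dim)     # 4n in 2D, 6n^2 + 2 in 3D
    interior = total - surface           # (n - 1)^dim
    half = surface // 2                  # boundary nodes owned locally
    return n_dof * (interior + half), n_dof * half

for n in (4, 8, 16):
    on2, off2 = table1_step2(n, 2, n_dof=4)
    on3, off3 = table1_step2(n, 3, n_dof=4)
    print(n, round(off2 / on2, 3), round(off3 / on3, 3))
```

In both dimensions the ratio falls roughly as 1/n, in line with the $O(N^{d-1})$ versus $O(N^d)$ scaling noted above.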
In Tables 4 and 5, the non-partitioned results are based on the communication strategy outlined in [19], with random distribution of the nodal points to the processors. Notice that the two-step scheme with partitioning offers dramatic speed-ups, reducing gather and scatter times by factors of about 3 to 13.5, and that the speed-ups for the three-dimensional problems exceed those for the two-dimensional problems.

6. Numerical Simulations

6.1. 3D flow around a cylinder: matrix-free vs. non-matrix-free performance

In this section we consider a three-dimensional simulation of the flow past a circular cylinder. The simple problem geometry allows us to generate meshes of hexahedral and tetrahedral elements with relative ease. Here we use the two meshes shown in Figure 2. The mesh shown in Figure 2a consists of 100,907 tetrahedral elements and 21,188 nodes, while the mesh in Figure 2b consists of 18,396 hexahedral elements and 21,460 nodes. The steady flow field at Re = 100 is obtained on both meshes with each technique (matrix-free and non-matrix-free). Figure 3 shows the steady-state pressure field around the cylinder obtained with the tetrahedral mesh.
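The matrix-free technique compared in this section replaces the matrix-vector products needed by GMRES with residual evaluations. A minimal sketch of one common way to do this, assumed here for illustration and not necessarily the formulation used in this implementation, is the one-sided finite-difference directional derivative $J(u)\,v \approx [R(u + \epsilon v) - R(u)]/\epsilon$. The residual below is a hypothetical one-dimensional stand-in for the finite element residual, and the step-size rule is a common heuristic rather than a value from this work.

```python
import math


def residual(u):
    """Hypothetical nonlinear residual R(u) standing in for the FE residual."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(2.0 * u[i] - left - right + u[i] ** 3 - 1.0)
    return r


def jacobian_times(u, v):
    """Exact J(u) v for the residual above, used only to check the approximation."""
    n = len(u)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 + 3.0 * u[i] ** 2) * v[i] - left - right)
    return out


def matrix_free_jv(res, u, v, r_u=None):
    """Approximate J(u) v with one extra residual evaluation; no matrix is stored."""
    if r_u is None:
        r_u = res(u)  # R(u) can be reused across all Krylov vectors
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    # Heuristic perturbation size (an assumption, not taken from the paper).
    eps = math.sqrt(2.2e-16) * (1.0 + norm_u) / norm_v
    r_pert = res([a + eps * b for a, b in zip(u, v)])
    return [(p - q) / eps for p, q in zip(r_pert, r_u)]
```

Each new Krylov vector then costs one full residual evaluation instead of a product with a stored matrix, which is why the matrix-free variant loses ground as the Krylov space grows.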

Table 2. Surface to volume data ratios for the square meshes (columns: Step 1 and Step 2 on-PN and off-PN data, and off-PN to on-PN ratio; rows: quads and triangles with N = 128 and N = 256).

Table 3. Surface to volume data ratios for the cubic meshes (columns: Step 1 and Step 2 on-PN and off-PN data, and off-PN to on-PN ratio; rows: hexahedra and tetrahedra with N = 24 and N = 32).

                         Gather                              Scatter
Mesh                     Non-partitioned   Partitioned       Non-partitioned   Partitioned
Quads (N = 128)          2.1 (30.8 ms)     7.9 (8.4 ms)      2.1 (31.3 ms)     6.4 (10.2 ms)
Triangles (N = 128)      2.0 (48.9 ms)     12.4 (7.9 ms)     2.0 (50.7 ms)     9.5 (10.4 ms)
Quads (N = 256)          1.8 (147.0 ms)    10.6 (24.7 ms)    1.7 (154.6 ms)    8.9 (29.9 ms)
Triangles (N = 256)      1.6 (239.7 ms)    13.5 (29.4 ms)    1.6 (247.3 ms)    12.6 (32.1 ms)

Table 4. Bandwidth comparison for the square meshes in MB/s/PN (times in parentheses).

                         Gather                              Scatter
Mesh                     Non-partitioned   Partitioned       Non-partitioned   Partitioned
Hexahedra (N = 24)       1.8 (61.3 ms)     8.3 (13.3 ms)     2.1 (51.9 ms)     8.0 (13.8 ms)
Tetrahedra (N = 24)      1.8 (155.8 ms)    18.8 (14.7 ms)    1.7 (159.1 ms)    14.2 (19.5 ms)
Hexahedra (N = 32)       2.0 (132.6 ms)    9.3 (28.3 ms)     1.7 (151.4 ms)    8.1 (32.3 ms)
Tetrahedra (N = 32)      1.5 (438.1 ms)    20.2 (32.5 ms)    1.5 (450.2 ms)    19.7 (33.3 ms)

Table 5. Bandwidth comparison for the cubic meshes in MB/s/PN (times in parentheses).
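The gap between the non-partitioned and partitioned columns of Tables 4 and 5 can be made plausible by counting, for an element-to-node gather, how many node references land on the processor that owns the element. The sketch below performs only such a count on a structured quadrilateral mesh, assuming nx and ny are divisible by px and py and using a hypothetical node-ownership rule; it is not the CMSSL two-step gather, which additionally sends each shared node once (node-to-node) rather than once per element reference.

```python
import random


def count_references(nx, ny, px, py, random_nodes=False, seed=0):
    """Count on- and off-processor node references made by element-to-node gathers.

    Elements of an nx-by-ny quadrilateral mesh are split into px-by-py blocks,
    one block per processor (assumes nx % px == 0 and ny % py == 0). Nodes are
    owned either by a neighbouring element's block (partitioned, hypothetical
    rule) or by a randomly chosen processor (non-partitioned).
    """
    bx, by = nx // px, ny // py  # elements per block in each direction
    nprocs = px * py

    def element_owner(ei, ej):
        return (ei // bx) * py + (ej // by)

    rng = random.Random(seed)
    node_owner = {}
    for i in range(nx + 1):
        for j in range(ny + 1):
            if random_nodes:
                node_owner[(i, j)] = rng.randrange(nprocs)
            else:
                # Owner of the element "above and to the right", clamped at the mesh edge.
                node_owner[(i, j)] = element_owner(min(i, nx - 1), min(j, ny - 1))

    on = off = 0
    for ei in range(nx):
        for ej in range(ny):
            owner = element_owner(ei, ej)
            for node in ((ei, ej), (ei + 1, ej), (ei, ej + 1), (ei + 1, ej + 1)):
                if node_owner[node] == owner:
                    on += 1
                else:
                    off += 1
    return on, off


if __name__ == "__main__":
    for label, rnd in (("partitioned", False), ("random", True)):
        on, off = count_references(64, 64, 4, 4, random_nodes=rnd)
        print(label, "off-processor fraction:", round(off / (on + off), 3))
```

With 16 processors, random node placement makes roughly 15/16 of the references off-processor, whereas block partitioning keeps the large majority of them local, so most of the data motion proceeds at memory rather than network bandwidth.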

Figure 2. Surface of a) the tetrahedral and b) the hexahedral cylinder mesh.

Figure 3. Surface steady pressure field at Reynolds number 100.

The main parameter that influences the relative performance of the two techniques is the size of the Krylov space. Because the matrix-free technique replaces matrix-vector products with residual evaluations, it is computation dominated; increasing the size of the Krylov space therefore slows the matrix-free technique down relative to the non-matrix-free one. Table 6 gives the time required per nonlinear iteration, as well as the overall communication percentage, for different numbers of inner iterations (i.e., Krylov space sizes) for the tetrahedral mesh. Table 7 shows the same data for the hexahedral mesh. The measurements were taken on a CM-5 with 128 processing nodes for the tetrahedral mesh and on a CM-5 with 32 processing nodes for the hexahedral mesh, resulting in similar subgrid lengths for the two problems. We observe that for smaller Krylov spaces the matrix-free technique is faster, with a break-even point at around 8 inner iterations for the tetrahedral mesh and around 30 inner iterations for the hexahedral mesh. The tetrahedral result is similar to the findings of Johan [18] for compressible flows. We use 4 Gauss points for the tetrahedral mesh and 8 for the hexahedral mesh. Note that in the current matrix-free implementation we store the values of the shape functions and the Jacobians of the element domain transformation; these are recomputed at most once per nonlinear iteration (in the case of deforming meshes).

                     Matrix-free                          Non-matrix-free
Krylov space size    Iteration cost   Comm. percentage    Iteration cost   Comm. percentage
                                      18.2%               1.15 sec         18.7%
                                      21.8%               1.19 sec         21.3%
                                      19.3%               1.31 sec         20.5%
                                      21.8%               1.87 sec         27.6%
                                      22.8%               2.51 sec         32.1%
                                      22.2%               3.24 sec         32.1%

Table 6. Matrix-free vs. non-matrix-free comparison for the tetrahedral mesh.

                     Matrix-free                          Non-matrix-free
Krylov space size    Iteration cost   Comm. percentage    Iteration cost   Comm. percentage
                                      17.2%               4.24 sec         9.2%
                                      20.5%               4.62 sec         12.2%
                                      21.0%               4.84 sec         13.8%
                                      21.1%               6.06 sec         21.0%
                                      22.9%               7.68 sec         23.1%
                                      22.1%               9.08 sec         25.8%

Table 7. Matrix-free vs. non-matrix-free comparison for the hexahedral mesh.

6.2. Flow around a submarine: partitioning benefits

This simulation involves three-dimensional flow around a Los Angeles-class submarine. The ability to handle completely unstructured meshes is important when studying flows around complex shapes, since it is difficult to construct a structured mesh around a complex three-dimensional object. Semi-automatic structured-mesh generators are generally less flexible and require more user intervention than fully automatic mesh generators designed for unstructured meshes.
An example of the latter is the finite octree tetrahedral mesh generator developed by Shephard [20]. Here, this mesh generator was used to create a mesh around a Los Angeles-class submarine. The input to the mesh generator consisted of a geometric definition of the bounding surfaces of the mesh, including the outer rectangular box and a surface model of the submarine hull. The hull geometric model was digitized from commercially available data and was composed of a number of triangular and rectangular Bézier patches. The mesh used for the current computations consisted of 86,111 nodes and 428,157 tetrahedral elements. Selected surfaces of that mesh are shown in Figure 4.

Figure 4. Surface of the submarine mesh.

In these initial computations the domain was stationary, and therefore a more computationally efficient semi-discrete formulation (Tezduyar et al. [21]) was used in place of the space-time formulation. The boundary conditions consisted of a specified uniform inflow velocity, zero-normal-velocity/zero-shear-stress conditions at the external lateral boundaries, a traction-free outflow boundary, and a no-slip condition on the submarine hull. The Reynolds number is based on the free-stream fluid velocity and the submarine length. The computations were restarted from a steady-state solution at Reynolds number [...]. The Reynolds stress was modeled using a Smagorinsky turbulence model after Kato [22]. In this model, the kinematic viscosity $\nu$ is augmented by an eddy viscosity

$$\nu_T = (C h)^2 \left( 2\, \varepsilon(\mathbf{u}) : \varepsilon(\mathbf{u}) \right)^{1/2}, \qquad (7)$$

where $C = 0.15$ is the model constant and $h$ is the element length. In the transient phase of the solution, a Krylov space of size 50 was used in the GMRES solver, with no restarts. At each time step, 4 nonlinear iterations were performed. A representative result from this preliminary computation is presented in Figure 5, which shows the pressure field on the submarine hull. At this point in the computation, the drag coefficient remained at [...].

Figure 5. Pressure distribution on the submarine hull.

The overall sustained performance and communication performance for this simulation are shown in Table 8. The communication performance is shown both for the two-step communication of partitioned data (see Section 5) and for a single-step communication (see Mathur and Johnsson [19]) with random distribution of the nodes. Figure 6 shows the partitioning of the surface mesh of the submarine hull for 2048 vector units. Table 8 shows, with and without partitioning, the overall speed in GigaFLOPS, the time taken per nonlinear iteration, and the gather and scatter bandwidths attained in the GMRES solver. All measurements were taken on a CM-5 computer with 512 processing nodes and 2048 vector execution units. Note that the difference in the FORM phase speed between the partitioned and non-partitioned cases reflects statistical variation and possibly the load on the front end. Partitioning more than doubles the overall speed (from 2.4 to 5.4 GigaFLOPS) by decreasing the gather cost by a factor of about 7 and the scatter cost by a factor of about 3.5.

Figure 6. Partitioning of the submarine mesh for 2048 vector units.

                       Non-partitioned    Partitioned
FORM phase speed       11.5 GigaFLOPS     12.3 GigaFLOPS
Overall speed          2.4 GigaFLOPS      5.4 GigaFLOPS
Time per iteration     9.9 sec            4.4 sec
Gather bandwidth       1.5 MB/s/PN        10.4 MB/s/PN
Scatter bandwidth      1.8 MB/s/PN        6.4 MB/s/PN

Table 8. Performance with and without mesh partitioning.
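The eddy viscosity of Eq. (7) is a pointwise formula, evaluated for example at each quadrature point. The sketch below assumes the velocity gradient is already available there as a nested list grad_u[i][j] = ∂u_i/∂x_j (a hypothetical input format) and uses C = 0.15 as in the text; how the element length h is computed for tetrahedra is not specified here and is left to the caller.

```python
import math


def eddy_viscosity(grad_u, h, C=0.15):
    """Smagorinsky eddy viscosity nu_T = (C h)^2 * sqrt(2 eps(u):eps(u)), Eq. (7).

    grad_u[i][j] is du_i/dx_j at one point; h is the element length.
    """
    d = len(grad_u)
    # Strain-rate tensor: symmetric part of the velocity gradient.
    eps = [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(d)] for i in range(d)]
    # Double contraction eps:eps = sum_ij eps_ij * eps_ij.
    contraction = sum(eps[i][j] * eps[i][j] for i in range(d) for j in range(d))
    return (C * h) ** 2 * math.sqrt(2.0 * contraction)


if __name__ == "__main__":
    shear = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(eddy_viscosity(shear, h=0.1))
```

For a simple shear flow with ∂u_1/∂x_2 = γ, the formula reduces to ν_T = (C h)^2 |γ|, while a rigid rotation produces no eddy viscosity because ε(u) vanishes.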

7. Concluding Remarks

We have discussed various aspects of a data parallel implementation of finite element methods for computational fluid dynamics. The foundation for such an implementation is the availability of high-level data parallel programming languages such as Connection Machine Fortran or High Performance Fortran. These languages are ideal for exploiting the fine-grain parallelism occurring naturally in finite element problems on large meshes. We based the implementation discussion on a space-time velocity-pressure formulation of the incompressible Navier-Stokes equations, and noted that this discussion is equally relevant to many other formulations, including those that employ conventional time-stepping methods. The issues covered include the selection of two principal data storage modes, the formation of element-level residual vectors, and the iterative solution process used to solve the linear system of equations arising at each nonlinear iteration step. We then investigated how additional control over the distribution of the data elements in the two storage modes can be used to significantly reduce the cost of communication between them. Here we used the two-step gather and scatter routines from the Connection Machine Scientific Software Library. Using a 3D flow past a cylinder as an example, we compared the performance of this implementation with a standard GMRES solver and with its matrix-free version. Finally, we presented results from a 3D simulation of flow past a complex submarine model, and compared the throughput of the standard and the two-step communication routines on this practical problem.

The preconditioning of the linear system arising from the finite element formulation is still an open issue, especially significant in the incompressible case, where some degree of global (i.e., not local to an element or node) preconditioning can dramatically improve convergence. In the examples presented here, only diagonal preconditioning/scaling has been used.

8. Acknowledgments

This research was sponsored by NASA-JSC under grant NAG 9-449, by NSF under grants CTS and ASC, by ARPA under NIST contract 60NANB2D1272, and by ARO under grant DAAH04-93-G. Partial support for this work has also come from the ARO contract number DAAL03-89-C-0038 with the AHPCRC at the University of Minnesota. We are indebted to Zdenek Johan for helpful comments and for providing access to his CM-5 implementations of both the RSB algorithm for data decomposition and the two-step gather and scatter algorithms. We are also indebted to Kapil Mathur for helpful comments and his contributions to the two-step gather and scatter algorithms.

References

[1] J.G. Malone, Automatic mesh decomposition and concurrent finite element analysis for hypercube multiprocessor computers, Computer Methods in Applied Mechanics and Engineering, 70 (1988)

[2] C. Farhat and E. Wilson, A new finite element concurrent computer program architecture, International Journal for Numerical Methods in Engineering, 24 (1987)
[3] G.A. Lyzenga, A. Raefsky, and B.H. Hager, Finite elements and the method of conjugate gradients on concurrent processors, Report C3P-119, California Institute of Technology, Pasadena, CA
[4] K.K. Mathur and S.L. Johnsson, The finite element method on a data parallel computing system, International Journal of High Speed Computing, 1 (1989)
[5] T. Belytschko, E.J. Plaskacz, J.M. Kennedy, and D.L. Greenwell, Finite element analysis on the Connection Machine, Computer Methods in Applied Mechanics and Engineering, 81 (1990)
[6] R.A. Shapiro, Implementation of an Euler/Navier-Stokes finite element algorithm on the Connection Machine, AIAA 29th Aerospace Sciences Meeting (1991)
[7] C. Farhat, N. Sobh, and K.C. Park, Transient finite element computations on 65,536 processors: The Connection Machine, International Journal for Numerical Methods in Engineering, 30 (1990)
[8] Z. Johan, T.J.R. Hughes, K.K. Mathur, and S.L. Johnsson, A data parallel finite element method for computational fluid dynamics on the Connection Machine system, Computer Methods in Applied Mechanics and Engineering, 99 (1992)
[9] M. Behr, A. Johnson, J. Kennedy, S. Mittal, and T.E. Tezduyar, Computation of incompressible flows with implicit finite element implementations on the Connection Machine, Computer Methods in Applied Mechanics and Engineering, 108 (1993)
[10] Z. Johan, K.K. Mathur, S.L. Johnsson, and T.J.R. Hughes, An efficient communications strategy for finite element methods on the Connection Machine CM-5 system, Computer Methods in Applied Mechanics and Engineering, 113 (1994)
[11] A. Pothen, H.D. Simon, and K.P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications, 11 (1990)
[12] H.D. Simon, Partitioning of unstructured problems for parallel processing, Computing Systems in Engineering, 2 (1991)
[13] C.H. Koelbel, D.B. Loveman, R.S. Schreiber, G.L. Steele Jr., and M.E. Zosel, The High Performance Fortran Handbook, MIT Press, Cambridge, MA, 1994
[14] Thinking Machines Corporation, CM Fortran Reference Manual, Versions 1.0 and 1.1, Cambridge, MA, 1991
[15] T.E. Tezduyar, M. Behr, and J. Liou, A new strategy for finite element computations involving moving boundaries and interfaces, the deforming-spatial-domain/space-time procedure: I. The concept and the preliminary tests, Computer Methods in Applied Mechanics and Engineering, 94 (1992)
[16] T.E. Tezduyar, M. Behr, S. Mittal, and J. Liou, A new strategy for finite element computations involving moving boundaries and interfaces, the deforming-spatial-domain/space-time procedure: II. Computation of free-surface flows, two-liquid flows, and flows with drifting cylinders, Computer Methods in Applied Mechanics and Engineering, 94 (1992)
[17] S.L. Johnsson and K.K. Mathur, Experience with the conjugate gradient method for stress analysis on a data parallel supercomputer, International Journal for Numerical Methods in Engineering, 27 (1989)
[18] Z. Johan, Data Parallel Finite Element Techniques for Large-Scale Computational Fluid Dynamics, Ph.D. thesis, Department of Mechanical Engineering, Stanford University
[19] K.K. Mathur and S.L. Johnsson, Communication primitives for unstructured finite element simulations on data parallel architectures, Computing Systems in Engineering, 3 (1992)
[20] M.S. Shephard and M.K. Georges, Automatic three-dimensional mesh generation by the finite octree technique, International Journal for Numerical Methods in Engineering, 32 (1991)
[21] T.E. Tezduyar, S. Mittal, S.E. Ray, and R. Shih, Incompressible flow computations with stabilized bilinear and linear equal-order-interpolation velocity-pressure elements, Computer Methods in Applied Mechanics and Engineering, 95 (1992)
[22] C. Kato and M. Ikegawa, Large eddy simulation of unsteady turbulent wake of a circular cylinder using the finite element method, in I. Celik, T. Kobayashi, K.N. Ghia, and J. Kurokawa, editors, Advances in Numerical Simulation of Turbulent Flows, FED-Vol. 117, ASME, New York, 1991

More information

This tutorial illustrates how to set up and solve a problem involving solidification. This tutorial will demonstrate how to do the following:

This tutorial illustrates how to set up and solve a problem involving solidification. This tutorial will demonstrate how to do the following: Tutorial 22. Modeling Solidification Introduction This tutorial illustrates how to set up and solve a problem involving solidification. This tutorial will demonstrate how to do the following: Define a

More information

Semi-automatic domain decomposition based on potential theory

Semi-automatic domain decomposition based on potential theory Semi-automatic domain decomposition based on potential theory S.P. Spekreijse and J.C. Kok Nationaal Lucht- en Ruimtevaartlaboratorium National Aerospace Laboratory NLR Semi-automatic domain decomposition

More information

Large Eddy Simulation of Flow over a Backward Facing Step using Fire Dynamics Simulator (FDS)

Large Eddy Simulation of Flow over a Backward Facing Step using Fire Dynamics Simulator (FDS) The 14 th Asian Congress of Fluid Mechanics - 14ACFM October 15-19, 2013; Hanoi and Halong, Vietnam Large Eddy Simulation of Flow over a Backward Facing Step using Fire Dynamics Simulator (FDS) Md. Mahfuz

More information

Guidelines for proper use of Plate elements

Guidelines for proper use of Plate elements Guidelines for proper use of Plate elements In structural analysis using finite element method, the analysis model is created by dividing the entire structure into finite elements. This procedure is known

More information

The 3D DSC in Fluid Simulation

The 3D DSC in Fluid Simulation The 3D DSC in Fluid Simulation Marek K. Misztal Informatics and Mathematical Modelling, Technical University of Denmark mkm@imm.dtu.dk DSC 2011 Workshop Kgs. Lyngby, 26th August 2011 Governing Equations

More information

Introduction to CFX. Workshop 2. Transonic Flow Over a NACA 0012 Airfoil. WS2-1. ANSYS, Inc. Proprietary 2009 ANSYS, Inc. All rights reserved.

Introduction to CFX. Workshop 2. Transonic Flow Over a NACA 0012 Airfoil. WS2-1. ANSYS, Inc. Proprietary 2009 ANSYS, Inc. All rights reserved. Workshop 2 Transonic Flow Over a NACA 0012 Airfoil. Introduction to CFX WS2-1 Goals The purpose of this tutorial is to introduce the user to modelling flow in high speed external aerodynamic applications.

More information

High-Order Navier-Stokes Simulations using a Sparse Line-Based Discontinuous Galerkin Method

High-Order Navier-Stokes Simulations using a Sparse Line-Based Discontinuous Galerkin Method High-Order Navier-Stokes Simulations using a Sparse Line-Based Discontinuous Galerkin Method Per-Olof Persson University of California, Berkeley, Berkeley, CA 9472-384, U.S.A. We study some of the properties

More information

Introduction to ANSYS CFX

Introduction to ANSYS CFX Workshop 03 Fluid flow around the NACA0012 Airfoil 16.0 Release Introduction to ANSYS CFX 2015 ANSYS, Inc. March 13, 2015 1 Release 16.0 Workshop Description: The flow simulated is an external aerodynamics

More information

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering Introduction A SolidWorks simulation tutorial is just intended to illustrate where to

More information

Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak

Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models. C. Aberle, A. Hakim, and U. Shumlak Development of a Maxwell Equation Solver for Application to Two Fluid Plasma Models C. Aberle, A. Hakim, and U. Shumlak Aerospace and Astronautics University of Washington, Seattle American Physical Society

More information

PATCH TEST OF HEXAHEDRAL ELEMENT

PATCH TEST OF HEXAHEDRAL ELEMENT Annual Report of ADVENTURE Project ADV-99- (999) PATCH TEST OF HEXAHEDRAL ELEMENT Yoshikazu ISHIHARA * and Hirohisa NOGUCHI * * Mitsubishi Research Institute, Inc. e-mail: y-ishi@mri.co.jp * Department

More information

Airfoil Design Optimization Using Reduced Order Models Based on Proper Orthogonal Decomposition

Airfoil Design Optimization Using Reduced Order Models Based on Proper Orthogonal Decomposition Airfoil Design Optimization Using Reduced Order Models Based on Proper Orthogonal Decomposition.5.5.5.5.5.5.5..5.95.9.85.8.75.7 Patrick A. LeGresley and Juan J. Alonso Dept. of Aeronautics & Astronautics

More information

Improvements in Dynamic Partitioning. Aman Arora Snehal Chitnavis

Improvements in Dynamic Partitioning. Aman Arora Snehal Chitnavis Improvements in Dynamic Partitioning Aman Arora Snehal Chitnavis Introduction Partitioning - Decomposition & Assignment Break up computation into maximum number of small concurrent computations that can

More information

Optimization to Reduce Automobile Cabin Noise

Optimization to Reduce Automobile Cabin Noise EngOpt 2008 - International Conference on Engineering Optimization Rio de Janeiro, Brazil, 01-05 June 2008. Optimization to Reduce Automobile Cabin Noise Harold Thomas, Dilip Mandal, and Narayanan Pagaldipti

More information

Application of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures

Application of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures Paper 6 Civil-Comp Press, 2012 Proceedings of the Eighth International Conference on Engineering Computational Technology, B.H.V. Topping, (Editor), Civil-Comp Press, Stirlingshire, Scotland Application

More information

(LSS Erlangen, Simon Bogner, Ulrich Rüde, Thomas Pohl, Nils Thürey in collaboration with many more

(LSS Erlangen, Simon Bogner, Ulrich Rüde, Thomas Pohl, Nils Thürey in collaboration with many more Parallel Free-Surface Extension of the Lattice-Boltzmann Method A Lattice-Boltzmann Approach for Simulation of Two-Phase Flows Stefan Donath (LSS Erlangen, stefan.donath@informatik.uni-erlangen.de) Simon

More information

HPC Algorithms and Applications

HPC Algorithms and Applications HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear

More information

PARALLEL COMPUTING. Tayfun E. Tezduyar. Ahmed Sameh

PARALLEL COMPUTING. Tayfun E. Tezduyar. Ahmed Sameh 1 FINITE ELEMENT METHODS: 1970 s AND BEYOND L.P. Franca, T.E. Tezduyar and A. Masud (Eds.) c CIMNE, Barcelona, Spain 2004 PARALLEL COMPUTING Tayfun E. Tezduyar Mechanical Engineering Rice University MS

More information

Stream Function-Vorticity CFD Solver MAE 6263

Stream Function-Vorticity CFD Solver MAE 6263 Stream Function-Vorticity CFD Solver MAE 66 Charles O Neill April, 00 Abstract A finite difference CFD solver was developed for transient, two-dimensional Cartesian viscous flows. Flow parameters are solved

More information

Subdivision-stabilised immersed b-spline finite elements for moving boundary flows

Subdivision-stabilised immersed b-spline finite elements for moving boundary flows Subdivision-stabilised immersed b-spline finite elements for moving boundary flows T. Rüberg, F. Cirak Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, U.K. Abstract

More information

High-Order Numerical Algorithms for Steady and Unsteady Simulation of Viscous Compressible Flow with Shocks (Grant FA )

High-Order Numerical Algorithms for Steady and Unsteady Simulation of Viscous Compressible Flow with Shocks (Grant FA ) High-Order Numerical Algorithms for Steady and Unsteady Simulation of Viscous Compressible Flow with Shocks (Grant FA9550-07-0195) Sachin Premasuthan, Kui Ou, Patrice Castonguay, Lala Li, Yves Allaneau,

More information

Contents. I The Basic Framework for Stationary Problems 1

Contents. I The Basic Framework for Stationary Problems 1 page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other

More information

Element Quality Metrics for Higher-Order Bernstein Bézier Elements

Element Quality Metrics for Higher-Order Bernstein Bézier Elements Element Quality Metrics for Higher-Order Bernstein Bézier Elements Luke Engvall and John A. Evans Abstract In this note, we review the interpolation theory for curvilinear finite elements originally derived

More information

Application of Finite Volume Method for Structural Analysis

Application of Finite Volume Method for Structural Analysis Application of Finite Volume Method for Structural Analysis Saeed-Reza Sabbagh-Yazdi and Milad Bayatlou Associate Professor, Civil Engineering Department of KNToosi University of Technology, PostGraduate

More information

Steady Flow: Lid-Driven Cavity Flow

Steady Flow: Lid-Driven Cavity Flow STAR-CCM+ User Guide Steady Flow: Lid-Driven Cavity Flow 2 Steady Flow: Lid-Driven Cavity Flow This tutorial demonstrates the performance of STAR-CCM+ in solving a traditional square lid-driven cavity

More information

The Immersed Interface Method

The Immersed Interface Method The Immersed Interface Method Numerical Solutions of PDEs Involving Interfaces and Irregular Domains Zhiiin Li Kazufumi Ito North Carolina State University Raleigh, North Carolina Society for Industrial

More information

Efficiency Aspects for Advanced Fluid Finite Element Formulations

Efficiency Aspects for Advanced Fluid Finite Element Formulations Proceedings of the 5 th International Conference on Computation of Shell and Spatial Structures June 1-4, 2005 Salzburg, Austria E. Ramm, W. A. Wall, K.-U. Bletzinger, M. Bischoff (eds.) www.iassiacm2005.de

More information

Lab 9: FLUENT: Transient Natural Convection Between Concentric Cylinders

Lab 9: FLUENT: Transient Natural Convection Between Concentric Cylinders Lab 9: FLUENT: Transient Natural Convection Between Concentric Cylinders Objective: The objective of this laboratory is to introduce how to use FLUENT to solve both transient and natural convection problems.

More information

Introduction to C omputational F luid Dynamics. D. Murrin

Introduction to C omputational F luid Dynamics. D. Murrin Introduction to C omputational F luid Dynamics D. Murrin Computational fluid dynamics (CFD) is the science of predicting fluid flow, heat transfer, mass transfer, chemical reactions, and related phenomena

More information

Predicting Tumour Location by Modelling the Deformation of the Breast using Nonlinear Elasticity

Predicting Tumour Location by Modelling the Deformation of the Breast using Nonlinear Elasticity Predicting Tumour Location by Modelling the Deformation of the Breast using Nonlinear Elasticity November 8th, 2006 Outline Motivation Motivation Motivation for Modelling Breast Deformation Mesh Generation

More information

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Reports of Research Institute for Applied Mechanics, Kyushu University No.150 (71 83) March 2016 Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Report 3: For the Case

More information

AN IMPROVED METHOD TO MODEL SEMI-ELLIPTICAL SURFACE CRACKS USING ELEMENT MISMATCH IN ABAQUS

AN IMPROVED METHOD TO MODEL SEMI-ELLIPTICAL SURFACE CRACKS USING ELEMENT MISMATCH IN ABAQUS AN IMPROVED METHOD TO MODEL SEMI-ELLIPTICAL SURFACE CRACKS USING ELEMENT MISMATCH IN ABAQUS R. H. A. Latiff and F. Yusof School of Mechanical Engineering, UniversitiSains, Malaysia E-Mail: mefeizal@usm.my

More information

THE EFFECTS OF THE PLANFORM SHAPE ON DRAG POLAR CURVES OF WINGS: FLUID-STRUCTURE INTERACTION ANALYSES RESULTS

THE EFFECTS OF THE PLANFORM SHAPE ON DRAG POLAR CURVES OF WINGS: FLUID-STRUCTURE INTERACTION ANALYSES RESULTS March 18-20, 2013 THE EFFECTS OF THE PLANFORM SHAPE ON DRAG POLAR CURVES OF WINGS: FLUID-STRUCTURE INTERACTION ANALYSES RESULTS Authors: M.R. Chiarelli, M. Ciabattari, M. Cagnoni, G. Lombardi Speaker:

More information

ANSYS FLUENT. Airfoil Analysis and Tutorial

ANSYS FLUENT. Airfoil Analysis and Tutorial ANSYS FLUENT Airfoil Analysis and Tutorial ENGR083: Fluid Mechanics II Terry Yu 5/11/2017 Abstract The NACA 0012 airfoil was one of the earliest airfoils created. Its mathematically simple shape and age

More information

Lagrangian and Eulerian Representations of Fluid Flow: Kinematics and the Equations of Motion

Lagrangian and Eulerian Representations of Fluid Flow: Kinematics and the Equations of Motion Lagrangian and Eulerian Representations of Fluid Flow: Kinematics and the Equations of Motion James F. Price Woods Hole Oceanographic Institution Woods Hole, MA, 02543 July 31, 2006 Summary: This essay

More information

CFD Analysis of 2-D Unsteady Flow Past a Square Cylinder at an Angle of Incidence

CFD Analysis of 2-D Unsteady Flow Past a Square Cylinder at an Angle of Incidence CFD Analysis of 2-D Unsteady Flow Past a Square Cylinder at an Angle of Incidence Kavya H.P, Banjara Kotresha 2, Kishan Naik 3 Dept. of Studies in Mechanical Engineering, University BDT College of Engineering,

More information

From Hyperbolic Diffusion Scheme to Gradient Method: Implicit Green-Gauss Gradients for Unstructured Grids

From Hyperbolic Diffusion Scheme to Gradient Method: Implicit Green-Gauss Gradients for Unstructured Grids Preprint accepted in Journal of Computational Physics. https://doi.org/10.1016/j.jcp.2018.06.019 From Hyperbolic Diffusion Scheme to Gradient Method: Implicit Green-Gauss Gradients for Unstructured Grids

More information

Incompressible Viscous Flow Simulations Using the Petrov-Galerkin Finite Element Method

Incompressible Viscous Flow Simulations Using the Petrov-Galerkin Finite Element Method Copyright c 2007 ICCES ICCES, vol.4, no.1, pp.11-18, 2007 Incompressible Viscous Flow Simulations Using the Petrov-Galerkin Finite Element Method Kazuhiko Kakuda 1, Tomohiro Aiso 1 and Shinichiro Miura

More information

Case C1.3: Flow Over the NACA 0012 Airfoil: Subsonic Inviscid, Transonic Inviscid, and Subsonic Laminar Flows

Case C1.3: Flow Over the NACA 0012 Airfoil: Subsonic Inviscid, Transonic Inviscid, and Subsonic Laminar Flows Case C1.3: Flow Over the NACA 0012 Airfoil: Subsonic Inviscid, Transonic Inviscid, and Subsonic Laminar Flows Masayuki Yano and David L. Darmofal Aerospace Computational Design Laboratory, Massachusetts

More information

A MULTI-DOMAIN ALE ALGORITHM FOR SIMULATING FLOWS INSIDE FREE-PISTON DRIVEN HYPERSONIC TEST FACILITIES

A MULTI-DOMAIN ALE ALGORITHM FOR SIMULATING FLOWS INSIDE FREE-PISTON DRIVEN HYPERSONIC TEST FACILITIES A MULTI-DOMAIN ALE ALGORITHM FOR SIMULATING FLOWS INSIDE FREE-PISTON DRIVEN HYPERSONIC TEST FACILITIES Khalil Bensassi, and Herman Deconinck Von Karman Institute for Fluid Dynamics Aeronautics & Aerospace

More information

The Development of a Navier-Stokes Flow Solver with Preconditioning Method on Unstructured Grids

The Development of a Navier-Stokes Flow Solver with Preconditioning Method on Unstructured Grids Proceedings of the International MultiConference of Engineers and Computer Scientists 213 Vol II, IMECS 213, March 13-15, 213, Hong Kong The Development of a Navier-Stokes Flow Solver with Preconditioning

More information

Case C2.2: Turbulent, Transonic Flow over an RAE 2822 Airfoil

Case C2.2: Turbulent, Transonic Flow over an RAE 2822 Airfoil Case C2.2: Turbulent, Transonic Flow over an RAE 2822 Airfoil Masayuki Yano and David L. Darmofal Aerospace Computational Design Laboratory, Massachusetts Institute of Technology I. Code Description ProjectX

More information

A COUPLED FINITE VOLUME SOLVER FOR THE SOLUTION OF LAMINAR TURBULENT INCOMPRESSIBLE AND COMPRESSIBLE FLOWS

A COUPLED FINITE VOLUME SOLVER FOR THE SOLUTION OF LAMINAR TURBULENT INCOMPRESSIBLE AND COMPRESSIBLE FLOWS A COUPLED FINITE VOLUME SOLVER FOR THE SOLUTION OF LAMINAR TURBULENT INCOMPRESSIBLE AND COMPRESSIBLE FLOWS L. Mangani Maschinentechnik CC Fluidmechanik und Hydromaschinen Hochschule Luzern Technik& Architektur

More information

Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications

Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications Per-Olof Persson (persson@mit.edu) Department of Mathematics Massachusetts Institute of Technology http://www.mit.edu/

More information

Eulerian Techniques for Fluid-Structure Interactions - Part II: Applications

Eulerian Techniques for Fluid-Structure Interactions - Part II: Applications Published in Lecture Notes in Computational Science and Engineering Vol. 103, Proceedings of ENUMATH 2013, pp. 755-762, Springer, 2014 Eulerian Techniques for Fluid-Structure Interactions - Part II: Applications

More information

PRACE Workshop, Worksheet 2

PRACE Workshop, Worksheet 2 PRACE Workshop, Worksheet 2 Stockholm, December 3, 2013. 0 Download files http://csc.kth.se/ rvda/prace files ws2.tar.gz. 1 Introduction In this exercise, you will have the opportunity to work with a real

More information

EDICT for 3D computation of two- uid interfaces q

EDICT for 3D computation of two- uid interfaces q Comput. Methods Appl. Mech. Engrg. 190 (2000) 403±410 www.elsevier.com/locate/cma EDICT for 3D computation of two- uid interfaces q Tayfun E. Tezduyar a, *, Shahrouz Aliabadi b a Mechanical Engineering

More information

Computational Study of Laminar Flowfield around a Square Cylinder using Ansys Fluent

Computational Study of Laminar Flowfield around a Square Cylinder using Ansys Fluent MEGR 7090-003, Computational Fluid Dynamics :1 7 Spring 2015 Computational Study of Laminar Flowfield around a Square Cylinder using Ansys Fluent Rahul R Upadhyay Master of Science, Dept of Mechanical

More information

Calculate a solution using the pressure-based coupled solver.

Calculate a solution using the pressure-based coupled solver. Tutorial 19. Modeling Cavitation Introduction This tutorial examines the pressure-driven cavitating flow of water through a sharpedged orifice. This is a typical configuration in fuel injectors, and brings

More information

ISSN(PRINT): ,(ONLINE): ,VOLUME-1,ISSUE-1,

ISSN(PRINT): ,(ONLINE): ,VOLUME-1,ISSUE-1, NUMERICAL ANALYSIS OF THE TUBE BANK PRESSURE DROP OF A SHELL AND TUBE HEAT EXCHANGER Kartik Ajugia, Kunal Bhavsar Lecturer, Mechanical Department, SJCET Mumbai University, Maharashtra Assistant Professor,

More information

Using Semi-Regular 4 8 Meshes for Subdivision Surfaces

Using Semi-Regular 4 8 Meshes for Subdivision Surfaces Using Semi-Regular 8 Meshes for Subdivision Surfaces Luiz Velho IMPA Instituto de Matemática Pura e Aplicada Abstract. Semi-regular 8 meshes are refinable triangulated quadrangulations. They provide a

More information

Modeling Unsteady Compressible Flow

Modeling Unsteady Compressible Flow Tutorial 4. Modeling Unsteady Compressible Flow Introduction In this tutorial, FLUENT s density-based implicit solver is used to predict the timedependent flow through a two-dimensional nozzle. As an initial

More information