Assembly-Free Large-Scale Modal Analysis on the GPU Praveen Yadav, Krishnan Suresh

Size: px
Start display at page:

Download "Assembly-Free Large-Scale Modal Analysis on the GPU Praveen Yadav, Krishnan Suresh"

Transcription

1 Assembly-Free Large-Scale odal Analysis on the GPU Praveen Yadav, rishnan Suresh Department of echanical Engineering, UW-adison, adison, Wisconsin 53706, USA Abstract Popular eigen-solvers such as block-lanczos require repeated inversion of an eigen-matrix. his is a bottleneck in large-scale modal problems with millions of degrees of freedom. On the other hand, the classic Rayleigh-Ritz conjugate gradient method only requires a matrix-vector multiplication, and is therefore potentially scalable to such problems. However, as is well-known, the Rayleigh-Ritz has serious numerical deficiencies, and has largely been abandoned by the finite element community. In this paper, we address these deficiencies through subspace augmentation, and consider a subspace augmented Rayleigh-Ritz conjugate gradient method (SaRCG). SaRCG is numerically stable and does not entail explicit inversion. As a specific application, we consider the modal analysis of geometrically complex structures discretized via non-conforming voxels. he resulting large-scale eigen-problems are then solved via SaRCG. he voxelization structure is also exploited to render the underlying matrix-vector multiplication assembly-free. he implementation of SaRCG on multi-core CPUs, and graphics-programmable-units (GPUs) is discussed, followed by numerical experiments and case-studies.. INRODUCION Consider the modal analysis of a thin geometrically complex structure illustrated in Figure. he first step in such an analysis is the construction of a suitable finite-element mesh. Figure : A thin gear-housing whose eigen-spectrum is desired. hin geometrically complex structures require a small element size resulting in a large scale finite-element problem. For example, for the above structures, after numerous failed attempts, we were able to create the tetrahedral mesh illustrated in Figure, with over 750,000 degrees of freedom. Figure : Complex thin-structures require a finemesh. his leads to the eigen-problem: x= λx (.) where & are the stiffness and mass matrices associated with the finite-element discretization. he above problem has been well-studied, and numerous implementations exist [ 3]. When & are large, practical limitations arise. Specifically, popular eigensolvers such as block-lanczos require explicit and repeated inversion of a large-size eigen-matrix (see Section ), and the memory requirements and computational time grows quadratically in such methods. On the other hand, the classic Rayleigh-Ritz conjugate gradient method for solving Equation (.) only requires an implementation of sparse matrix-vector multiplication. he memory requirements are low, and the computational complexity depends largely on the efficiency of the matrix-vector multiplication. However, as numerical experiments indicate, the Rayleigh-Ritz method lacks robustness, and can exhibit poor convergence; consequently, it has largely been abandoned by the finite element community.

2 In this paper, we revisit this method, and address the limitations through subspace augmentation. he resulting subspace augmented Rayleigh-Ritz conjugate gradient (SaRCG) method is highly robust, exhibits fast convergence, and is easily parallelizable. An assembly-free implementation of SaRCG on multicore and GPU architectures is discussed.. LIERAURE REVIEW In this Section, we briefly review popular strategies for solving the generalized eigen-value problem: x= λx (.) For the remainder of the paper, we assume that and are sparse, symmetric, real, and positive definite. One of the most popular methods today to solve Equation (.) is the block-form of the shift-andinvert Lanczos algorithm, also referred as the block- Lanczos method [], [4], [5]. he block-lanczos method requires repeated inversion of an eigenmatrix σ, where the parameter σ is determined during iterations of the method. Since and are large, and the parameter σ varies during the iteration, explicit LU factorization of σ is not desirable. An alternate strategy is to factorize this matrix through preconditioned iterative solvers. However, as emphasized in [], [6], early termination of iterative solvers can lead to a significant loss of accuracy in block-lanczos, requiring careful attention to termination. In [6], the classic block-lanczos algorithm was instead modified to eliminate the need for an explicit decomposition. Specifically, the inversion was carried out in an approximate sense over a rylov sub-space. he method discussed in this paper borrows this concept, but in a different context. In [], the authors rely on Algebraic ultigrid as a pre-conditioner for factorizing the shifted matrix. In addition, the authors also implement and compare a variety of alternate algorithms including locally optimal block preconditioned conjugate gradient, Davidson-Jacobi, etc. ore importantly, the authors demonstrate that, for large-scale eigen-value problems, such alternate algorithms can be competitive. his is a key motivation for the present paper. One such alternate algorithm is the Rayleigh-Ritz conjugate gradient (RCG) where the eigen-mode problem is posed and solved as a minimization problem (see next Section). Since the RCG method can be slow to converge, the authors in [7] implement a parallel version of RCG with factorized sparse approximate inverse as a pre-conditioner. Further, in [8], a locally-optimal variation of RCG was proposed. he essential idea is to expand the search-space during each step of CG in a locally-optimal sense. We do not implement this variation here, but the proposed method can be modified to include localoptimality. Besides block-lanczos and RCG, there are dozens of methods including Jacobi-Davidson [9] and inverseiteration [0] that do not offer significant advantages over RCG or block-lanczos, and are therefore not pursued here. 3. Rayleigh-Ritz Conjugate Gradient As stated earlier, the Rayleigh-Ritz conjugate gradient (RCG) algorithm [], [] only requires an efficient implementation of sparse matrix-vector multiplication. It therefore exhibits numerous advantages including simplicity, low memory requirements, and significant scope for parallelism. 3. Rayleigh Quotient Concept A key concept behind RCG is the Rayleigh quotient of an arbitrary vector x : x x ϕ ( x) = (3.) x x If the vector x happens to be an eigen-vector of (, ), then the Rayleigh quotient is the corresponding eigen-value. hus, by minimizing the Rayleigh quotient, one can compute the lowest eigenpair, i.e., the eigen-value problem can be posed as a minimization problem: x x x in x x (3.) 3. Computing the Lowest ode hus, one can deploy, for example, the nonlinear conjugate gradient method [3] to find the lowest eigen-mode. owards this end, observe that the gradient of Equation (3.) is given by: where: x ϕ( xx ) gx ( ) = x x (3.3) = x x (3.4) Exploiting the classic conjugate gradient algorithm [3], we have the Rayleigh-Ritz conjugate gradient (RCG; Version ): Rayleigh-Ritz Conjugate Gradient (Version ): (). Initialize x 0 such that (0) (). Setp = 0, β = and k= () x = 3. Compute ϕ ( x ) via Equation (3.), and the gradient g via Equation (3.3)

3 4. he conjugate search direction is given by: p = g + β p ( k ) 5. Find the step length ( k ) 6. Let y + = x + δ p and 7. If x = y / y ( k+ ) ( k+ ) ( k+ ) δ as described in [4] g, terminate; else, increment k, and go to step 3 Observe that the RCG algorithm only requires matrixvector multiplications: v and v, making it a simple algorithm to implement. 3.3 Computing ultiple odes o compute higher modes, observe that that if ( λ, x ) and ( λ, x ) are two distinct modes, and if λ λ, then they must satisfy -orthogonality: x x = 0 (3.5) Further, if λ = λ, one can always find a pair of eigen-modes that satisfy the above equation. hus, to find the second eigen-mode, we pose a constrained minimization problem: x x in x x x st.. x x = 0 (3.6) Given an arbitrary vector x, one can enforce - orthogonality via: ( ) x= x x x x (3.7) o ensure -orthogonality during the iteration process, it is necessary and sufficient if: () the initial random vector is -orthogonal to x, and () the search directions ( k ) p are -orthogonal to x. hus, suppose the first (m-)-modes have been computed: X = { x, x,..., x } (3.8) m o compute the next mode, one must solve the constrained minimization problem: x x in x x x st.. x X= 0 where -orthogonality is enforced via: m i= ( ) i i (3.9) x= x x x x (3.0) his leads to the following RCG algorithm (Version ) for computing the m th pair ( λ, x ) []. m m Rayleigh-Ritz Conjugate Gradient (Version ):. Suppose m- eigen-modes X= { x, x,..., x } m have been computed. (). Initialize x 0 such that x X= 0 (0) () 3. Setp = 0, β = and k= () x = and 4. Compute ϕ ( x ) via Equation (3.), and the gradient g via Equation (3.3) ( k ) 5. Letp = g + β p (preliminary direction) 6. Construct an -orthogonal direction Equation (3.0) 7. Find the step length ( k ) ( k ) ( k ) ( k ) 8. Let y + = x + δ p and 9. If x = y / y ( k+ ) ( k+ ) ( k+ ) ( k ) p via δ as described in [4] g, terminate; else, increment k, and go to step Subspace Augmentation here are three fundamental limitations of the above RCG algorithm; these are confirmed later through numerical experiments. o the best of our knowledge, these limitations have not been identified/ emphasized in the literature:. issing modes: As we sweep through the eigenspectrum, one or more eigen-modes may go undetected, especially in the case of near-by eigenvalues. his can be attributed to the fact that RCG attempts to find local minima, and presence of multiple minima that are close-by can be detrimental to the algorithm.. Erroneous Results: A large value for (in step- 9 of RCG, version ) can leads to erroneous results; 0 typically ~ 0 is essential Slow convergence: On the other hand, ~ 0 leads to thousands of iterations (for each eigenmode), especially for ill-conditioned matrices. Fortunately, all three problems can be addressed by relying on subspace projection methods. Such projection methods are fairly common in modern eigen-solvers [3]. In particular, the method proposed

4 here is similar to the generic method of subspace projection discussed in []. Specifically, let us assume that, at the end of k iterations, we have a set of approximate eigen-vectors ( k x ) ( i=,,..., m) computed via early termination of i RCG (say ~ 0.0 ). A sub-space is then defined via: S = { x, x,..., x } m (3.) Next, a reduced stiffness and mass matrices are constructed over this sub-subspace as follows: = S S ( k ) = S S ( k ) (3.) Observe that computing these matrices only requires sparse matrix-vector multiplications. he two matrices are then used to solve a smaller eigen-value problem (exactly) via: V = V Λ (3.3) Finally, a sharpened set of eigen-vectors are recovered via: ( k ) X + = S V (3.4) where: ( k+ ) ( k+ ) ( k+ ) ( k+ ) X = { x, x,..., x } m (3.5) One can now restart the RCG method with the eigenvectors in Equation (3.5) as the starting vectors. his leads to the following subspace augmented Rayleigh- Ritz conjugate gradient algorithm (SaRCG), summarized below. Subspace Augmented Rayleigh-Ritz Conjugate Gradient (SaRCG): () () () (). Initialize X = { x, x,..., x } (typically m random vectors). Set k= ( ). Compute x k ( ) with x k as the starting vectors until i i g ( ~ 0.0 ), for i=,,..., m, via RCG (version ). 3. Construct the reduced matrices via Equation (3.), where S is defined in Equation (3.), and solve Equation (3.3) to find Λ and V. 4. If convergence in Λ has not been achieved, construct the updated eigen-vectors ( k ) X + via Equation (3.4), increment k and go to step. his simple extension of the RCG algorithm dramatically improves its robustness and convergence. he additional computational cost of constructing and solving the reduced eigen-value problem is negligible. In other words, the primary cost in SaRCG, is the sparse matrix-vector multiplication (SpV). herefore, an efficient implementation of SpV is critical. 3.5 Voxelization he proposed SaRCG method is applicable to any finite-element discretization, including the tetrahedral discretization of Figure. However, in this paper, we consider a simple discretization, where the geometry is approximated via uniform cubes or voxels ; the voxel-approach has gained significant popularity recently [5]. Since the voxelization need not conform to the geometry, it is robust, fast and relatively insensitive to the geometric complexity. Further, unlike in stress analysis, where a conforming mesh is essential, in eigen-mode analysis, a nonconforming voxelization is often sufficient, especially during the early stage of design. his is confirmed through numerical experiments later in the paper. he voxelization of the geometry in Figure is illustrated in Figure 3; it has over a million degrees of freedom. Fortunately, even such a large-size problem is easily handled via the proposed method. Figure 3: Brute-force voxelization of the structure. he voxelization of a triangulated CAD model, also referred to as 3-D scan conversion, is straightforward, and is discussed, for example, see [6]. Voxels that are partially inside the geometry are assigned a density value depending on the number of corner nodes that lie inside the geometry. As the numerical experiments indicate, the overall computational cost for the voxelization is small compared to the cost of computing the eigen-modes. 3.6 Shape Functions Given a voxelization, one can choose a variety of hexahedral finite element shape functions. he simplest are tri-linear shape functions as described in [7], where each node-based shape function is of the form: N = 0.5( + ξξ)( + ηη)( + ζζ); i=...8 (3.6) i i i i However, the resulting 8-noded elements are stiff, and convergence is slow especially for higher order modes, as illustrated through numerical experiments. One could use 0-node or 7-node elements, but these increase the memory requirements significantly.

5 In the context of static problems the authors of [8] instead propose the use of three additional bubblefunctions of the form: ( ξ, η, ζ) = ( ξ ) ( ξ, η, ζ) = ( η ) 3( ξ, η, ζ) = ( ζ ) his resulting stiffness matrix will be form: where = e = N D NdΩ = N D dω = D NdΩ = D dω (3.7) (3.8) (3.9) Fortunately, one can condense out the bubble degrees of freedom, resulting again in a reduced 4 degrees of freedom element stiffness matrix [8]: = ( \ ) (3.0) e While the authors in [8] demonstrate the merit of this formulation for static problems, we have adopted these bubble functions for eigen-problems. Numerical experiments indicate bubble functions are highly effective in eigen-problems as well. 3.7 Assembly-Free SpV he primary computational cost of sparse matrixvector multiplication (SpV), in today s computational architecture, is memory-access, i.e., floating-point operations are essentially free []. In SpV, there are two types of memory access: () ( ) accessing elements of x k, and () accessing elements i of and. here is very little one can do about the former, except perhaps careful numbering of the mesh nodes. However, the cost of accessing elements of and can be dramatically reduced through assembly-free methods as discussed below. Since the geometry is discretized via uniform voxels, all elements are geometrically identical. herefore all element matrices are identical (barring a density assigned to partial voxels). hus, one need not assemble or store the global and matrices; it is sufficient if a single matrix-pair and is stored. e e Consequently, a matrix-free implementation ofv, for example, may be implemented as follows [9]: x= x= x e ( e e) (3.) e e In other words, instead of assembling the matrix, and then carrying out x, we first multiply x and e e then assemble the result. 3.8 CPU and GPU Implementations In the CPU implementation of the assembly-free SpV, parallelization was attained through OpenP commands ( In the GPU implementation using CUDA [0] on NVidia cards, SpV was implemented by assigning each node of the voxel-grid to a scalar processor. Associated with each voxel node, there are at most 8 voxel elements, and 7 neighboring nodes. When a block of threads are launched, all threads share identical element matrices, and therefore a single memory fetch of the stiffness and mass element matrix per block is sufficient. he three degrees of freedom associated with the 7 nodes are fetched in parallel. In general, during this fetch, it is hard to enforce coalesced memory access for arbitrary voxel-meshes. For more recent GPU cards, we have observed that this is not a serious issue. Finally, many of the low-level operations such as vector-product were implemented using CUBLAS library. 4. NUERICAL EXPERIENS In this Section, we present results from numerical experiments based on the RCG and SaRCG algorithms. All experiments were conducted on a Windows-7 64-bit machine with the following hardware: Intel I7 960 CPU quad-core running at 3.GHz with 6 GB of memory; parallelization of CPU code was implemented through Openp commands. he graphics programmable unit (GPU) is an NVidia GeForce GX 480 (480 cores) with.5 GB. Both the CPU and GPU were configured to run in double-precision. 4. Importance of Bubble Functions In this experiment, we illustrate the value added by the use of bubble functions discussed earlier. Specifically, a short cantilevered beam of dimension x x 5, fixed on end is used in this experiment. In Figure 4, the first 4 eigen-modes are illustrated; these were computed using a quadratic tetrahedral mesh with 45,000 DOF, in SolidWorks []. () 3 Hz () 3 Hz

6 (3) 478 Hz (4) 74 Hz Figure 4: First four modes computed via SolidWorks. Next using SaRCG, the first eigen-mode was computed for different voxel sizes, with-and-without bubble function. As one can observe, the use of bubble functions dramatically improves the convergence at no additional cost. Similar improvements were observed for higher order modes, and for other casestudies. herefore, for the remainder of the experiments, bubble functions are employed. able : First eigen-value of short cantilevered beam, with-and-without bubble functions. #DOF Without Bubble (Hz) %error With Bubble (Hz) %error Deficiency of RCG Next, we illustrate the deficiency of RCG. Using the same cantilevered beam example, we computed the first four eigen-modes via RCG for different tolerances values (see Version of RCG algorithm) for a fixed voxel size (55,000 DOF). As one can observe in able, even with tight tolerances, RCG can miss some of the critical modes or compute them out of order the classic RCG method is not robust. able : First 4 eigen-value of beam via RCG. ode- ode- ode-3 ode-4 8 = = = = On the other hand, SaRCG converges to the correct set of eigen-modes, even with a coarse tolerance of 4 = 0, as summarized in able 3. able 3: First 4 eigen-values of beam via SaRCG. ode- ode- ode-3 ode-4 = Beam: Conforming vs. Non-Conforming A premise of this paper is that, for approximate eigenmode analysis, a non-conforming mesh may be sufficient. We illustrate this through the following experiment. he beam from the previous example was rotated about its longest axis by an arbitrary angle of degrees, and the resulting geometry was discretized via a non-conforming set of voxels (see Figure 5). hen the first four eigen-modes were computed for increasing density of the mesh, using SaRCG with 4 bubble-mesh, and a tolerance of = 0. Figure 5: Non-conforming voxelization of a beam. able 4 summarizes the results of this experiment. Clearly, non-conforming meshes are never as accurate as conforming meshes (for a fixed grid-size). However, the inaccuracy can be reduced through brute-force computation. #DOF able 4: First eigen-value for cylindrical beam cantilever using SaCG with tolerance of 0 -. GPU ime (sec) ode- ode- ode-3 ode Crank-Rod: A Practical Example A more realistic example is that of a crank-rod illustrated in Figure 6 that is clamped on the innersurface of the smaller cylindrical hole. he first eigenmode was computed using SolidWorks with default mesh parameters. he resulting conforming (secondorder tetrahedral) mesh contains approximately 90,000 degrees of freedom. he first eigen-mode was computed to be 59.6 Hz; the total computational time including meshing and solver time was approximately 9 seconds. Refinement of the mesh did not change the first eigen-mode significantly in SolidWorks. 4 =

7 Figure 6: First eigen-mode of a connecting-rod at 59.6 Hz In able 5, the computed eigen-value (via SaRCG) as the mesh is refined is summarized. As one can observe, the eigen-value converges to the correct value, and the computational time, is comparable to that of SolidWorks, especially when a GPU such as the GX 480 is used. DOF able 5: First eigen-value for crank-rod. ode- Voxelization time (CPU) Solution time (GPU) Solution time (CPU) 9, Hz 0.07 secs.6 secs.7 secs 5, Hz 0. secs. secs 3.7 secs 80, Hz 0.4 secs 4 secs secs 36, Hz 0.6 secs 7 secs 58 secs 50, Hz. secs 6 secs 5 secs () 34 Hz () 484 Hz (3) 4 Hz (4) 5856 Hz (5) 7595 Hz Figure 8: he first 5 modes computed via SolidWorks. he corresponding five eigen-modes computed via the proposed method are illustrated in Figure 9. he eigen-values are within ~% accuracy, and the computational time was approximately 35 seconds, on the GPU, and.5 minutes on the CPU. For the next 0 modes, convergence to within 3~5% was also observed. 4.4 nuckle: Accuracy of First Few Eigen- odes In the next example, we compare the accuracy of SaRCG method for the first five eigen-modes for the knuckle problem illustrated in Figure 6a, where the two horizontal holes are clamped. Also illustrated in Figure 6b and Figure 6c are the confirming and nonconforming meshes respectively, with approximately 50,000 and 50,000 degrees of freedom, respectively. () 393 Hz () 445 Hz (3) 46 Hz Figure 7: (a) A knuckle component, (b) conforming mesh, and (c) voxel-mesh he first five eigen-modes as computed using SolidWorks, are illustrated in Figure 8; the total computational time was approximately 5 seconds. (4) 574 Hz (5) 7534 Hz Figure 9: he first 5 modes computed via proposed method. 4.5 Gear Housing: Robustness he primary advantages of the proposed method are its robustness, simplicity and ability to handle geometrically complex structures. o illustrate, consider the gear housing illustrated earlier in Figure. he first eigen-mode of the structure computed via the proposed method is illustrated in Figure 0.

8 Figure 0: he first mode of the structure in Figure. he convergence of the first eigen-value with voxelsize is summarized in able 6; it demonstrates that one can rapidly estimate the eigen-values in a fully automated fashion. his is particularly important during the early stages of design. able 6: First eigen-value for gear-housing. DOF ode- Solution time (GPU) Solution time (CPU) 50, secs 6 secs 300, secs. mins 45, secs 3.7 mins,000, secs 3 mins 5. CONCLUSION he main contribution of this paper is an assemblyfree implementation of Subspace augmented Rayleigh-Ritz Conjugate Gradient (SaRCG), for computing the eigen-modes of elastic structures. he proposed method is simple, robust and inverse-free. Since the method only requires an implementation of a sparse matrix vector multiplication (SpV), it is highly parallelizable, and can be easily implemented on modern multi-core architectures. Although the method was demonstrated using a non-conforming structured (voxel) mesh, it is equally applicable to classic conforming meshes. 6. REFERENCES [] V. Hernandez, J. E. Roman, A. omas, and V. Vidal, A survey of software for sparse eigenvalue problems., SLEPc echnical Report SR-6. Universidad Politecnica de Valencia, Valencia, Spain [] P. Arbenz, U. L. Hetmaniuk, R. B. Lehoucq, and R. S. uminaro, A Comparison of Eigensolvers for Large-scale 3D odal Analysis using AG-Preconditioned Iterative ethods, International Journal for Numerical ethods in Engineering, vol. 64, no., pp , 005. [3] Y. Saad, Numerical methods for large eigenvalue problems, nd ed. anchester University Press, 0. [4] R. G. Grimes, J. G. Lewis, and H. D. Simon, A Shifted Block Lanczos Algorithm for Solving Sparse Symmetric Generalized Eigenproblems, SIA Journal on atrix Analysis and Applications, vol. 5, no., p. 8, 994. [5] D. C. Sorensen, Numerical methods for large eigenvalue problems, ACA NUERICA, vol., pp , 00. [6] G. H. Golub and Q. Ye, An Inverse Free Preconditioned rylov Subspace ethod for Symmetric Generalized Eigenvalue Problems, SIA Journal on Scientific Computing, vol. 4, no., pp , 00. [7] L. Bergamaschi, Á. artínez, and G. Pini, Parallel preconditioned conjugate gradient optimization of the Rayleigh quotient for the solution of sparse eigenproblems, Applied mathematics and computation, vol. 75, no., p. 964, 006. [8] A. V. nyazev, oward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient ethod, SIA Journal on Scientific Computing, vol. 3, no., pp , 00. [9] G. L. G. Sleijpen and H. A. Van der Vorst, A Jacobi-Davidson Iteration ethod for Linear Eigenvalue Problems, SIA Journal on atrix Analysis and Applications, vol. 7, no., pp , 996. [0] I. C. F. Ipsen, Computing an Eigenvector with Inverse Iteration, SIA REVIEW, vol. 39, no., pp. 54 9, 997. [] H.-J. Jang, Preconditioned Conjugate Gradient ethod for Large Generalized Eigenproblems, rends in athematics Information Center for athematical Sciences, vol. 4, no., pp , 00. [] Y.. Feng and D. R. J. Owen, Conjugate Gradient ethods for Solving the Smallest Eigenpair of Large Symmetric Eigenvalue Problems, International Journal for Numerical ethods in Engineering, vol. 39, no. 3, pp , 996. [3] J., Wright, S. Nocedal, Numerical optimization. New York: Springer Science + Business edia, 006. [4] H. Yang, Conjugate Gradient ethods for the Rayleigh Quotient inimization of Generalized Eigenvalue Problems, Computing -Wein-, vol. 5, no., pp , 993. [5] A. Duster, J. Parvizian, Z. Yang, and E. Rank, he Finite Cell ethod for 3D problems of solid mechanics., Computer ethods in Applied echanics and Engineering, vol. 97, pp , 008. [6] E. A. arabassi, G. Papaioannou, and. heoharis, A Fast Depth-Buffer-Based

9 Voxelization Algorithm, Journal of Graphics ools, vol. 4, no. 4, 999. [7] O. C. Zienkiewicz, he Finite Element ethod for Solid and Structural echanics. Elsevier, 005. [8] H. H. aiebat and J. P. Carter, hree- Dimensional Non-Conforming Elements, Centre for Geotechnical Research, he University of Sydney, Sydney, R808, 00. [9] C. E. Augarde, A. Ramage, and J. Staudacher, An element-based displacement preconditioner for linear elasticity problems, Computers and Structures, vol. 84, no. 3 3, pp , 006. [0] NVIDIA Corporation, NVIDIA CUDA: Compute Unified Device Architecture, Programming Guide. Santa Clara.:, 008. [] SolidWorks, SolidWorks;

A Deflated Assembly-Free Approach to Large-Scale Implicit Structural Dynamics

A Deflated Assembly-Free Approach to Large-Scale Implicit Structural Dynamics A Deflated Assembly-Free Approach to Large-Scale Implicit Structural Dynamics Amir M. Mirzendehdel rishnan Suresh suresh@engr.wisc.edu Department of Mechanical Engineering UW-Madison, Madison, Wisconsin

More information

Krishnan Suresh Associate Professor Mechanical Engineering

Krishnan Suresh Associate Professor Mechanical Engineering Large Scale FEA on the GPU Krishnan Suresh Associate Professor Mechanical Engineering High-Performance Trick Computations (i.e., 3.4*1.22): essentially free Memory access determines speed of code Pick

More information

Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker)

Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM Conference on Parallel

More information

Performance of Implicit Solver Strategies on GPUs

Performance of Implicit Solver Strategies on GPUs 9. LS-DYNA Forum, Bamberg 2010 IT / Performance Performance of Implicit Solver Strategies on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Abstract: The increasing power of GPUs can be used

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964

More information

ASSEMBLY-FREE BUCKLING ANALYSIS FOR TOPOLOGY OPTIMIZATION

ASSEMBLY-FREE BUCKLING ANALYSIS FOR TOPOLOGY OPTIMIZATION Proceedings of the ASME 015 International Design Engineering echnical Conferences & Computers and Information in Engineering Conference IDEC/CIE 015 August August 5, 015, Boston, MA, USA DEC015-46351 ASSEMBLY-FREE

More information

Challenge Problem 5 - The Solution Dynamic Characteristics of a Truss Structure

Challenge Problem 5 - The Solution Dynamic Characteristics of a Truss Structure Challenge Problem 5 - The Solution Dynamic Characteristics of a Truss Structure In the final year of his engineering degree course a student was introduced to finite element analysis and conducted an assessment

More information

Large Scale Finite Element Analysis via Assembly-Free Deflated Conjugate Gradient

Large Scale Finite Element Analysis via Assembly-Free Deflated Conjugate Gradient Abstract Large Scale Finite Element Analysis via Assembly-Free Deflated Conjugate Gradient Praveen Yadav, Krishnan Suresh suresh@engr.wisc.edu Department of Mechanical Engineering, UW-Madison, Madison,

More information

FEA Model Updating Using SDM

FEA Model Updating Using SDM FEA l Updating Using SDM Brian Schwarz & Mark Richardson Vibrant Technology, Inc. Scotts Valley, California David L. Formenti Sage Technologies Santa Cruz, California ABSTRACT In recent years, a variety

More information

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,

More information

SLEPc: Scalable Library for Eigenvalue Problem Computations

SLEPc: Scalable Library for Eigenvalue Problem Computations SLEPc: Scalable Library for Eigenvalue Problem Computations Jose E. Roman Joint work with A. Tomas and E. Romero Universidad Politécnica de Valencia, Spain 10th ACTS Workshop - August, 2009 Outline 1 Introduction

More information

17. SEISMIC ANALYSIS MODELING TO SATISFY BUILDING CODES

17. SEISMIC ANALYSIS MODELING TO SATISFY BUILDING CODES 17. SEISMIC ANALYSIS MODELING TO SATISFY BUILDING CODES The Current Building Codes Use the Terminology: Principal Direction without a Unique Definition 17.1 INTRODUCTION { XE "Building Codes" }Currently

More information

Contents. I The Basic Framework for Stationary Problems 1

Contents. I The Basic Framework for Stationary Problems 1 page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other

More information

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new

More information

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs 3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional

More information

Application of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures

Application of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures Paper 6 Civil-Comp Press, 2012 Proceedings of the Eighth International Conference on Engineering Computational Technology, B.H.V. Topping, (Editor), Civil-Comp Press, Stirlingshire, Scotland Application

More information

Lecture 11: Randomized Least-squares Approximation in Practice. 11 Randomized Least-squares Approximation in Practice

Lecture 11: Randomized Least-squares Approximation in Practice. 11 Randomized Least-squares Approximation in Practice Stat60/CS94: Randomized Algorithms for Matrices and Data Lecture 11-10/09/013 Lecture 11: Randomized Least-squares Approximation in Practice Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these

More information

Application of Finite Volume Method for Structural Analysis

Application of Finite Volume Method for Structural Analysis Application of Finite Volume Method for Structural Analysis Saeed-Reza Sabbagh-Yazdi and Milad Bayatlou Associate Professor, Civil Engineering Department of KNToosi University of Technology, PostGraduate

More information

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany

More information

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,

More information

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou Study and implementation of computational methods for Differential Equations in heterogeneous systems Asimina Vouronikoy - Eleni Zisiou Outline Introduction Review of related work Cyclic Reduction Algorithm

More information

Computational Fluid Dynamics - Incompressible Flows

Computational Fluid Dynamics - Incompressible Flows Computational Fluid Dynamics - Incompressible Flows March 25, 2008 Incompressible Flows Basis Functions Discrete Equations CFD - Incompressible Flows CFD is a Huge field Numerical Techniques for solving

More information

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND Student Submission for the 5 th OpenFOAM User Conference 2017, Wiesbaden - Germany: SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND TESSA UROIĆ Faculty of Mechanical Engineering and Naval Architecture, Ivana

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction ME 475: Computer-Aided Design of Structures 1-1 CHAPTER 1 Introduction 1.1 Analysis versus Design 1.2 Basic Steps in Analysis 1.3 What is the Finite Element Method? 1.4 Geometrical Representation, Discretization

More information

Abstract. Introduction. Kevin Todisco

Abstract. Introduction. Kevin Todisco - Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image

More information

EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI

EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI 1 Akshay N. Panajwar, 2 Prof.M.A.Shah Department of Computer Science and Engineering, Walchand College of Engineering,

More information

midas FEA V320 Table of Contents

midas FEA V320 Table of Contents midas FEA V320 New Feature 01 CFD Moving Mesh Pre Process 01 File Open > File Preview 02 Extract Surface 03 Frame to Solid > Import 04 Tetra Auto mesh generation 05 Mesh options 06 Auto mesh Solid Function

More information

Lecture 15: More Iterative Ideas

Lecture 15: More Iterative Ideas Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!

More information

Effect of Mesh Size of Finite Element Analysis in Modal Analysis for Periodic Symmetric Struts Support

Effect of Mesh Size of Finite Element Analysis in Modal Analysis for Periodic Symmetric Struts Support Key Engineering Materials Online: 2011-01-20 ISSN: 1662-9795, Vols. 462-463, pp 1008-1012 doi:10.4028/www.scientific.net/kem.462-463.1008 2011 Trans Tech Publications, Switzerland Effect of Mesh Size of

More information

1.2 Numerical Solutions of Flow Problems

1.2 Numerical Solutions of Flow Problems 1.2 Numerical Solutions of Flow Problems DIFFERENTIAL EQUATIONS OF MOTION FOR A SIMPLIFIED FLOW PROBLEM Continuity equation for incompressible flow: 0 Momentum (Navier-Stokes) equations for a Newtonian

More information

Analysis of the Adjoint Euler Equations as used for Gradient-based Aerodynamic Shape Optimization

Analysis of the Adjoint Euler Equations as used for Gradient-based Aerodynamic Shape Optimization Analysis of the Adjoint Euler Equations as used for Gradient-based Aerodynamic Shape Optimization Final Presentation Dylan Jude Graduate Research Assistant University of Maryland AMSC 663/664 May 4, 2017

More information

Multi Layer Perceptron trained by Quasi Newton learning rule

Multi Layer Perceptron trained by Quasi Newton learning rule Multi Layer Perceptron trained by Quasi Newton learning rule Feed-forward neural networks provide a general framework for representing nonlinear functional mappings between a set of input variables and

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

calibrated coordinates Linear transformation pixel coordinates

calibrated coordinates Linear transformation pixel coordinates 1 calibrated coordinates Linear transformation pixel coordinates 2 Calibration with a rig Uncalibrated epipolar geometry Ambiguities in image formation Stratified reconstruction Autocalibration with partial

More information

arxiv: v1 [math.na] 26 Jun 2014

arxiv: v1 [math.na] 26 Jun 2014 for spectrally accurate wave propagation Vladimir Druskin, Alexander V. Mamonov and Mikhail Zaslavsky, Schlumberger arxiv:406.6923v [math.na] 26 Jun 204 SUMMARY We develop a method for numerical time-domain

More information

Finite element algorithm with adaptive quadtree-octree mesh refinement

Finite element algorithm with adaptive quadtree-octree mesh refinement ANZIAM J. 46 (E) ppc15 C28, 2005 C15 Finite element algorithm with adaptive quadtree-octree mesh refinement G. P. Nikishkov (Received 18 October 2004; revised 24 January 2005) Abstract Certain difficulties

More information

Example 24 Spring-back

Example 24 Spring-back Example 24 Spring-back Summary The spring-back simulation of sheet metal bent into a hat-shape is studied. The problem is one of the famous tests from the Numisheet 93. As spring-back is generally a quasi-static

More information

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards

Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards Very fast simulation of nonlinear water waves in very large numerical wave tanks on affordable graphics cards By Allan P. Engsig-Karup, Morten Gorm Madsen and Stefan L. Glimberg DTU Informatics Workshop

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

Adaptive numerical methods

Adaptive numerical methods METRO MEtallurgical TRaining On-line Adaptive numerical methods Arkadiusz Nagórka CzUT Education and Culture Introduction Common steps of finite element computations consists of preprocessing - definition

More information

A simple algorithm for the inverse field of values problem

A simple algorithm for the inverse field of values problem A simple algorithm for the inverse field of values problem Russell Carden Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005-183, USA E-mail: Russell.L.Carden@rice.edu

More information

Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of Earth s Mantle

Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of Earth s Mantle ICES Student Forum The University of Texas at Austin, USA November 4, 204 Parallel High-Order Geometric Multigrid Methods on Adaptive Meshes for Highly Heterogeneous Nonlinear Stokes Flow Simulations of

More information

High-Frequency Algorithmic Advances in EM Tools for Signal Integrity Part 1. electromagnetic. (EM) simulation. tool of the practic-

High-Frequency Algorithmic Advances in EM Tools for Signal Integrity Part 1. electromagnetic. (EM) simulation. tool of the practic- From January 2011 High Frequency Electronics Copyright 2011 Summit Technical Media, LLC High-Frequency Algorithmic Advances in EM Tools for Signal Integrity Part 1 By John Dunn AWR Corporation Only 30

More information

Graph drawing in spectral layout

Graph drawing in spectral layout Graph drawing in spectral layout Maureen Gallagher Colleen Tygh John Urschel Ludmil Zikatanov Beginning: July 8, 203; Today is: October 2, 203 Introduction Our research focuses on the use of spectral graph

More information

The Immersed Interface Method

The Immersed Interface Method The Immersed Interface Method Numerical Solutions of PDEs Involving Interfaces and Irregular Domains Zhiiin Li Kazufumi Ito North Carolina State University Raleigh, North Carolina Society for Industrial

More information

Fast Radial Basis Functions for Engineering Applications. Prof. Marco Evangelos Biancolini University of Rome Tor Vergata

Fast Radial Basis Functions for Engineering Applications. Prof. Marco Evangelos Biancolini University of Rome Tor Vergata Fast Radial Basis Functions for Engineering Applications Prof. Marco Evangelos Biancolini University of Rome Tor Vergata Outline 2 RBF background Fast RBF on HPC Engineering Applications Mesh morphing

More information

Approaches to Parallel Implementation of the BDDC Method

Approaches to Parallel Implementation of the BDDC Method Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague

More information

Accelerating Finite Element Analysis in MATLAB with Parallel Computing

Accelerating Finite Element Analysis in MATLAB with Parallel Computing MATLAB Digest Accelerating Finite Element Analysis in MATLAB with Parallel Computing By Vaishali Hosagrahara, Krishna Tamminana, and Gaurav Sharma The Finite Element Method is a powerful numerical technique

More information

SOLIDWORKS Simulation Avoiding Singularities

SOLIDWORKS Simulation Avoiding Singularities SOLIDWORKS Simulation Avoiding Singularities What is a Singularity? A singularity is a function s divergence into infinity. SOLIDWORKS Simulation occasionally produces stress (or heat flux) singularities.

More information

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction

Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering. Introduction Revision of the SolidWorks Variable Pressure Simulation Tutorial J.E. Akin, Rice University, Mechanical Engineering Introduction A SolidWorks simulation tutorial is just intended to illustrate where to

More information

Efficient Use of Iterative Solvers in Nested Topology Optimization

Efficient Use of Iterative Solvers in Nested Topology Optimization Efficient Use of Iterative Solvers in Nested Topology Optimization Oded Amir, Mathias Stolpe and Ole Sigmund Technical University of Denmark Department of Mathematics Department of Mechanical Engineering

More information

Revised Sheet Metal Simulation, J.E. Akin, Rice University

Revised Sheet Metal Simulation, J.E. Akin, Rice University Revised Sheet Metal Simulation, J.E. Akin, Rice University A SolidWorks simulation tutorial is just intended to illustrate where to find various icons that you would need in a real engineering analysis.

More information

Generating 3D Topologies with Multiple Constraints on the GPU

Generating 3D Topologies with Multiple Constraints on the GPU 1 th World Congress on Structural and Multidisciplinary Optimization May 19-4, 13, Orlando, Florida, USA Generating 3D Topologies with Multiple Constraints on the GPU Krishnan Suresh University of Wisconsin,

More information

Stereo Vision. MAN-522 Computer Vision

Stereo Vision. MAN-522 Computer Vision Stereo Vision MAN-522 Computer Vision What is the goal of stereo vision? The recovery of the 3D structure of a scene using two or more images of the 3D scene, each acquired from a different viewpoint in

More information

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.

Journal of Universal Computer Science, vol. 14, no. 14 (2008), submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J. Journal of Universal Computer Science, vol. 14, no. 14 (2008), 2416-2427 submitted: 30/9/07, accepted: 30/4/08, appeared: 28/7/08 J.UCS Tabu Search on GPU Adam Janiak (Institute of Computer Engineering

More information

INFLUENCE OF MODELLING AND NUMERICAL PARAMETERS ON THE PERFORMANCE OF A FLEXIBLE MBS FORMULATION

INFLUENCE OF MODELLING AND NUMERICAL PARAMETERS ON THE PERFORMANCE OF A FLEXIBLE MBS FORMULATION INFLUENCE OF MODELLING AND NUMERICAL PARAMETERS ON THE PERFORMANCE OF A FLEXIBLE MBS FORMULATION J. CUADRADO, R. GUTIERREZ Escuela Politecnica Superior, Universidad de La Coruña, Ferrol, Spain SYNOPSIS

More information

Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition

Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Nicholas J. Higham Pythagoras Papadimitriou Abstract A new method is described for computing the singular value decomposition

More information

PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS

PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS Proceedings of FEDSM 2000: ASME Fluids Engineering Division Summer Meeting June 11-15,2000, Boston, MA FEDSM2000-11223 PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS Prof. Blair.J.Perot Manjunatha.N.

More information

Industrial finite element analysis: Evolution and current challenges. Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009

Industrial finite element analysis: Evolution and current challenges. Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009 Industrial finite element analysis: Evolution and current challenges Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009 Dr. Chief Numerical Analyst Office of Architecture and

More information

QP-Collide: A New Approach to Collision Treatment

QP-Collide: A New Approach to Collision Treatment QP-Collide: A New Approach to Collision Treatment Laks Raghupathi François Faure Co-encadre par Marie-Paule CANI EVASION/GRAVIR INRIA Rhône-Alpes, Grenoble Teaser Video Classical Physical Simulation Advance

More information

Guidelines for proper use of Plate elements

Guidelines for proper use of Plate elements Guidelines for proper use of Plate elements In structural analysis using finite element method, the analysis model is created by dividing the entire structure into finite elements. This procedure is known

More information

Solving Large Complex Problems. Efficient and Smart Solutions for Large Models

Solving Large Complex Problems. Efficient and Smart Solutions for Large Models Solving Large Complex Problems Efficient and Smart Solutions for Large Models 1 ANSYS Structural Mechanics Solutions offers several techniques 2 Current trends in simulation show an increased need for

More information

Module 3 Mesh Generation

Module 3 Mesh Generation Module 3 Mesh Generation 1 Lecture 3.1 Introduction 2 Mesh Generation Strategy Mesh generation is an important pre-processing step in CFD of turbomachinery, quite analogous to the development of solid

More information

Multigrid Solvers in CFD. David Emerson. Scientific Computing Department STFC Daresbury Laboratory Daresbury, Warrington, WA4 4AD, UK

Multigrid Solvers in CFD. David Emerson. Scientific Computing Department STFC Daresbury Laboratory Daresbury, Warrington, WA4 4AD, UK Multigrid Solvers in CFD David Emerson Scientific Computing Department STFC Daresbury Laboratory Daresbury, Warrington, WA4 4AD, UK david.emerson@stfc.ac.uk 1 Outline Multigrid: general comments Incompressible

More information

PROGRAMMING OF MULTIGRID METHODS

PROGRAMMING OF MULTIGRID METHODS PROGRAMMING OF MULTIGRID METHODS LONG CHEN In this note, we explain the implementation detail of multigrid methods. We will use the approach by space decomposition and subspace correction method; see Chapter:

More information

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Speedup Altair RADIOSS Solvers Using NVIDIA GPU Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair

More information

Adaptive Surface Modeling Using a Quadtree of Quadratic Finite Elements

Adaptive Surface Modeling Using a Quadtree of Quadratic Finite Elements Adaptive Surface Modeling Using a Quadtree of Quadratic Finite Elements G. P. Nikishkov University of Aizu, Aizu-Wakamatsu 965-8580, Japan niki@u-aizu.ac.jp http://www.u-aizu.ac.jp/ niki Abstract. This

More information

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S

More information

Accelerating Implicit LS-DYNA with GPU

Accelerating Implicit LS-DYNA with GPU Accelerating Implicit LS-DYNA with GPU Yih-Yih Lin Hewlett-Packard Company Abstract A major hindrance to the widespread use of Implicit LS-DYNA is its high compute cost. This paper will show modern GPU,

More information

Andrew V. Knyazev and Merico E. Argentati (speaker)

Andrew V. Knyazev and Merico E. Argentati (speaker) 1 Andrew V. Knyazev and Merico E. Argentati (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver 2 Acknowledgement Supported by Lawrence Livermore

More information

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs

More information

Meshless Modeling, Animating, and Simulating Point-Based Geometry

Meshless Modeling, Animating, and Simulating Point-Based Geometry Meshless Modeling, Animating, and Simulating Point-Based Geometry Xiaohu Guo SUNY @ Stony Brook Email: xguo@cs.sunysb.edu http://www.cs.sunysb.edu/~xguo Graphics Primitives - Points The emergence of points

More information

Second-order shape optimization of a steel bridge

Second-order shape optimization of a steel bridge Computer Aided Optimum Design of Structures 67 Second-order shape optimization of a steel bridge A.F.M. Azevedo, A. Adao da Fonseca Faculty of Engineering, University of Porto, Porto, Portugal Email: alvaro@fe.up.pt,

More information

Effects of Solvers on Finite Element Analysis in COMSOL MULTIPHYSICS

Effects of Solvers on Finite Element Analysis in COMSOL MULTIPHYSICS Effects of Solvers on Finite Element Analysis in COMSOL MULTIPHYSICS Chethan Ravi B.R Dr. Venkateswaran P Corporate Technology - Research and Technology Center Siemens Technology and Services Private Limited

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

Analysis of Distortion Parameters of Eight node Serendipity Element on the Elements Performance

Analysis of Distortion Parameters of Eight node Serendipity Element on the Elements Performance Analysis of Distortion Parameters of Eight node Serendipity Element on the Elements Performance Vishal Jagota & A. P. S. Sethi Department of Mechanical Engineering, Shoolini University, Solan (HP), India

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

T6: Position-Based Simulation Methods in Computer Graphics. Jan Bender Miles Macklin Matthias Müller

T6: Position-Based Simulation Methods in Computer Graphics. Jan Bender Miles Macklin Matthias Müller T6: Position-Based Simulation Methods in Computer Graphics Jan Bender Miles Macklin Matthias Müller Jan Bender Organizer Professor at the Visual Computing Institute at Aachen University Research topics

More information

Element Order: Element order refers to the interpolation of an element s nodal results to the interior of the element. This determines how results can

Element Order: Element order refers to the interpolation of an element s nodal results to the interior of the element. This determines how results can TIPS www.ansys.belcan.com 鲁班人 (http://www.lubanren.com/weblog/) Picking an Element Type For Structural Analysis: by Paul Dufour Picking an element type from the large library of elements in ANSYS can be

More information

Cloth Simulation. COMP 768 Presentation Zhen Wei

Cloth Simulation. COMP 768 Presentation Zhen Wei Cloth Simulation COMP 768 Presentation Zhen Wei Outline Motivation and Application Cloth Simulation Methods Physically-based Cloth Simulation Overview Development References 2 Motivation Movies Games VR

More information

PATCH TEST OF HEXAHEDRAL ELEMENT

PATCH TEST OF HEXAHEDRAL ELEMENT Annual Report of ADVENTURE Project ADV-99- (999) PATCH TEST OF HEXAHEDRAL ELEMENT Yoshikazu ISHIHARA * and Hirohisa NOGUCHI * * Mitsubishi Research Institute, Inc. e-mail: y-ishi@mri.co.jp * Department

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC

Algorithms, System and Data Centre Optimisation for Energy Efficient HPC 2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy

More information

Multigrid Pattern. I. Problem. II. Driving Forces. III. Solution

Multigrid Pattern. I. Problem. II. Driving Forces. III. Solution Multigrid Pattern I. Problem Problem domain is decomposed into a set of geometric grids, where each element participates in a local computation followed by data exchanges with adjacent neighbors. The grids

More information

course outline basic principles of numerical analysis, intro FEM

course outline basic principles of numerical analysis, intro FEM idealization, equilibrium, solutions, interpretation of results types of numerical engineering problems continuous vs discrete systems direct stiffness approach differential & variational formulation introduction

More information

A Random Variable Shape Parameter Strategy for Radial Basis Function Approximation Methods

A Random Variable Shape Parameter Strategy for Radial Basis Function Approximation Methods A Random Variable Shape Parameter Strategy for Radial Basis Function Approximation Methods Scott A. Sarra, Derek Sturgill Marshall University, Department of Mathematics, One John Marshall Drive, Huntington

More information

Sizing Optimization for Industrial Applications

Sizing Optimization for Industrial Applications 11 th World Congress on Structural and Multidisciplinary Optimisation 07 th -12 th, June 2015, Sydney Australia Sizing Optimization for Industrial Applications Miguel A.A.S: Matos 1, Peter M. Clausen 2,

More information

Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications

Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications Unstructured Mesh Generation for Implicit Moving Geometries and Level Set Applications Per-Olof Persson (persson@mit.edu) Department of Mathematics Massachusetts Institute of Technology http://www.mit.edu/

More information

GRAPH CENTERS USED FOR STABILIZATION OF MATRIX FACTORIZATIONS

GRAPH CENTERS USED FOR STABILIZATION OF MATRIX FACTORIZATIONS Discussiones Mathematicae Graph Theory 30 (2010 ) 349 359 GRAPH CENTERS USED FOR STABILIZATION OF MATRIX FACTORIZATIONS Pavla Kabelíková Department of Applied Mathematics FEI, VSB Technical University

More information

A Parallel Implementation of the BDDC Method for Linear Elasticity

A Parallel Implementation of the BDDC Method for Linear Elasticity A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague

More information

Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition

Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Nicholas J. Higham Pythagoras Papadimitriou Abstract A new method is described for computing the singular value decomposition

More information

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea. Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences

More information

GEOMETRY MODELING & GRID GENERATION

GEOMETRY MODELING & GRID GENERATION GEOMETRY MODELING & GRID GENERATION Dr.D.Prakash Senior Assistant Professor School of Mechanical Engineering SASTRA University, Thanjavur OBJECTIVE The objectives of this discussion are to relate experiences

More information

Static and Dynamic Analysis Of Reed Valves Using a Minicomputer Based Finite Element Systems

Static and Dynamic Analysis Of Reed Valves Using a Minicomputer Based Finite Element Systems Purdue University Purdue e-pubs International Compressor Engineering Conference School of Mechanical Engineering 1980 Static and Dynamic Analysis Of Reed Valves Using a Minicomputer Based Finite Element

More information

cuibm A GPU Accelerated Immersed Boundary Method

cuibm A GPU Accelerated Immersed Boundary Method cuibm A GPU Accelerated Immersed Boundary Method S. K. Layton, A. Krishnan and L. A. Barba Corresponding author: labarba@bu.edu Department of Mechanical Engineering, Boston University, Boston, MA, 225,

More information

THE MORTAR FINITE ELEMENT METHOD IN 2D: IMPLEMENTATION IN MATLAB

THE MORTAR FINITE ELEMENT METHOD IN 2D: IMPLEMENTATION IN MATLAB THE MORTAR FINITE ELEMENT METHOD IN D: IMPLEMENTATION IN MATLAB J. Daněk, H. Kutáková Department of Mathematics, University of West Bohemia, Pilsen MECAS ESI s.r.o., Pilsen Abstract The paper is focused

More information

Memory Efficient Adaptive Mesh Generation and Implementation of Multigrid Algorithms Using Sierpinski Curves

Memory Efficient Adaptive Mesh Generation and Implementation of Multigrid Algorithms Using Sierpinski Curves Memory Efficient Adaptive Mesh Generation and Implementation of Multigrid Algorithms Using Sierpinski Curves Michael Bader TU München Stefanie Schraufstetter TU München Jörn Behrens AWI Bremerhaven Abstract

More information

Final drive lubrication modeling

Final drive lubrication modeling Final drive lubrication modeling E. Avdeev a,b 1, V. Ovchinnikov b a Samara University, b Laduga Automotive Engineering Abstract. In this paper we describe the method, which is the composition of finite

More information

The Level Set Method applied to Structural Topology Optimization

The Level Set Method applied to Structural Topology Optimization The Level Set Method applied to Structural Topology Optimization Dr Peter Dunning 22-Jan-2013 Structural Optimization Sizing Optimization Shape Optimization Increasing: No. design variables Opportunity

More information