Assembly-Free Large-Scale Modal Analysis on the GPU Praveen Yadav, Krishnan Suresh



Praveen Yadav, Krishnan Suresh (suresh@engr.wisc.edu)
Department of Mechanical Engineering, UW-Madison, Madison, Wisconsin 53706, USA

Abstract

Popular eigen-solvers such as block-Lanczos require repeated inversion of an eigen-matrix. This is a bottleneck in large-scale modal problems with millions of degrees of freedom. On the other hand, the classic Rayleigh-Ritz conjugate gradient method only requires a matrix-vector multiplication, and is therefore potentially scalable to such problems. However, as is well known, the Rayleigh-Ritz method has serious numerical deficiencies, and has largely been abandoned by the finite element community. In this paper, we address these deficiencies through subspace augmentation, and consider a subspace augmented Rayleigh-Ritz conjugate gradient method (SaRCG). SaRCG is numerically stable and does not entail explicit inversion. As a specific application, we consider the modal analysis of geometrically complex structures discretized via non-conforming voxels. The resulting large-scale eigen-problems are then solved via SaRCG. The voxelization structure is also exploited to render the underlying matrix-vector multiplication assembly-free. The implementation of SaRCG on multi-core CPUs and graphics-programmable units (GPUs) is discussed, followed by numerical experiments and case studies.

1. INTRODUCTION

Consider the modal analysis of a thin, geometrically complex structure illustrated in Figure 1. The first step in such an analysis is the construction of a suitable finite-element mesh.

Figure 1: A thin gear-housing whose eigen-spectrum is desired.

Thin, geometrically complex structures require a small element size, resulting in a large-scale finite-element problem. For example, for the above structure, after numerous failed attempts, we were able to create the tetrahedral mesh illustrated in Figure 2, with over 750,000 degrees of freedom.

Figure 2: Complex thin structures require a fine mesh.
This leads to the eigen-problem:

$$ K x = \lambda M x \tag{1.1} $$

where $K$ and $M$ are the stiffness and mass matrices associated with the finite-element discretization. The above problem has been well studied, and numerous implementations exist [1–3]. When $K$ and $M$ are large, practical limitations arise. Specifically, popular eigen-solvers such as block-Lanczos require explicit and repeated inversion of a large eigen-matrix (see Section 2), and the memory requirements and computational time grow quadratically in such methods. On the other hand, the classic Rayleigh-Ritz conjugate gradient method for solving Equation (1.1) only requires an implementation of sparse matrix-vector multiplication. The memory requirements are low, and the computational complexity depends largely on the efficiency of the matrix-vector multiplication. However, as numerical experiments indicate, the Rayleigh-Ritz method lacks robustness and can exhibit poor convergence; consequently, it has largely been abandoned by the finite element community.

In this paper, we revisit this method, and address its limitations through subspace augmentation. The resulting subspace augmented Rayleigh-Ritz conjugate gradient (SaRCG) method is highly robust, exhibits fast convergence, and is easily parallelizable. An assembly-free implementation of SaRCG on multi-core and GPU architectures is discussed.

2. LITERATURE REVIEW

In this Section, we briefly review popular strategies for solving the generalized eigen-value problem:

$$ K x = \lambda M x \tag{2.1} $$

For the remainder of the paper, we assume that $K$ and $M$ are sparse, symmetric, real, and positive definite.

One of the most popular methods today to solve Equation (2.1) is the block form of the shift-and-invert Lanczos algorithm, also referred to as the block-Lanczos method [], [4], [5]. The block-Lanczos method requires repeated inversion of an eigen-matrix $K - \sigma M$, where the parameter $\sigma$ is determined during the iterations of the method. Since $K$ and $M$ are large, and the parameter $\sigma$ varies during the iteration, explicit LU factorization of $K - \sigma M$ is not desirable. An alternate strategy is to apply the inverse of this matrix through preconditioned iterative solvers. However, as emphasized in [], [6], early termination of iterative solvers can lead to a significant loss of accuracy in block-Lanczos, requiring careful attention to termination.

In [6], the classic block-Lanczos algorithm was instead modified to eliminate the need for an explicit decomposition. Specifically, the inversion was carried out in an approximate sense over a Krylov sub-space. The method discussed in this paper borrows this concept, but in a different context.

In [2], the authors rely on Algebraic Multigrid as a pre-conditioner for factorizing the shifted matrix. In addition, the authors also implement and compare a variety of alternate algorithms including locally optimal block preconditioned conjugate gradient, Jacobi-Davidson, etc. More importantly, the authors demonstrate that, for large-scale eigen-value problems, such alternate algorithms can be competitive.
This is a key motivation for the present paper. One such alternate algorithm is the Rayleigh-Ritz conjugate gradient (RCG) method, where the eigen-mode problem is posed and solved as a minimization problem (see next Section). Since the RCG method can be slow to converge, the authors in [7] implement a parallel version of RCG with a factorized sparse approximate inverse as a pre-conditioner. Further, in [8], a locally-optimal variation of RCG was proposed. The essential idea is to expand the search space during each step of CG in a locally-optimal sense. We do not implement this variation here, but the proposed method can be modified to include local optimality.

Besides block-Lanczos and RCG, there are dozens of methods, including Jacobi-Davidson [9] and inverse iteration [10], that do not offer significant advantages over RCG or block-Lanczos, and are therefore not pursued here.

3. Rayleigh-Ritz Conjugate Gradient

As stated earlier, the Rayleigh-Ritz conjugate gradient (RCG) algorithm [11], [12] only requires an efficient implementation of sparse matrix-vector multiplication. It therefore exhibits numerous advantages including simplicity, low memory requirements, and significant scope for parallelism.

3.1 Rayleigh Quotient Concept

A key concept behind RCG is the Rayleigh quotient of an arbitrary vector $x$:

$$ \phi(x) = \frac{x^T K x}{x^T M x} \tag{3.1} $$

If the vector $x$ happens to be an eigen-vector of $(K, M)$, then the Rayleigh quotient is the corresponding eigen-value. Thus, by minimizing the Rayleigh quotient, one can compute the lowest eigen-pair, i.e., the eigen-value problem can be posed as a minimization problem:

$$ \min_x \; \frac{x^T K x}{x^T M x} \tag{3.2} $$

3.2 Computing the Lowest Mode

Thus, one can deploy, for example, the nonlinear conjugate gradient method [13] to find the lowest eigen-mode. Towards this end, observe that the gradient of Equation (3.1) is given by:

$$ g(x) = \frac{2\,\bigl(K x - \phi(x)\, M x\bigr)}{x^T M x} \tag{3.3} $$

where $\phi(x)$ is the Rayleigh quotient of Equation (3.1). Exploiting the classic conjugate gradient algorithm [13], we have the Rayleigh-Ritz conjugate gradient (RCG; Version 1):

Rayleigh-Ritz Conjugate Gradient (Version 1):
1. Initialize $x^{(1)} \neq 0$ such that $\|x^{(1)}\| = 1$
2. Set $p^{(0)} = 0$, $\beta = 1$ and $k = 1$
3. Compute $\phi(x^{(k)})$ via Equation (3.1), and the gradient $g^{(k)}$ via Equation (3.3)
4. The conjugate search direction is given by: $p^{(k)} = -g^{(k)} + \beta\, p^{(k-1)}$
5. Find the step length $\delta^{(k)}$ as described in [14]
6. Let $y^{(k+1)} = x^{(k)} + \delta^{(k)} p^{(k)}$ and $x^{(k+1)} = y^{(k+1)} / \|y^{(k+1)}\|$
7. If $\|g^{(k)}\| < \varepsilon$, terminate; else, increment $k$, and go to step 3

Observe that the RCG algorithm only requires the matrix-vector multiplications $Kv$ and $Mv$, making it a simple algorithm to implement.

3.3 Computing Multiple Modes

To compute higher modes, observe that if $(\lambda_1, x_1)$ and $(\lambda_2, x_2)$ are two distinct modes, and if $\lambda_1 \neq \lambda_2$, then they must satisfy M-orthogonality:

$$ x_1^T M x_2 = 0 \tag{3.5} $$

Further, if $\lambda_1 = \lambda_2$, one can always find a pair of eigen-modes that satisfy the above equation. Thus, to find the second eigen-mode, we pose a constrained minimization problem:

$$ \min_x \; \frac{x^T K x}{x^T M x} \quad \text{s.t.} \quad x^T M x_1 = 0 \tag{3.6} $$

Given an arbitrary vector $x$, one can enforce M-orthogonality via:

$$ x = x - (x_1^T M x)\, x_1 \tag{3.7} $$

where $x_1$ is normalized such that $x_1^T M x_1 = 1$. To ensure M-orthogonality during the iteration process, it is necessary and sufficient that: (1) the initial random vector is M-orthogonal to $x_1$, and (2) the search directions $p^{(k)}$ are M-orthogonal to $x_1$. Thus, suppose the first $(m-1)$ modes have been computed:

$$ X = \{x_1, x_2, \ldots, x_{m-1}\} \tag{3.8} $$

To compute the next mode, one must solve the constrained minimization problem:

$$ \min_x \; \frac{x^T K x}{x^T M x} \quad \text{s.t.} \quad x^T M X = 0 \tag{3.9} $$

where M-orthogonality is enforced via:

$$ x = x - \sum_{i=1}^{m-1} (x_i^T M x)\, x_i \tag{3.10} $$

This leads to the following RCG algorithm (Version 2) for computing the $m$-th pair $(\lambda_m, x_m)$ [].

Rayleigh-Ritz Conjugate Gradient (Version 2):
1. Suppose $m-1$ eigen-modes $X = \{x_1, x_2, \ldots, x_{m-1}\}$ have been computed.
2. Initialize $x^{(1)} \neq 0$ such that $x^{(1)T} M X = 0$ and $\|x^{(1)}\| = 1$
3. Set $p^{(0)} = 0$, $\beta = 1$ and $k = 1$
4. Compute $\phi(x^{(k)})$ via Equation (3.1), and the gradient $g^{(k)}$ via Equation (3.3)
5. Let $p^{(k)} = -g^{(k)} + \beta\, p^{(k-1)}$ (preliminary direction)
6. Construct an M-orthogonal direction $p^{(k)}$ via Equation (3.10)
7. Find the step length $\delta^{(k)}$ as described in [14]
8. Let $y^{(k+1)} = x^{(k)} + \delta^{(k)} p^{(k)}$ and $x^{(k+1)} = y^{(k+1)} / \|y^{(k+1)}\|$
9. If $\|g^{(k)}\| < \varepsilon$, terminate; else, increment $k$, and go to step 4

3.4 Subspace Augmentation

There are three fundamental limitations of the above RCG algorithm; these are confirmed later through numerical experiments. To the best of our knowledge, these limitations have not been identified or emphasized in the literature:

1. Missing modes: As we sweep through the eigen-spectrum, one or more eigen-modes may go undetected, especially in the case of nearby eigen-values. This can be attributed to the fact that RCG attempts to find local minima, and the presence of multiple close-by minima can be detrimental to the algorithm.
2. Erroneous results: A large value for $\varepsilon$ (in step 9 of RCG, Version 2) can lead to erroneous results; typically, a very tight tolerance is essential.
3. Slow convergence: On the other hand, such a tight tolerance leads to thousands of iterations (for each eigen-mode), especially for ill-conditioned matrices.

Fortunately, all three problems can be addressed by relying on subspace projection methods. Such projection methods are fairly common in modern eigen-solvers [3]. In particular, the method proposed here is similar to the generic method of subspace projection discussed in []. Specifically, let us assume that, at the end of $k$ iterations, we have a set of approximate eigen-vectors $x_i^{(k)}$ ($i = 1, 2, \ldots, m$) computed via early termination of RCG (say, $\varepsilon$ on the order of 0.01). A sub-space is then defined via:

$$ S^{(k)} = \{x_1^{(k)}, x_2^{(k)}, \ldots, x_m^{(k)}\} \tag{3.11} $$

Next, reduced stiffness and mass matrices are constructed over this sub-space as follows:

$$ \hat{K}^{(k)} = S^T K S, \qquad \hat{M}^{(k)} = S^T M S \tag{3.12} $$

Observe that computing these matrices only requires sparse matrix-vector multiplications. The two matrices are then used to solve a smaller eigen-value problem (exactly) via:

$$ \hat{K}^{(k)} V = \hat{M}^{(k)} V \Lambda \tag{3.13} $$

Finally, a sharpened set of eigen-vectors is recovered via:

$$ X^{(k+1)} = S^{(k)} V \tag{3.14} $$

where:

$$ X^{(k+1)} = \{x_1^{(k+1)}, x_2^{(k+1)}, \ldots, x_m^{(k+1)}\} \tag{3.15} $$

One can now restart the RCG method with the eigen-vectors in Equation (3.15) as the starting vectors. This leads to the following subspace augmented Rayleigh-Ritz conjugate gradient algorithm (SaRCG), summarized below.

Subspace Augmented Rayleigh-Ritz Conjugate Gradient (SaRCG):
1. Initialize $X^{(1)} = \{x_1^{(1)}, x_2^{(1)}, \ldots, x_m^{(1)}\}$ (typically $m$ random vectors). Set $k = 1$
2. Compute $x_i^{(k)}$, with $x_i^{(k)}$ as the starting vectors, until $\|g\| < \varepsilon$ ($\varepsilon$ on the order of 0.01), for $i = 1, 2, \ldots, m$, via RCG (Version 2).
3. Construct the reduced matrices via Equation (3.12), where $S$ is defined in Equation (3.11), and solve Equation (3.13) to find $\Lambda$ and $V$.
4. If convergence in $\Lambda$ has not been achieved, construct the updated eigen-vectors $X^{(k+1)}$ via Equation (3.14), increment $k$ and go to step 2.

This simple extension of the RCG algorithm dramatically improves its robustness and convergence. The additional computational cost of constructing and solving the reduced eigen-value problem is negligible. In other words, the primary cost in SaRCG is the sparse matrix-vector multiplication (SpMV). Therefore, an efficient implementation of SpMV is critical.
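The SaRCG loop summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions of the sketch: dense arrays stand in for the sparse (or assembly-free) operators; the step length of [14] is replaced by an exact line search, obtained from the 2x2 Rayleigh-Ritz problem on span{x, p}; β is computed with a Polak-Ribière formula clipped at zero; the gradient is projected so that it respects the M-orthogonality constraint; and the function names (`ritz`, `rcg`, `sarcg`) and default tolerances are hypothetical.

```python
import numpy as np


def ritz(K, M, S):
    """Rayleigh-Ritz step: solve (S^T K S) V = (S^T M S) V Lambda, return (Lambda, S V)."""
    Kr, Mr = S.T @ (K @ S), S.T @ (M @ S)
    Linv = np.linalg.inv(np.linalg.cholesky(Mr))   # Mr = L L^T (small m x m matrix)
    A = Linv @ Kr @ Linv.T
    lam, Z = np.linalg.eigh(0.5 * (A + A.T))       # eigen-values in ascending order
    return lam, S @ (Linv.T @ Z)                   # Ritz vectors are M-orthonormal


def rcg(K, M, x, X, tol, max_iter=300):
    """Minimise the Rayleigh quotient subject to x^T M X = 0 (X is M-orthonormal)."""
    x = x - X @ (X.T @ (M @ x))                    # enforce the constraint, Eq. (3.10)
    x = x / np.sqrt(x @ (M @ x))                   # M-normalise
    p, g_old = np.zeros_like(x), None
    for _ in range(max_iter):
        Kx, Mx = K @ x, M @ x
        phi = (x @ Kx) / (x @ Mx)                  # Rayleigh quotient, Eq. (3.1)
        g = 2.0 * (Kx - phi * Mx) / (x @ Mx)       # gradient, Eq. (3.3)
        g = g - M @ (X @ (X.T @ g))                # assumption: projected gradient
        if np.linalg.norm(g) < tol:
            break
        # Assumption: Polak-Ribiere beta, clipped at zero (automatic restart).
        beta = 0.0 if g_old is None else max(0.0, g @ (g - g_old) / (g_old @ g_old))
        p = -g + beta * p                          # preliminary direction
        p = p - X @ (X.T @ (M @ p))                # M-orthogonal to the computed modes
        p = p - x * (x @ (M @ p))                  # and to the current iterate
        p_norm = np.sqrt(p @ (M @ p))
        if p_norm < 1e-14:
            break
        q = p / p_norm
        Kq = K @ q
        # Assumption: exact line search via the 2x2 Rayleigh-Ritz problem on span{x, q}.
        A = np.array([[x @ Kx, x @ Kq], [x @ Kq, q @ Kq]])
        _, C = np.linalg.eigh(A)
        x = C[0, 0] * x + C[1, 0] * q              # eigen-vector of the smallest Ritz value
        g_old = g
    return x


def sarcg(K, M, m, inner_tol=1e-6, outer_tol=1e-10, max_outer=20, seed=0):
    """Subspace augmented RCG: RCG sweeps followed by a Rayleigh-Ritz projection."""
    n = K.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, m))   # step 1: random start
    lam_old = np.full(m, np.inf)
    for _ in range(max_outer):
        S = np.zeros((n, m))
        for i in range(m):                         # step 2: RCG (Version 2) per mode
            S[:, i] = rcg(K, M, X[:, i], S[:, :i], inner_tol)
        lam, X = ritz(K, M, S)                     # step 3: reduced eigen-problem
        if np.max(np.abs(lam - lam_old)) <= outer_tol * np.max(np.abs(lam)):
            break                                  # step 4: Lambda has converged
        lam_old = lam
    return lam, X
```

For a small symmetric positive-definite pair, `lam, X = sarcg(K, M, 3)` would return approximations to the three lowest eigen-values together with M-orthonormal eigen-vectors. In a large-scale setting, each `K @ v` and `M @ v` product would be replaced by a sparse matrix-vector routine, which is where essentially all of the cost lies.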
3.5 Voxelization

The proposed SaRCG method is applicable to any finite-element discretization, including the tetrahedral discretization of Figure 2. However, in this paper, we consider a simple discretization, where the geometry is approximated via uniform cubes or "voxels"; the voxel approach has gained significant popularity recently [15]. Since the voxelization need not conform to the geometry, it is robust, fast and relatively insensitive to geometric complexity. Further, unlike in stress analysis, where a conforming mesh is essential, in eigen-mode analysis a non-conforming voxelization is often sufficient, especially during the early stages of design. This is confirmed through numerical experiments later in the paper. The voxelization of the geometry in Figure 1 is illustrated in Figure 3; it has over a million degrees of freedom. Fortunately, even such a large problem is easily handled via the proposed method.

Figure 3: Brute-force voxelization of the structure.

The voxelization of a triangulated CAD model, also referred to as 3-D scan conversion, is straightforward; see, for example, [16]. Voxels that are partially inside the geometry are assigned a density value depending on the number of corner nodes that lie inside the geometry. As the numerical experiments indicate, the overall computational cost of the voxelization is small compared to the cost of computing the eigen-modes.

3.6 Shape Functions

Given a voxelization, one can choose from a variety of hexahedral finite element shape functions. The simplest are tri-linear shape functions as described in [17], where each node-based shape function is of the form:

$$ N_i = 0.125\,(1 + \xi\xi_i)(1 + \eta\eta_i)(1 + \zeta\zeta_i); \quad i = 1, \ldots, 8 \tag{3.16} $$

However, the resulting 8-noded elements are stiff, and convergence is slow, especially for higher-order modes, as illustrated through numerical experiments. One could use 20-node or 27-node elements, but these increase the memory requirements significantly.
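As a small illustration of Equation (3.16), the tri-linear shape functions can be evaluated as in the NumPy sketch below. The corner-node ordering in `NODES` and the function name `shape_functions` are assumptions made for this example; the paper does not specify them.

```python
import numpy as np

# Natural coordinates (xi_i, eta_i, zeta_i) of the 8 corner nodes of the
# reference cube [-1, 1]^3. The node ordering is an assumption of this sketch.
NODES = np.array([
    [-1, -1, -1], [1, -1, -1], [1, 1, -1], [-1, 1, -1],
    [-1, -1, 1], [1, -1, 1], [1, 1, 1], [-1, 1, 1],
], dtype=float)


def shape_functions(xi, eta, zeta):
    """Return the 8 tri-linear shape function values N_i at the point (xi, eta, zeta)."""
    return (0.125
            * (1.0 + xi * NODES[:, 0])
            * (1.0 + eta * NODES[:, 1])
            * (1.0 + zeta * NODES[:, 2]))
```

Each function equals one at its own node and zero at the other seven, and the eight values sum to one at any point of the element, which is a quick sanity check for an implementation.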


In the context of static problems, the authors of [18] instead propose the use of three additional bubble functions of the form:

$$ B_1(\xi, \eta, \zeta) = (1 - \xi^2), \quad B_2(\xi, \eta, \zeta) = (1 - \eta^2), \quad B_3(\xi, \eta, \zeta) = (1 - \zeta^2) \tag{3.17} $$

The resulting element stiffness matrix will be of the form:

$$ \begin{bmatrix} K_{NN} & K_{NB} \\ K_{BN} & K_{BB} \end{bmatrix} \tag{3.18} $$

where the subscripts $N$ and $B$ refer to the nodal and bubble functions, respectively:

$$ K_{NN} = \int N^T D N \, d\Omega, \quad K_{NB} = \int N^T D B \, d\Omega, \quad K_{BN} = \int B^T D N \, d\Omega, \quad K_{BB} = \int B^T D B \, d\Omega \tag{3.19} $$

Fortunately, one can condense out the bubble degrees of freedom, resulting again in a reduced 24 degrees of freedom element stiffness matrix [18]:

$$ K_e = K_{NN} - K_{NB}\,(K_{BB} \backslash K_{BN}) \tag{3.20} $$

While the authors in [18] demonstrate the merit of this formulation for static problems, we have adopted these bubble functions for eigen-problems. Numerical experiments indicate that bubble functions are highly effective in eigen-problems as well.

3.7 Assembly-Free SpMV

The primary computational cost of sparse matrix-vector multiplication (SpMV), in today's computational architectures, is memory access, i.e., floating-point operations are essentially free []. In SpMV, there are two types of memory access: (1) accessing elements of $x_i^{(k)}$, and (2) accessing elements of $K$ and $M$. There is very little one can do about the former, except perhaps careful numbering of the mesh nodes. However, the cost of accessing elements of $K$ and $M$ can be dramatically reduced through assembly-free methods as discussed below.

Since the geometry is discretized via uniform voxels, all elements are geometrically identical. Therefore all element matrices are identical (barring a density assigned to partial voxels). Thus, one need not assemble or store the global $K$ and $M$ matrices; it is sufficient if a single matrix pair $K_e$ and $M_e$ is stored. Consequently, a matrix-free implementation of $Kv$, for example, may be implemented as follows [19]:

$$ K x = \Bigl(\sum_e K_e\Bigr) x = \sum_e (K_e x_e) \tag{3.21} $$

In other words, instead of assembling the matrix $K$ and then carrying out $Kx$, we first multiply $K_e$ and $x_e$, and then assemble the result.
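The gather-multiply-scatter idea of Equation (3.21) can be sketched in a few lines of NumPy. This is an illustrative CPU sketch, not the authors' OpenMP or CUDA code. The function name `assembly_free_matvec`, the connectivity array `elem_dofs`, and the treatment of partial voxels as a simple per-element scale factor `density` are assumptions of this example.

```python
import numpy as np


def assembly_free_matvec(Ke, elem_dofs, density, x):
    """Compute y = K x without ever forming the global matrix K.

    Ke        : (d, d) element matrix shared by all (identical) elements
    elem_dofs : (n_elem, d) integer array of global DOF indices per element
    density   : (n_elem,) scale factor per element (1.0 for full voxels)
    x         : (n_dof,) global vector
    """
    xe = x[elem_dofs]                        # gather element vectors x_e
    ye = density[:, None] * (xe @ Ke.T)      # element products K_e x_e, scaled
    y = np.zeros_like(x)
    np.add.at(y, elem_dofs, ye)              # scatter-add (handles shared DOFs)
    return y
```

Only one small matrix `Ke` is read, regardless of the number of elements; the remaining memory traffic is the gather from `x` and the scatter into `y`. `np.add.at` is used rather than plain fancy-index assignment because several elements contribute to the same global degree of freedom.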
3.8 CPU and GPU Implementations

In the CPU implementation of the assembly-free SpMV, parallelization was attained through OpenMP commands. In the GPU implementation using CUDA [20] on NVidia cards, SpMV was implemented by assigning each node of the voxel grid to a scalar processor. Associated with each voxel node, there are at most 8 voxel elements and 27 neighboring nodes. When a block of threads is launched, all threads share identical element matrices, and therefore a single memory fetch of the stiffness and mass element matrices per block is sufficient. The three degrees of freedom associated with each of the 27 nodes are fetched in parallel. In general, during this fetch, it is hard to enforce coalesced memory access for arbitrary voxel meshes. For more recent GPU cards, we have observed that this is not a serious issue. Finally, many of the low-level operations such as vector products were implemented using the CUBLAS library.

4. NUMERICAL EXPERIMENTS

In this Section, we present results from numerical experiments based on the RCG and SaRCG algorithms. All experiments were conducted on a Windows 7 64-bit machine with the following hardware: an Intel i7 960 quad-core CPU running at 3.2 GHz with 6 GB of memory; parallelization of the CPU code was implemented through OpenMP commands. The graphics-programmable unit (GPU) is an NVidia GeForce GTX 480 (480 cores) with 1.5 GB of memory. Both the CPU and GPU were configured to run in double precision.

4.1 Importance of Bubble Functions

In this experiment, we illustrate the value added by the use of bubble functions discussed earlier. Specifically, a short cantilevered beam of dimension x x 5, fixed on one end, is used in this experiment. In Figure 4, the first 4 eigen-modes are illustrated; these were computed using a quadratic tetrahedral mesh with 45,000 DOF, in SolidWorks [21].

Figure 4: First four modes computed via SolidWorks: (1) 3 Hz, (2) 3 Hz, (3) 478 Hz, (4) 74 Hz.

Next, using SaRCG, the first eigen-mode was computed for different voxel sizes, with and without bubble functions. As one can observe, the use of bubble functions dramatically improves the convergence at no additional cost. Similar improvements were observed for higher-order modes, and for other case studies. Therefore, for the remainder of the experiments, bubble functions are employed.

Table 1: First eigen-value of short cantilevered beam, with and without bubble functions.
#DOF | Without Bubble (Hz) | %error | With Bubble (Hz) | %error

Deficiency of RCG

Next, we illustrate the deficiency of RCG. Using the same cantilevered beam example, we computed the first four eigen-modes via RCG for different tolerance values $\varepsilon$ (see Version 2 of the RCG algorithm) for a fixed voxel size (55,000 DOF). As one can observe in Table 2, even with tight tolerances, RCG can miss some of the critical modes or compute them out of order; the classic RCG method is not robust.

Table 2: First 4 eigen-values of beam via RCG.
Mode-1 | Mode-2 | Mode-3 | Mode-4

On the other hand, SaRCG converges to the correct set of eigen-modes, even with a coarse tolerance of $\varepsilon = 10^{-4}$, as summarized in Table 3.

Table 3: First 4 eigen-values of beam via SaRCG.
Mode-1 | Mode-2 | Mode-3 | Mode-4

Beam: Conforming vs. Non-Conforming

A premise of this paper is that, for approximate eigen-mode analysis, a non-conforming mesh may be sufficient. We illustrate this through the following experiment. The beam from the previous example was rotated about its longest axis by an arbitrary angle, and the resulting geometry was discretized via a non-conforming set of voxels (see Figure 5). Then the first four eigen-modes were computed for increasing density of the mesh, using SaRCG with bubble functions, and a tolerance of $\varepsilon = 10^{-4}$.

Figure 5: Non-conforming voxelization of a beam.

Table 4 summarizes the results of this experiment.
Clearly, non-conforming meshes are never as accurate as conforming meshes (for a fixed grid size). However, the inaccuracy can be reduced through brute-force computation.

Table 4: First eigen-values for the rotated cantilever beam using SaRCG.
#DOF | GPU Time (sec) | Mode-1 | Mode-2 | Mode-3 | Mode-4

Crank-Rod: A Practical Example

A more realistic example is that of a crank-rod, illustrated in Figure 6, that is clamped on the inner surface of the smaller cylindrical hole. The first eigen-mode was computed using SolidWorks with default mesh parameters. The resulting conforming (second-order tetrahedral) mesh contains approximately 90,000 degrees of freedom. The first eigen-mode was computed to be 59.6 Hz; the total computational time, including meshing and solver time, was approximately 9 seconds. Refinement of the mesh did not change the first eigen-mode significantly in SolidWorks.

Figure 6: First eigen-mode of a connecting-rod at 59.6 Hz

In Table 5, the eigen-value computed via SaRCG is summarized as the mesh is refined. As one can observe, the eigen-value converges to the correct value, and the computational time is comparable to that of SolidWorks, especially when a GPU such as the GTX 480 is used.

Table 5: First eigen-value for crank-rod.
DOF | Mode-1 | Voxelization time (CPU) | Solution time (GPU) | Solution time (CPU)
9, | Hz | 0.07 secs | .6 secs | .7 secs
5,000 | 560 Hz | 0. secs | . secs | 3.7 secs
80,000 | 54 Hz | 0.4 secs | 4 secs | secs
36,000 | 538 Hz | 0.6 secs | 7 secs | 58 secs
50, | Hz | . secs | 6 secs | 5 secs

4.4 Knuckle: Accuracy of First Few Eigen-Modes

In the next example, we compare the accuracy of the SaRCG method for the first five eigen-modes for the knuckle problem illustrated in Figure 7a, where the two horizontal holes are clamped. Also illustrated in Figure 7b and Figure 7c are the conforming and non-conforming meshes, with approximately 50,000 and 50,000 degrees of freedom, respectively.

Figure 7: (a) A knuckle component, (b) conforming mesh, and (c) voxel-mesh

The first five eigen-modes, as computed using SolidWorks, are illustrated in Figure 8; the total computational time was approximately 5 seconds.

Figure 8: The first 5 modes computed via SolidWorks: (1) 34 Hz, (2) 484 Hz, (3) 4 Hz, (4) 5856 Hz, (5) 7595 Hz.

The corresponding five eigen-modes computed via the proposed method are illustrated in Figure 9. The eigen-values are within ~% accuracy, and the computational time was approximately 35 seconds on the GPU, and .5 minutes on the CPU. For the next 0 modes, convergence to within 3~5% was also observed.

Figure 9: The first 5 modes computed via the proposed method: (1) 393 Hz, (2) 445 Hz, (3) 46 Hz, (4) 574 Hz, (5) 7534 Hz.

4.5 Gear Housing: Robustness

The primary advantages of the proposed method are its robustness, simplicity and ability to handle geometrically complex structures. To illustrate, consider the gear housing illustrated earlier in Figure 1. The first eigen-mode of the structure computed via the proposed method is illustrated in Figure 10.

Figure 10: The first mode of the structure in Figure 1.

The convergence of the first eigen-value with voxel size is summarized in Table 6; it demonstrates that one can rapidly estimate the eigen-values in a fully automated fashion. This is particularly important during the early stages of design.

Table 6: First eigen-value for gear-housing.
DOF | Mode-1 | Solution time (GPU) | Solution time (CPU)
50, | | secs | 6 secs
300, | | secs | . mins
45, | | secs | 3.7 mins
,000, | | secs | 3 mins

5. CONCLUSION

The main contribution of this paper is an assembly-free implementation of the Subspace augmented Rayleigh-Ritz Conjugate Gradient (SaRCG) method for computing the eigen-modes of elastic structures. The proposed method is simple, robust and inverse-free. Since the method only requires an implementation of a sparse matrix-vector multiplication (SpMV), it is highly parallelizable, and can be easily implemented on modern multi-core architectures. Although the method was demonstrated using a non-conforming structured (voxel) mesh, it is equally applicable to classic conforming meshes.

6. REFERENCES

[1] V. Hernandez, J. E. Roman, A. Tomas, and V. Vidal, "A survey of software for sparse eigenvalue problems," SLEPc Technical Report STR-6, Universidad Politecnica de Valencia, Valencia, Spain.
[2] P. Arbenz, U. L. Hetmaniuk, R. B. Lehoucq, and R. S. Tuminaro, "A Comparison of Eigensolvers for Large-scale 3D Modal Analysis using AMG-Preconditioned Iterative Methods," International Journal for Numerical Methods in Engineering, vol. 64, 2005.
[3] Y. Saad, Numerical Methods for Large Eigenvalue Problems, 2nd ed. Manchester University Press, 2011.
[4] R. G. Grimes, J. G. Lewis, and H. D. Simon, "A Shifted Block Lanczos Algorithm for Solving Sparse Symmetric Generalized Eigenproblems," SIAM Journal on Matrix Analysis and Applications, vol. 15, no. 1, p. 228, 1994.
[5] D. C. Sorensen, "Numerical methods for large eigenvalue problems," Acta Numerica, vol. 11, 2002.
[6] G. H. Golub and Q. Ye, "An Inverse Free Preconditioned Krylov Subspace Method for Symmetric Generalized Eigenvalue Problems," SIAM Journal on Scientific Computing, vol. 24, 2002.
[7] L. Bergamaschi, Á. Martínez, and G. Pini, "Parallel preconditioned conjugate gradient optimization of the Rayleigh quotient for the solution of sparse eigenproblems," Applied Mathematics and Computation, vol. 175, 2006.
[8] A. V. Knyazev, "Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method," SIAM Journal on Scientific Computing, vol. 23, 2001.
[9] G. L. G. Sleijpen and H. A. Van der Vorst, "A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems," SIAM Journal on Matrix Analysis and Applications, vol. 17, 1996.
[10] I. C. F. Ipsen, "Computing an Eigenvector with Inverse Iteration," SIAM Review, vol. 39, no. 2, pp. 254–291, 1997.
[11] H.-J. Jang, "Preconditioned Conjugate Gradient Method for Large Generalized Eigenproblems," Trends in Mathematics, Information Center for Mathematical Sciences, vol. 4, 2001.
[12] Y. T. Feng and D. R. J. Owen, "Conjugate Gradient Methods for Solving the Smallest Eigenpair of Large Symmetric Eigenvalue Problems," International Journal for Numerical Methods in Engineering, vol. 39, 1996.
[13] J. Nocedal and S. J. Wright, Numerical Optimization. New York: Springer Science + Business Media, 2006.
[14] H. Yang, "Conjugate Gradient Methods for the Rayleigh Quotient Minimization of Generalized Eigenvalue Problems," Computing (Wien), vol. 51, 1993.
[15] A. Düster, J. Parvizian, Z. Yang, and E. Rank, "The Finite Cell Method for 3D problems of solid mechanics," Computer Methods in Applied Mechanics and Engineering, vol. 197, 2008.
[16] E. A. Karabassi, G. Papaioannou, and T. Theoharis, "A Fast Depth-Buffer-Based Voxelization Algorithm," Journal of Graphics Tools, vol. 4, no. 4, 1999.
[17] O. C. Zienkiewicz, The Finite Element Method for Solid and Structural Mechanics. Elsevier, 2005.
[18] H. H. Taiebat and J. P. Carter, "Three-Dimensional Non-Conforming Elements," Centre for Geotechnical Research, The University of Sydney, Sydney, R808, 2001.
[19] C. E. Augarde, A. Ramage, and J. Staudacher, "An element-based displacement preconditioner for linear elasticity problems," Computers and Structures, vol. 84, no. 31–32, 2006.
[20] NVIDIA Corporation, NVIDIA CUDA: Compute Unified Device Architecture, Programming Guide. Santa Clara, 2008.
[21] SolidWorks, SolidWorks.
