JAVA PERFORMANCE IN FINITE ELEMENT COMPUTATIONS
G.P. NIKISHKOV
University of Aizu, Aizu-Wakamatsu, Fukushima, Japan

ABSTRACT
The performance of a Java finite element code is compared with that of a C finite element code on the solution of three-dimensional elasticity problems using an Intel Pentium 4 computer. Untuned Java code is approximately two times slower than the analogous C code. It is shown that code tuning with a blocking technique can raise the Java/C performance ratio to 90% for the LDU solution of finite element equations. Java performance for the PCG iterative solution algorithm, tuned by inner loop unrolling, is 75% of that of the C code. We recommend using Java Virtual Machine 1.2, since in many cases it is considerably faster in finite element computations than JVMs 1.3 and 1.4.

KEY WORDS
Finite Element Methods, Java-based Simulation, Performance, Tuning

1 Introduction

Finite element codes were traditionally developed in Fortran [1] and more recently in Fortran 90 [2]. During the last decade FEM developers started using the C++ language in order to handle complexity in finite element software [3-5]. The object-oriented approach, with data hiding, encapsulation and inheritance, allows the creation of reliable and extensible finite element codes.

The Java language [6], developed by Sun Microsystems, possesses features that make it attractive for computational modelling. Java is a simple language with a rich collection of libraries implementing various APIs (Application Programming Interfaces). With Java it is easy to create graphical user interfaces and to communicate with other computers over a network. Java has a built-in garbage collector that prevents memory leaks. Another advantage of Java is its portability. Java Virtual Machines (JVMs) [7] have been developed for all major computer systems. A JVM is embedded in the most popular Web browsers, so Java applets can be downloaded over the network and executed within a Web browser.
While object-oriented programming can also be done in C++, other useful features such as actual portability and garbage collection are unique characteristics of the Java language.

Published in: Applied Simulation and Modeling, Procs of the 12th IASTED Int. Conf., Sept. 3-5, 2003, Marbella, Spain, ACTA Press, Anaheim, 2003.

Despite its attractive features, Java is not widely used in engineering computations. Translation of Java bytecode into native instructions makes Java code run more slowly. However, a Just-In-Time (JIT) compiler can significantly speed up the execution of Java applications and applets. The JIT, which is an integral part of the JVM, takes the bytecodes and compiles them into native code before execution. Since Java is a dynamic language, the JIT compiles methods on a method-by-method basis just before they are called. If the same method is called many times, or if the method contains a loop with many repetitions, re-execution of the native code can make the performance of Java code acceptable.

Java performance in numerical computing was considered in several publications [8-10]. It was shown that high-performance numerical codes can be developed in Java with suitable code development techniques. While papers [8-10] deal with general issues of numerical computing, this paper addresses Java performance and tuning in finite element computations.

We present our experience in designing an efficient finite element code in Java. The performance of the developed Java finite element code is compared with that of the analogous C code on finite element solutions of three-dimensional elasticity problems using an Intel Pentium 4 computer. For running Java code we employed Sun JVMs 1.2, 1.3 and 1.4. It is shown that with proper coding and JVM selection the Java finite element code can be almost as fast as the C code.
2 Java Finite Element Code

The object-oriented approach is widely used to create reusable, extensible and reliable components that can be used in later research and practical applications. However, a fully object-oriented approach might not always be ideal for computationally intensive sections of code. Object creation and destruction in Java are expensive operations, and the use of a large number of small objects can lead to considerable time and space overhead. As experiments show, a possible way to increase computing performance is to reduce the cost of object creation by using primitive types in place of objects. For a variable of a primitive type the JVM allocates the variable directly on the stack (local variable) or within the memory used for the object (member variable). For such variables there is no object creation overhead and no garbage collection overhead.

Java does not support true multi-dimensional arrays. Because of this it is more appropriate to employ one-dimensional arrays even where two-subscript notation is used in the mathematical formulation of the problem.

It should be noted that computationally critical code sections are small in comparison to the whole code. The whole finite element code can be designed with the object-oriented approach, while a compromise between using objects and providing high efficiency should be found for the computationally intensive sections.

Keeping in mind the above efficiency considerations, we developed the Java finite element code JFEM for the solution of two-dimensional and three-dimensional elasticity problems. The class hierarchy of the JFEM code is presented in Fig. 1. The class design allows extensibility of the code.

class JFem - main class controlling FEM solution
interface CNST - collection of constants used during solution
class Element - abstract finite element
    class Element2D8N - 2D quadrilateral 8-noded element
    class Element3D20N - 3D hexahedral 20-noded element
class FiniteElementModel - description of the finite element model
class LoadVectorAssembler - boundary conditions for the finite element model
class Material - abstract material model
    class ElasticMaterial - material model for elasticity problems
class DataFileReader - reading data file
class Solver - abstract finite element solver
    class ProfileLDUSolver - solution of the finite element equation system by the direct LDU method with profile storage of the matrix
    class SparseRowPCGSolver - solution of the finite element equation system by the preconditioned conjugate gradient method
class Node - abstract node of the finite element model
    class Node2D - node of the 2D finite element model
    class Node3D - node of the 3D finite element model

Figure 1. Class hierarchy of the JFEM code.
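The remark above about preferring one-dimensional arrays of primitives can be illustrated with a short sketch. It is not taken from JFEM: the class name FlatMatrix and its methods are hypothetical, and row-major ordering is an assumption made only for this example.

```java
/** Sketch: a dense matrix kept in a single double[] rather than a double[][]. */
final class FlatMatrix {
    final int rows;
    final int cols;
    final double[] data; // row-major: element (i, j) is stored at data[i * cols + j]

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int i, int j) {
        return data[i * cols + j];
    }

    void set(int i, int j, double value) {
        data[i * cols + j] = value;
    }

    /** Computes y = A x using only primitives and flat arrays. */
    double[] multiply(double[] x) {
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            int base = i * cols; // start of row i in the flat array
            for (int j = 0; j < cols; j++) {
                sum += data[base + j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        FlatMatrix m = new FlatMatrix(2, 3);
        double v = 1.0;
        for (int i = 0; i < m.rows; i++) {
            for (int j = 0; j < m.cols; j++) {
                m.set(i, j, v++);
            }
        }
        double[] y = m.multiply(new double[] {1.0, 1.0, 1.0});
        System.out.println(y[0] + " " + y[1]);
    }
}
```

A double[][] in Java is an array of separately allocated row objects, so the flat layout keeps the whole matrix in one contiguous block and needs only one allocation.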
Abstract classes are used for the definition of classes for nodes, finite elements, material models and equation solvers. The abstract class defines the overall structure of the hierarchy and contains the data members and member methods. Some methods can be implemented in the abstract class; other methods are implemented in a class lower in the hierarchy. For example, the abstract class Element contains methods for data manipulation (connectivity data and nodal data) that are common to all element types. Methods for computing shape functions, derivatives of shape functions, the element stiffness matrix, the element load vector, etc. are implemented in classes Element2D8N and Element3D20N for the two-dimensional 8-node element and for the three-dimensional 20-node element.

It is worth noting that we try to restrict the use of objects in computationally intensive parts of the finite element procedure. Class Node is used during input of the nodal data for the finite element model. During calculation of the element stiffness matrices and during the assembly and solution of the equation system, only primitive types and one-dimensional arrays are used in operations with nodal data.

3 Assembly and Solution of Equation System

For linear problems the main fraction of computing time is spent on calculation of element stiffness matrices, assembly of the equation system and its solution. Here we present algorithms of element stiffness matrix computation and consider two algorithms of equation solution: the direct method of decomposition into lower, diagonal and upper matrices (LDU) and the iterative preconditioned conjugate gradient (PCG) method.

3.1 Stiffness Matrix Assembly

A global stiffness matrix of the structure is assembled from element stiffness matrices. Coefficients of the element stiffness matrix [k] are expressed as follows:

\[
k^{mn}_{ii} = \int_V \left[ (\lambda + 2\mu)\,\frac{\partial N_m}{\partial x_i}\frac{\partial N_n}{\partial x_i} + \mu \left( \frac{\partial N_m}{\partial x_{i+1}}\frac{\partial N_n}{\partial x_{i+1}} + \frac{\partial N_m}{\partial x_{i+2}}\frac{\partial N_n}{\partial x_{i+2}} \right) \right] dV,
\]
\[
k^{mn}_{ij} = \int_V \left( \lambda\,\frac{\partial N_m}{\partial x_i}\frac{\partial N_n}{\partial x_j} + \mu\,\frac{\partial N_m}{\partial x_j}\frac{\partial N_n}{\partial x_i} \right) dV.
\]

Here m, n are local node numbers; i, j are indices related to the coordinate axes (x_1, x_2, x_3). The cyclic rule is employed in the above equations if coordinate indices become greater than 3. Material parameters λ and µ are the Lame elastic constants. In our computer code, integration of the stiffness matrix [k] for the 20-node element is performed using a special 14-point integration rule. Since the element stiffness matrix is symmetric, only the symmetric part of the matrix and the diagonal coefficients are computed and then used for assembly of the global stiffness matrix.

Assembly of the global stiffness matrix is performed with the use of element connectivity information. The assembly algorithm depends on the storage format of the finite element equation system.

3.2 LDU Solution of Equation System

The symmetric part of the global stiffness matrix of order n is stored in profile form by columns. Each column of the matrix starts from the first (top) nonzero element and ends at the diagonal element. The matrix is represented by two arrays: a one-dimensional double array a, containing the matrix elements, and a pointer array pcol. Assuming that array indices begin from one, the ith element of pcol contains the index in the array a of the first element of the ith column, minus one. The length of the ith column is given by pcol[i+1]-pcol[i]. The length of the array a is equal to pcol[n+1]. The location (row number) of the first nonzero element in the ith column of the matrix [A] is given by the function FN(i):

    FN(i) = i - (pcol[i+1] - pcol[i]) + 1.

The following correspondence relation can be easily obtained for the transition from two-index matrix notation to one-dimensional array notation:

    a[i,j] -> a[i + pcol[j+1] - j].

Solution of a symmetric equation system consists of the [U]^T [D][U] decomposition of the system matrix followed by forward reduction and back substitution for the right-hand side. The [U]^T [D][U] decomposition takes the majority of the computing time.
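The profile storage scheme described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the JFEM implementation: the class name ProfileMatrix, its methods and the 4 x 4 example values are hypothetical, and element 0 of each array is left unused so that the one-based formulas for FN(i) and for the index of a[i,j] apply unchanged.

```java
/** Sketch of symmetric profile storage by columns, one-based as in the text. */
final class ProfileMatrix {
    final int n;      // order of the matrix
    final int[] pcol; // pcol[j] = index in a of the first element of column j, minus one
    final double[] a; // column profiles, each ending at its diagonal element

    ProfileMatrix(int n, int[] pcol, double[] a) {
        this.n = n;
        this.pcol = pcol;
        this.a = a;
    }

    /** FN(j): row number of the first stored element of column j. */
    int firstRow(int j) {
        return j - (pcol[j + 1] - pcol[j]) + 1;
    }

    /** Position of a[i,j] in the one-dimensional array, for FN(j) <= i <= j. */
    int index(int i, int j) {
        return i + pcol[j + 1] - j;
    }

    /** Value of a[i,j]; symmetry handles i > j, entries above the profile are zero. */
    double get(int i, int j) {
        if (i > j) {
            int t = i;
            i = j;
            j = t;
        }
        return i < firstRow(j) ? 0.0 : a[index(i, j)];
    }

    /** Hypothetical example with column lengths 1, 2, 2, 3 (slot 0 is padding). */
    static ProfileMatrix example() {
        int[] pcol = {0, 0, 1, 3, 5, 8};          // pcol[1..5]
        double[] a = {0, 4, 1, 5, 2, 6, 3, 8, 7}; // a[1..8], so its length is pcol[5]
        return new ProfileMatrix(4, pcol, a);
    }

    public static void main(String[] args) {
        ProfileMatrix m = example();
        for (int i = 1; i <= m.n; i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 1; j <= m.n; j++) {
                row.append(m.get(i, j)).append(' ');
            }
            System.out.println(row.toString().trim());
        }
    }
}
```

In the example, column 3 starts at row 2, so the entry a[1,3] lies above the profile and is reported as zero without being stored.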
The right-looking algorithm of the decomposition can be presented as the following pseudocode:

    do j=2,n
        Cdivt(j)
        do i=j,n
            Cmod(j,i)
        end do
    end do
    do j=2,n
        Cdiv(j)
    end do

    Cdivt(j) =
        do i=fn(j),j-1
            t[i] = a[i,j]/a[i,i]
        end do

    Cmod(j,i) =
        do k=max(fn(j),fn(i)),j-1
            a[j,i] -= t[k]*a[k,i]
        end do

    Cdiv(j) =
        do i=fn(j),j-1
            a[i,j] /= a[i,i]
        end do

The do loop that takes most of the LDU decomposition time is contained in the procedure Cmod(j,i). One column of the matrix is used to modify another column inside the inner do loop. Two operands must be loaded from memory in order to perform one floating-point multiply-add (FMA) operation. Data loads can be economized by tuning with the blocking technique. After unrolling the two outer loops, the tuned version of the LDU decomposition is as follows:

    do j=1,n,d
        Bdivt(j,d)
        do i=j+d,n,d
            BBmod(j,i,d)
        end do
    end do
    do j=2,n
        Cdiv(j)
    end do

    Bdivt(k,d) =
        do j=k,k+d-1
            do i=fn(k),j-1
                t[i,j] = a[i,j]/a[i,i]
            end do
            do i=j,k+d-1
                do l=max(fn(j),fn(i)),j-1
                    a[j,i] -= t[l,j]*a[l,i]
                end do
            end do
        end do

    BBmod(j,i,d=2) =
        do k=max(fn(j),fn(i)),j-1
            a[j,i] -= t[k,j]*a[k,i]
            a[j+1,i] -= t[k,j+1]*a[k,i]
            a[j,i+1] -= t[k,j]*a[k,i+1]
            a[j+1,i+1] -= t[k,j+1]*a[k,i+1]
        end do
        if j>=fn(j) then
            a[j+1,i] -= t[j,j+1]*a[j,i]
            a[j+1,i+1] -= t[j,j+1]*a[j,i+1]
        end if

Method BBmod(j,i,d) modifies a column block that starts at column i using a column block that starts at column j and contains d columns. The pseudocode above is given for the block size d = 2 for brevity. In the three-dimensional problems solved here, the block size d = 3 is used. It is assumed that all columns in a block start at the same row of the matrix a. This holds automatically if the column block contains the columns related to one node of the finite element model.

3.3 PCG Solution of Equation System

The preconditioned conjugate gradient (PCG) method is an iterative procedure that does not alter the equation matrix. Because of this, only the nonzero coefficients of the finite element global stiffness matrix need to be stored.
The sparse structure of the matrix should be taken into account in matrix-vector multiplications. We use the sparse row format for the equation matrix. In this format all information about the matrix is contained in three arrays:

a - array of doubles containing the non-zero elements of the matrix, row by row;
col - array of column indices for the non-zero elements of the array a;
prow - pointer array of indices of the starting elements of matrix rows in the array a, again assuming that indices start from one.

Preconditioning techniques are not the subject of this work. Simple diagonal preconditioning is used in our PCG solution procedure for finite element equations.

The most time-consuming operation in the PCG solution procedure is the sparse matrix-vector product inside the iteration loop. Matrix-vector multiplication for a matrix [A] in sparse row format is performed as follows:

    do j=1,n
        y[j] = 0
        do i=prow[j],prow[j+1]-1
            y[j] = y[j] + a[i]*x[col[i]]
        end do
    end do

Experience with tuning C codes shows that little can be done to speed up the sparse matrix-vector product. To our surprise, the following simple inner loop unrolling may improve Java code performance:

    do j=1,n
        y[j] = 0
        do i=prow[j],prow[j+1]-1,3
            y[j] = y[j] + a[i]*x[col[i]] + a[i+1]*x[col[i+1]] + a[i+2]*x[col[i+2]]
        end do
    end do

Experiments with unrolling the outer loop lead to slower calculations. The speed-up of the sparse matrix-vector product after inner loop unrolling, and the lack of it after outer loop unrolling, can be explained by the internal compilation features of the Java compilers.

4 Experimental Results

We compared our C and Java implementations of the finite element method on a series of three-dimensional elasticity problems. The test problem is simple tension of an elastic cube. Three-dimensional meshes of E x E x E brick-type 20-node elements are used for C-Java benchmarking. The value of E varies from 4 to 14, thus providing meshes from 64 elements (1275 degrees of freedom) to 2744 elements (38475 degrees of freedom). The mesh with E = 8 is shown in Fig. 2.

A desktop computer with an Intel Pentium 4 2.8 GHz processor (533 MHz front-side bus and 512 KB L2 cache) was used for running the C and the Java finite element codes. The C code was compiled using Microsoft Visual C with maximum speed optimization.
The Java code was compiled using the javac compiler developed by Sun Microsystems with optimization option -O and run using a Java virtual machine (JVM). Three JVMs were used:

- JVM 1.2 with the Symantec Just-In-Time compiler;
- JVM 1.3, Java HotSpot Client VM (build b02);
- JVM 1.4, Java HotSpot Client VM (build b06).

[Figure 2. Finite element mesh of brick-type 20-node elements.]

[Figure 3. Ratio of the C code time to the Java code time (t_C/t_Java) for assembly of the global stiffness matrix in the profile format. Plot title: Assembly of profile system, Pentium 4 2.8GHz; one curve per JVM.]

Results for assembly of the global stiffness matrix in the profile format and for the LDU solution of the equation system are presented in Figures 3 and 4. Since it is difficult to determine the megaflops rate for the assembly phase, we present the C/Java performance comparison as the ratio of the computing time used by the C code to the computing time used by the Java code. Assembly of the stiffness matrix in the profile format is faster with JVM 1.2 than with the C code. Performance of JVMs 1.3 and 1.4 is around 75% of the C code performance.

Fig. 4 shows megaflops rates for the LDU solution of the equation system stored in the profile format. The untuned version of the Java code produces approximately the same speed of calculation for all JVMs. Java performance of the untuned code is roughly 40% of C performance. Tuning of the C and Java codes changes the performance ratios dramatically (Fig. 4,b). JVM 1.2 shows computing rates that are around 90% of the C code rates. JVMs 1.3 and 1.4 produce lower speed for the tuned LDU code. Significant performance drops are observed for the tuned LDU code when using JVM 1.3. Such phenomena can be explained by data block conflicts in cache memory for certain profiles of the equation system.

[Figure 4. Java and C Megaflops rates for the LDU solution before tuning (a) and after tuning (b). Plot titles: Untuned LDU solution, Pentium 4 2.8GHz; Tuned LDU solution, Pentium 4 2.8GHz.]

[Figure 5. Ratio of the C code time to the Java code time (t_C/t_Java) for assembly of the stiffness matrix in the sparse row format. Plot title: Assembly of sparse row system, Pentium 4 2.8GHz.]

Fig. 5 presents a comparison of C and Java speeds for the assembly of the global stiffness matrix in the sparse row format. JVM 1.2 produces the best speed: Java code run with JVM 1.2 is faster than the C code. Lower speeds are shown by JVMs 1.3 and 1.4 (60% of the C speed).

Megaflops rates for the PCG solution of the equation system are depicted in Fig. 6. For the untuned PCG solution, Java is about two times slower than C. Tuning does not affect the speed of the C code. However, simple code tuning that unrolls only the inner loop of the sparse matrix-vector product improves Java performance considerably, making the Java speed equal to 75% of the C speed.

There is a recommendation [9] to run the JVM with the -server option in order to increase the speed of Java codes. Our attempts to do so showed that the finite element computations are 20% slower with the -server option than with the default -client option.
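The inner loop unrolling of the sparse matrix-vector product discussed above can be sketched in Java as follows. This is an illustration only, with hypothetical names (SparseRowProduct, multiply, multiplyUnrolled3) and zero-based indexing instead of the one-based indexing of the text. The unrolled variant assumes that every row stores a multiple of three entries; it seems plausible that this holds when each node carries three degrees of freedom, but that is an assumption of this sketch rather than a statement from the paper.

```java
/** Sketch of y = A x for a matrix in sparse row format (zero-based arrays). */
final class SparseRowProduct {

    /** Plain product: one multiply-add per inner loop iteration. */
    static void multiply(int n, int[] prow, int[] col, double[] a, double[] x, double[] y) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int i = prow[j]; i < prow[j + 1]; i++) {
                sum += a[i] * x[col[i]];
            }
            y[j] = sum;
        }
    }

    /** Inner loop unrolled by 3; valid only if each row holds a multiple of 3 entries. */
    static void multiplyUnrolled3(int n, int[] prow, int[] col, double[] a, double[] x, double[] y) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int i = prow[j]; i < prow[j + 1]; i += 3) {
                sum += a[i] * x[col[i]]
                     + a[i + 1] * x[col[i + 1]]
                     + a[i + 2] * x[col[i + 2]];
            }
            y[j] = sum;
        }
    }

    public static void main(String[] args) {
        // Hypothetical 3 x 3 matrix with three stored entries in every row.
        int[] prow = {0, 3, 6, 9};
        int[] col = {0, 1, 2, 0, 1, 2, 0, 1, 2};
        double[] a = {2, 1, 0.5, 1, 3, 1, 0.5, 1, 4};
        double[] x = {1, 2, 3};
        double[] y = new double[3];
        multiplyUnrolled3(3, prow, col, a, x, y);
        System.out.println(y[0] + " " + y[1] + " " + y[2]);
    }
}
```

The sketch accumulates each row in a local variable rather than in y[j]; whether either variant runs faster depends on the JVM and its JIT compiler, so timing should be measured on the target machine.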
The data presented in Figs 3-6 show performance results for three types of computations:

1) Calculation of element stiffness matrices and assembly of the global stiffness matrix: mostly computations with scalar variables;
2) LDU solution of the equation system: mostly a triple loop of multiply-add operations on columns with consecutive access to operands;
3) PCG solution of the equation system: mostly a double loop of multiply-add operations with non-consecutive access to operands.

The experimental results show that the performance of Java is on par with C for computations involving mostly scalar variables. For multiply-add operations with consecutive access to array elements inside the triple loop, the Java performance can be 90% of the C performance after tuning. For multiply-add operations with non-consecutive access to array elements inside double loops, the Java performance is 75% of the C performance. It should be noted that this conclusion holds only if the proper Java machine is chosen (JVM 1.2). While it is reasonable to use the latest Java SDK (Software Development Kit) for most purposes, we also recommend installing Java Runtime Environment JRE 1.2 and employing it for large finite element analyses.

[Figure 6. Java and C Megaflops rates for the PCG solution before tuning (a) and after tuning (b). Plot titles: Untuned PCG solution, Pentium 4 2.8GHz; Tuned PCG solution, Pentium 4 2.8GHz.]

5 Conclusion

We have designed an object-oriented version of a three-dimensional finite element code for elasticity problems and implemented it in the Java programming language. Special attention has been devoted to the efficient implementation of computationally intensive sections of the code. The performance of the Java code has been compared with the performance of the analogous C code on the solution of three-dimensional elasticity problems using a computer with an Intel Pentium 4 processor. Java Virtual Machines 1.2, 1.3 and 1.4 were used for running the Java code.

The experimental results show that the performance of the Java finite element code is roughly equal to the performance of the C code for calculation of element stiffness matrices and assembly of the global equation system when using JVM 1.2. JVMs 1.3 and 1.4 provide lower performance.

Untuned Java code demonstrates relatively low performance for the LDU solution of the equation system in the profile format. However, tuning with the blocking technique affects the speed of the Java code more than the speed of the C code. Performance of the tuned Java code running on JVM 1.2 is about 90% of the C code performance. The PCG iterative solution of the equation system is 30% slower with the tuned Java code than with the tuned C code.

It is possible to conclude that the Java language is quite suitable for development of finite element software. With proper coding, the performance of the Java code is comparable to the performance of the corresponding tuned C code. JVM 1.2 is recommended for large finite element analyses.

References

[1] K.-J. Bathe, Finite Element Procedures (Englewood Cliffs: Prentice-Hall, 1996).
[2] I.M. Smith and D.V. Griffiths, Programming the Finite Element Method (Chichester: Wiley, 1998).
[3] R.I. Mackie, Using objects to handle complexity in finite element software, Engineering with Computers, 13, 1997.
[4] R.I. Mackie, Object-Oriented Methods and Finite Element Analysis (Stirling: Saxe-Coburg, 2001).
[5] Y. Dubois-Pelerin and P. Pegon, Object-oriented programming in nonlinear finite element analysis, Computers and Structures, 67, 1998.
[6] J. Gosling, B. Joy and G. Steele, The Java Language Specification (Reading, MA: Addison-Wesley, 1996).
[7] T. Lindholm and F. Yellin, The Java Virtual Machine Specification (Reading, MA: Addison-Wesley, 1996).
[8] R.F. Boisvert, J. Moreira, M. Philippsen and R. Pozo, Java and numerical computing, Computing in Science and Engineering, March/April, 2001.
[9] D. Kruger, Performance tuning in Java, Java Developers Journal, August, 2002.
[10] J.E. Moreira, S.P. Midkiff, M. Gupta, P.V. Artigas, M. Snir and R.D. Lawrence, Java programming for high-performance numerical computing, IBM Systems Journal, 39, 2000.
Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java s Devrim Akgün Computer Engineering of Technology Faculty, Duzce University, Duzce,Turkey ABSTRACT Developing multi
More informationPerformance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply
Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply University of California, Berkeley Berkeley Benchmarking and Optimization Group (BeBOP) http://bebop.cs.berkeley.edu
More informationUsing Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh
Using Java for Scientific Computing Mark Bul EPCC, University of Edinburgh markb@epcc.ed.ac.uk Java and Scientific Computing? Benefits of Java for Scientific Computing Portability Network centricity Software
More informationChapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.
Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management
More informationA Preliminary Workload Analysis of SPECjvm2008
A Preliminary Workload Analysis of SPECjvm2008 Hitoshi Oi The University of Aizu, Aizu Wakamatsu, JAPAN oi@oslab.biz Abstract SPECjvm2008 is a new benchmark program suite for measuring client-side Java
More informationIntermediate Representations
Intermediate Representations Intermediate Representations (EaC Chaper 5) Source Code Front End IR Middle End IR Back End Target Code Front end - produces an intermediate representation (IR) Middle end
More informationApproaches to Parallel Implementation of the BDDC Method
Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague
More informationAutomated Finite Element Computations in the FEniCS Framework using GPUs
Automated Finite Element Computations in the FEniCS Framework using GPUs Florian Rathgeber (f.rathgeber10@imperial.ac.uk) Advanced Modelling and Computation Group (AMCG) Department of Earth Science & Engineering
More informationCache-oblivious Programming
Cache-oblivious Programming Story so far We have studied cache optimizations for array programs Main transformations: loop interchange, loop tiling Loop tiling converts matrix computations into block matrix
More informationFinite Element Implementation
Chapter 8 Finite Element Implementation 8.1 Elements Elements andconditions are the main extension points of Kratos. New formulations can be introduced into Kratos by implementing a new Element and its
More informationReport of Linear Solver Implementation on GPU
Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,
More informationComputers in Engineering COMP 208. Computer Structure. Computer Architecture. Computer Structure Michael A. Hawker
Computers in Engineering COMP 208 Computer Structure Michael A. Hawker Computer Structure We will briefly look at the structure of a modern computer That will help us understand some of the concepts that
More informationThe p-sized partitioning algorithm for fast computation of factorials of numbers
J Supercomput (2006) 38:73 82 DOI 10.1007/s11227-006-7285-5 The p-sized partitioning algorithm for fast computation of factorials of numbers Ahmet Ugur Henry Thompson C Science + Business Media, LLC 2006
More informationOptimization of FEM solver for heterogeneous multicore processor Cell. Noriyuki Kushida 1
Optimization of FEM solver for heterogeneous multicore processor Cell Noriyuki Kushida 1 1 Center for Computational Science and e-system Japan Atomic Energy Research Agency 6-9-3 Higashi-Ueno, Taito-ku,
More informationLarge-scale Structural Analysis Using General Sparse Matrix Technique
Large-scale Structural Analysis Using General Sparse Matrix Technique Yuan-Sen Yang 1), Shang-Hsien Hsieh 1), Kuang-Wu Chou 1), and I-Chau Tsai 1) 1) Department of Civil Engineering, National Taiwan University,
More informationTRIREME Commander: Managing Simulink Simulations And Large Datasets In Java
TRIREME Commander: Managing Simulink Simulations And Large Datasets In Java Andrew Newell Electronic Warfare & Radar Division, Defence Science and Technology Organisation andrew.newell@dsto.defence.gov.au
More information2 Introduction to Java. Introduction to Programming 1 1
2 Introduction to Java Introduction to Programming 1 1 Objectives At the end of the lesson, the student should be able to: Describe the features of Java technology such as the Java virtual machine, garbage
More informationUsing Analytic QP and Sparseness to Speed Training of Support Vector Machines
Using Analytic QP and Sparseness to Speed Training of Support Vector Machines John C. Platt Microsoft Research 1 Microsoft Way Redmond, WA 9805 jplatt@microsoft.com Abstract Training a Support Vector Machine
More informationUNIPROCESSOR PERFORMANCE ANALYSIS OF A REPRESENTATIVE WORKLOAD OF SANDIA NATIONAL LABORATORIES SCIENTIFIC APPLICATIONS CHARLES LAVERTY, B.S.
UNIPROCESSOR PERFORMANCE ANALYSIS OF A REPRESENTATIVE WORKLOAD OF SANDIA NATIONAL LABORATORIES SCIENTIFIC APPLICATIONS BY CHARLES LAVERTY, B.S. A thesis submitted to the Graduate School in partial fulfillment
More informationLecture 1: Overview of Java
Lecture 1: Overview of Java What is java? Developed by Sun Microsystems (James Gosling) A general-purpose object-oriented language Based on C/C++ Designed for easy Web/Internet applications Widespread
More informationEvaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li
Evaluation of sparse LU factorization and triangular solution on multicore architectures X. Sherry Li Lawrence Berkeley National Laboratory ParLab, April 29, 28 Acknowledgement: John Shalf, LBNL Rich Vuduc,
More informationSpecial Topics: Programming Languages
Lecture #23 0 V22.0490.001 Special Topics: Programming Languages B. Mishra New York University. Lecture # 23 Lecture #23 1 Slide 1 Java: History Spring 1990 April 1991: Naughton, Gosling and Sheridan (
More informationESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report
ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new
More informationObject-oriented programming in boundary element methods using C11
Advances in Engineering Software 30 (1999) 127±132 Object-oriented programming in boundary element methods using C11 Wenqing Wang*, Xing Ji, Yuangong Wang Department of Engineering Mechanics and Technology,
More informationAssoc. Prof. Dr. Marenglen Biba. (C) 2010 Pearson Education, Inc. All rights reserved.
Assoc. Prof. Dr. Marenglen Biba (C) 2010 Pearson Education, Inc. All rights reserved. Course: Object-Oriented Programming with Java Instructor : Assoc. Prof. Dr. Marenglen Biba Office : Faculty building
More informationDense Matrix Algorithms
Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication
More informationBoca Raton Community High School AP Computer Science A - Syllabus 2009/10
Boca Raton Community High School AP Computer Science A - Syllabus 2009/10 Instructor: Ronald C. Persin Course Resources Java Software Solutions for AP Computer Science, A. J. Lewis, W. Loftus, and C. Cocking,
More information211: Computer Architecture Summer 2016
211: Computer Architecture Summer 2016 Liu Liu Topic: Assembly Programming Storage - Assembly Programming: Recap - Call-chain - Factorial - Storage: - RAM - Caching - Direct - Mapping Rutgers University
More information1. Introduction. Java. Fall 2009 Instructor: Dr. Masoud Yaghini
1. Introduction Java Fall 2009 Instructor: Dr. Masoud Yaghini Outline Introduction Introduction The Java Programming Language The Java Platform References Java technology Java is A high-level programming
More informationA substructure based parallel dynamic solution of large systems on homogeneous PC clusters
CHALLENGE JOURNAL OF STRUCTURAL MECHANICS 1 (4) (2015) 156 160 A substructure based parallel dynamic solution of large systems on homogeneous PC clusters Semih Özmen, Tunç Bahçecioğlu, Özgür Kurç * Department
More informationExample 24 Spring-back
Example 24 Spring-back Summary The spring-back simulation of sheet metal bent into a hat-shape is studied. The problem is one of the famous tests from the Numisheet 93. As spring-back is generally a quasi-static
More informationANALYZE OF PROGRAMMING TECHNIQUES FOR IMPROVEMENT OF JAVA CODE PERFORMANCE
ANALYZE OF PROGRAMMING TECHNIQUES FOR IMPROVEMENT OF JAVA CODE PERFORMANCE Ognian Nakov, Dimiter Tenev Faculty of Computer Systems And Technologies, Technical University-Sofia, Kliment Ohridski 8, Postal
More informationSeminar report Java Submitted in partial fulfillment of the requirement for the award of degree Of CSE
A Seminar report On Java Submitted in partial fulfillment of the requirement for the award of degree Of CSE SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org 1 Acknowledgement I would like
More informationINTERIOR POINT METHOD BASED CONTACT ALGORITHM FOR STRUCTURAL ANALYSIS OF ELECTRONIC DEVICE MODELS
11th World Congress on Computational Mechanics (WCCM XI) 5th European Conference on Computational Mechanics (ECCM V) 6th European Conference on Computational Fluid Dynamics (ECFD VI) E. Oñate, J. Oliver
More informationPARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS
Proceedings of FEDSM 2000: ASME Fluids Engineering Division Summer Meeting June 11-15,2000, Boston, MA FEDSM2000-11223 PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS Prof. Blair.J.Perot Manjunatha.N.
More informationA Finite Element Method for Deformable Models
A Finite Element Method for Deformable Models Persephoni Karaolani, G.D. Sullivan, K.D. Baker & M.J. Baines Intelligent Systems Group, Department of Computer Science University of Reading, RG6 2AX, UK,
More informationA PARALLEL IMPLEMENTATION OF A FEM SOLVER IN SCILAB
powered by A PARALLEL IMPLEMENTATION OF A FEM SOLVER IN SCILAB Author: Massimiliano Margonari Keywords. Scilab; Open source software; Parallel computing; Mesh partitioning, Heat transfer equation. Abstract:
More informationC++ Spring Break Packet 11 The Java Programming Language
C++ Spring Break Packet 11 The Java Programming Language! Programmers write instructions in various programming languages, some directly understandable by computers and others requiring intermediate translation
More informationHardware-Supported Pointer Detection for common Garbage Collections
2013 First International Symposium on Computing and Networking Hardware-Supported Pointer Detection for common Garbage Collections Kei IDEUE, Yuki SATOMI, Tomoaki TSUMURA and Hiroshi MATSUO Nagoya Institute
More informationCSC D70: Compiler Optimization Memory Optimizations
CSC D70: Compiler Optimization Memory Optimizations Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry, Greg Steffan, and
More informationOptimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,
More information2.2 Weighting function
Annual Report (23) Kawahara Lab. On Shape Function of Element-Free Galerkin Method for Flow Analysis Daigo NAKAI and Mutsuto KAWAHARA Department of Civil Engineering, Chuo University, Kasuga 3 27, Bunkyo
More informationCSCE 5160 Parallel Processing. CSCE 5160 Parallel Processing
HW #9 10., 10.3, 10.7 Due April 17 { } Review Completing Graph Algorithms Maximal Independent Set Johnson s shortest path algorithm using adjacency lists Q= V; for all v in Q l[v] = infinity; l[s] = 0;
More informationBehavioral Array Mapping into Multiport Memories Targeting Low Power 3
Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,
More informationHigh Performance Iterative Solver for Linear System using Multi GPU )
High Performance Iterative Solver for Linear System using Multi GPU ) Soichiro IKUNO 1), Norihisa FUJITA 1), Yuki KAWAGUCHI 1),TakuITOH 2), Susumu NAKATA 3), Kota WATANABE 4) and Hiroyuki NAKAMURA 5) 1)
More informationEngineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary
white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs
More informationIntroduction to Parallel. Programming
University of Nizhni Novgorod Faculty of Computational Mathematics & Cybernetics Introduction to Parallel Section 9. Programming Parallel Methods for Solving Linear Systems Gergel V.P., Professor, D.Sc.,
More informationAccelerating Double Precision FEM Simulations with GPUs
Accelerating Double Precision FEM Simulations with GPUs Dominik Göddeke 1 3 Robert Strzodka 2 Stefan Turek 1 dominik.goeddeke@math.uni-dortmund.de 1 Mathematics III: Applied Mathematics and Numerics, University
More informationJava and C Performance Comparison on Palm OS. Zhi-Kai Xin
Java and C Performance Comparison on Palm OS Zhi-Kai Xin zxin@cs.columbia.edu Abstract This paper investigates the performance comparisons of Java and C on Palm OS PDA device. The performance comparison
More informationRuntime Application Self-Protection (RASP) Performance Metrics
Product Analysis June 2016 Runtime Application Self-Protection (RASP) Performance Metrics Virtualization Provides Improved Security Without Increased Overhead Highly accurate. Easy to install. Simple to
More informationSecond Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering
State of the art distributed parallel computational techniques in industrial finite element analysis Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering Ajaccio, France
More informationCS260 Intro to Java & Android 02.Java Technology
CS260 Intro to Java & Android 02.Java Technology CS260 - Intro to Java & Android 1 Getting Started: http://docs.oracle.com/javase/tutorial/getstarted/index.html Java Technology is: (a) a programming language
More informationCoupled analysis of material flow and die deflection in direct aluminum extrusion
Coupled analysis of material flow and die deflection in direct aluminum extrusion W. Assaad and H.J.M.Geijselaers Materials innovation institute, The Netherlands w.assaad@m2i.nl Faculty of Engineering
More informationIntelligent BEE Method for Matrix-vector Multiplication on Parallel Computers
Intelligent BEE Method for Matrix-vector Multiplication on Parallel Computers Seiji Fujino Research Institute for Information Technology, Kyushu University, Fukuoka, Japan, 812-8581 E-mail: fujino@cc.kyushu-u.ac.jp
More informationEpetra Performance Optimization Guide
SAND2005-1668 Unlimited elease Printed March 2005 Updated for Trilinos 9.0 February 2009 Epetra Performance Optimization Guide Michael A. Heroux Scalable Algorithms Department Sandia National Laboratories
More informationFinite Element Integration and Assembly on Modern Multi and Many-core Processors
Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,
More informationAdvanced Numerical Techniques for Cluster Computing
Advanced Numerical Techniques for Cluster Computing Presented by Piotr Luszczek http://icl.cs.utk.edu/iter-ref/ Presentation Outline Motivation hardware Dense matrix calculations Sparse direct solvers
More information10/26/ Solving Systems of Linear Equations Using Matrices. Objectives. Matrices
6.1 Solving Systems of Linear Equations Using Matrices Objectives Write the augmented matrix for a linear system. Perform matrix row operations. Use matrices and Gaussian elimination to solve systems.
More information