JAVA PERFORMANCE IN FINITE ELEMENT COMPUTATIONS


G.P. NIKISHKOV
University of Aizu, Aizu-Wakamatsu, Fukushima, Japan

ABSTRACT

The performance of a Java finite element code is compared with that of an analogous C finite element code on the solution of three-dimensional elasticity problems on an Intel Pentium 4 computer. Untuned Java code is approximately two times slower than the analogous C code. It is shown that code tuning with the blocking technique can raise the Java/C performance ratio to 90% for the LDU solution of finite element equations. Java performance for the PCG iterative solution algorithm, tuned by inner loop unrolling, is 75% of that of the C code. We recommend using Java Virtual Machine 1.2, since in many cases it is considerably faster in finite element computations than JVMs 1.3 and 1.4.

KEY WORDS

Finite Element Methods, Java-based Simulation, Performance, Tuning

1 Introduction

Finite element codes were traditionally developed in Fortran [1] and, more recently, in Fortran 90 [2]. During the last decade, FEM developers started using the C++ language in order to handle complexity in finite element software [3-5]. The object-oriented approach, with data hiding, encapsulation and inheritance, allows reliable and extensible finite element codes to be created.

The Java language [6], developed by Sun Microsystems, possesses features that make it attractive for computational modelling. Java is a simple language with a rich collection of libraries implementing various APIs (Application Programming Interfaces). With Java it is easy to create graphical user interfaces and to communicate with other computers over a network. Java has a built-in garbage collector that helps prevent memory leaks. Another advantage of Java is its portability. Java Virtual Machines (JVMs) [7] have been developed for all major computer systems. A JVM is embedded in most popular Web browsers, so Java applets can be downloaded over the network and executed within a Web browser.
While object-oriented programming can also be done in C++, other useful features, such as true portability and garbage collection, are distinctive characteristics of the Java language.

Despite its attractive features, Java is not widely used in engineering computations. Translation of Java bytecode into native instructions leads to slower execution of Java code. However, a Just-In-Time (JIT) compiler can significantly speed up the execution of Java applications and applets. The JIT compiler, which is an integral part of the JVM, takes the bytecodes and compiles them into native code before execution. Since Java is a dynamic language, the JIT compiles methods on a method-by-method basis just before they are called. If the same method is called many times, or if the method contains a loop with many repetitions, re-execution of the already compiled native code can make the performance of Java code acceptable.

Java performance in numerical computing was considered in several publications [8-10]. It was shown that high-performance numerical codes can be developed in Java with suitable code development techniques. While papers [8-10] deal with general issues of numerical computing, this paper addresses Java performance and tuning in finite element computations. We present our experience in designing an efficient finite element code in Java. The performance of the developed Java finite element code is compared with that of the analogous C code on finite element solutions of three-dimensional elasticity problems on an Intel computer. For running the Java code we employed Sun JVMs 1.2, 1.3 and 1.4. It is shown that, with proper coding and JVM selection, the Java finite element code can be almost as fast as the C code.

Published in: Applied Simulation and Modeling, Proc. of the 12th IASTED Int. Conf., Sept. 3-5, 2003, Marbella, Spain, ACTA Press, Anaheim, 2003.
2 Java Finite Element Code

The object-oriented approach is widely used to create reusable, extensible and reliable components that can be used in later research and practical applications. However, a fully object-oriented approach is not always ideal for computationally intensive sections of code. Object creation and destruction in Java are expensive operations, and the use of a large number of small objects can lead to considerable time and space overhead. As experiments show, one way to increase computing performance is to reduce the cost of object creation by using primitive types in place of objects. For a variable of a primitive type, the JVM allocates the variable directly on the stack (local variable) or within the memory used for the object (member variable). For such variables there is no object creation overhead and no garbage collection overhead.

Java does not support true multi-dimensional arrays (a two-dimensional array is an array of separately allocated row arrays). Because of this, it is more appropriate to employ one-dimensional arrays even where two-subscript notation is used in the mathematical formulation of the problem.

It should be noted that computationally critical code sections are small in comparison to the whole code. The whole finite element code can be designed with the object-oriented approach, while a compromise between using objects and providing high efficiency should be found for the computationally intensive sections.

Keeping in mind the above efficiency considerations, we developed the Java finite element code JFEM for the solution of two-dimensional and three-dimensional elasticity problems. The class hierarchy of the JFEM code is presented in Fig. 1. The class design allows extensibility of the code.

class JFem - main class controlling FEM solution
interface CNST - collection of constants used during solution
class Element - abstract finite element
    class Element2D8N - 2D quadrilateral 8-noded element
    class Element3D20N - 3D hexahedral 20-noded element
class FiniteElementModel - description of the finite element model
class LoadVectorAssembler - boundary conditions for the finite element model
class Material - abstract material model
    class ElasticMaterial - material model for elasticity problems
class DataFileReader - reading data file
class Solver - abstract finite element solver
    class ProfileLDUSolver - solution of the finite element equation system by the direct LDU method with profile storage of the matrix
    class SparseRowPCGSolver - solution of the finite element equation system by the preconditioned conjugate gradient method
class Node - abstract node of the finite element model
    class Node2D - node of the 2D finite element model
    class Node3D - node of the 3D finite element model

Figure 1. Class hierarchy of the JFEM code.
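The remark on one-dimensional arrays can be illustrated with a short sketch. The class below is a hypothetical helper (the name FlatMatrix and its methods are assumptions made for illustration, not part of the JFEM code). It stores a dense matrix row by row in a single double array, so that the two-subscript notation a[i,j] maps to one index.

```java
/** Hypothetical helper, not part of JFEM: a dense matrix stored in one flat array. */
final class FlatMatrix {
    final int rows, cols;
    final double[] data;   // row-major: element (i, j) lives at index i * cols + j

    FlatMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];   // a single allocation for all coefficients
    }

    double get(int i, int j) { return data[i * cols + j]; }

    void set(int i, int j, double v) { data[i * cols + j] = v; }

    /** y = A x, walking the flat array with a running offset. */
    double[] multiply(double[] x) {
        double[] y = new double[rows];
        int k = 0;
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int j = 0; j < cols; j++) {
                sum += data[k++] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        FlatMatrix m = new FlatMatrix(2, 3);
        m.set(0, 0, 1.0);
        m.set(0, 2, 2.0);
        m.set(1, 1, 3.0);
        double[] y = m.multiply(new double[] {1.0, 1.0, 1.0});
        System.out.println(y[0] + " " + y[1]);   // prints "3.0 3.0"
    }
}
```

A double[][] would instead need one array object per row, whereas the flat layout keeps all coefficients in one array of primitives.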
Abstract classes are used for the definition of classes for nodes, finite elements, material models and equation solvers. An abstract class defines the overall structure of its hierarchy and contains data members and methods. Some methods are implemented in the abstract class; others are implemented in classes lower in the hierarchy. For example, the abstract class Element contains methods for data manipulation (connectivity data and nodal data) that are common to all element types. Methods for computing shape functions, derivatives of shape functions, the element stiffness matrix, the element load vector and so on are implemented in classes Element2D8N and Element3D20N for the two-dimensional 8-node element and for the three-dimensional 20-node element.

It is worth noting that we try to restrict the use of objects in computationally intensive parts of the finite element procedure. Class Node is used during input of the nodal data for the finite element model. During calculation of the element stiffness matrices and during the assembly and solution of the equation system, only primitive types and one-dimensional arrays are used in operations with nodal data.

3 Assembly and Solution of Equation System

For linear problems, the main fraction of computing time is spent on calculation of element stiffness matrices, assembly of the equation system and its solution. Here we present the algorithm of element stiffness matrix computation and consider two algorithms of equation solution: the direct method of decomposition into lower, diagonal and upper matrices (LDU) and the iterative preconditioned conjugate gradient (PCG) method.

3.1 Stiffness Matrix Assembly

The global stiffness matrix of the structure is assembled from element stiffness matrices. Coefficients of the element stiffness matrix [k] are expressed as follows:

$$
k^{mn}_{ii} = \int_V \left[ (\lambda + 2\mu)\,\frac{\partial N_m}{\partial x_i}\frac{\partial N_n}{\partial x_i}
 + \mu \left( \frac{\partial N_m}{\partial x_{i+1}}\frac{\partial N_n}{\partial x_{i+1}}
 + \frac{\partial N_m}{\partial x_{i+2}}\frac{\partial N_n}{\partial x_{i+2}} \right) \right] dV,
$$

$$
k^{mn}_{ij} = \int_V \left( \lambda\,\frac{\partial N_m}{\partial x_i}\frac{\partial N_n}{\partial x_j}
 + \mu\,\frac{\partial N_m}{\partial x_j}\frac{\partial N_n}{\partial x_i} \right) dV.
$$

Here m, n are local node numbers and i, j are indices related to the coordinate axes ($x_1$, $x_2$, $x_3$). A cyclic rule is applied in the above equations if a coordinate index becomes greater than 3. Material parameters λ and µ are the Lamé elastic constants. In our computer code, integration of the stiffness matrix [k] for the 20-node element is performed using a special 14-point integration rule. Since the element stiffness matrix is symmetric, only one symmetric half of the matrix, together with the diagonal coefficients, is computed and then used for assembly of the global stiffness matrix.

Assembly of the global stiffness matrix is performed with the use of element connectivity information. The assembly algorithm depends on the storage format for the finite element equation system.

3.2 LDU Solution of Equation System

The symmetric part of the global stiffness matrix of order n is stored in a profile form by columns. Each column of the matrix starts from the first top nonzero element and ends at the diagonal element. The matrix is represented by two arrays: a one-dimensional double array a, containing matrix elements, and a pointer array pcol. Assuming that array indices begin from one, the ith element of pcol contains the index in the array a of the first element of the ith column, minus one. The length of the ith column is given by pcol[i+1]-pcol[i]. The length of the array a is equal to pcol[n+1]. The location (row number) of the first nonzero element in the ith column of the matrix [A] is given by the function FN(i):

    FN(i) = i - (pcol[i+1] - pcol[i]) + 1.

The following correspondence relation can be easily obtained for a transition from two-index matrix notation to one-dimensional array notation:

    a[i,j] -> a[i + pcol[j+1] - j].

Solution of a symmetric equation system consists of the [U]^T[D][U] decomposition of the system matrix followed by forward reduction and back substitution for the right-hand side. The [U]^T[D][U] decomposition takes the majority of the computing time.
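The index relations for the profile format can be checked with a small sketch. The class below is hypothetical (ProfileMatrix and its method names are assumptions, not the JFEM implementation). To keep the formulas identical to the one-based text, slot 0 of each array is left unused; the symmetric lookup for i > j and the zero returned above the profile are conveniences added for the illustration.

```java
/** Hypothetical illustration of profile (skyline) column storage; not the JFEM class. */
final class ProfileMatrix {
    private final double[] a;   // a[1..pcol[n+1]]: each column from its first nonzero down to the diagonal
    private final int[] pcol;   // pcol[1..n+1]; slot 0 is unused so indices match the 1-based text

    ProfileMatrix(double[] a, int[] pcol) {
        this.a = a;
        this.pcol = pcol;
    }

    /** FN(j): row number of the first stored element in column j. */
    int fn(int j) {
        return j - (pcol[j + 1] - pcol[j]) + 1;
    }

    /** Element (i, j) of the symmetric matrix, 1-based; zero above the profile. */
    double get(int i, int j) {
        if (i > j) { int t = i; i = j; j = t; }   // symmetry: only entries with i <= j are stored
        if (i < fn(j)) return 0.0;                // above the first stored row of column j
        return a[i + pcol[j + 1] - j];            // a[i,j] -> a[i + pcol[j+1] - j]
    }

    public static void main(String[] args) {
        // Symmetric 3x3 example: rows (4 1 0), (1 3 2), (0 2 5)
        double[] a = {0.0, 4.0, 1.0, 3.0, 2.0, 5.0};   // a[0] unused
        int[] pcol = {0, 0, 1, 3, 5};                  // pcol[0] unused
        ProfileMatrix m = new ProfileMatrix(a, pcol);
        System.out.println(m.fn(3) + " " + m.get(2, 3) + " " + m.get(1, 3));   // prints "2 2.0 0.0"
    }
}
```

In the example, column 3 starts at row 2, so the zero at position (1,3) is not stored at all, and the length of a (ignoring the unused slot) equals pcol[n+1] = 5.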
The right-looking algorithm of the decomposition can be presented as the following pseudocode:

    do j=2,n
      Cdivt(j)
      do i=j,n
        Cmod(j,i)
    do j=2,n
      Cdiv(j)

    Cdivt(j) =
      do i=fn(j),j-1
        t[i] = a[i,j]/a[i,i]

    Cmod(j,i) =
      do k=max(fn(j),fn(i)),j-1
        a[j,i] -= t[k]*a[k,i]

    Cdiv(j) =
      do i=fn(j),j-1
        a[i,j] /= a[i,i]

The do loop that takes most of the time of the LDU decomposition is contained in the procedure Cmod(j,i). One column of the matrix is used to modify another column inside the inner do loop. Two operands must be loaded from memory in order to perform one floating-point multiply-add (FMA) operation. Data loads can be economized by tuning with the blocking technique. After unrolling the two outer loops, the tuned version of the LDU decomposition is as follows:

    do j=1,n,d
      Bdivt(j,d)
      do i=j+d,n,d
        BBmod(j,i,d)
    do j=2,n
      Cdiv(j)

    Bdivt(k,d) =
      do j=k,k+d-1
        do i=fn(k),j-1
          t[i,j] = a[i,j]/a[i,i]
        do i=j,k+d-1
          do l=max(fn(j),fn(i)),j-1
            a[j,i] -= t[l,j]*a[l,i]

    BBmod(j,i,d=2) =
      do k=max(fn(j),fn(i)),j-1
        a[j,i]     -= t[k,j]*a[k,i]
        a[j+1,i]   -= t[k,j+1]*a[k,i]
        a[j,i+1]   -= t[k,j]*a[k,i+1]
        a[j+1,i+1] -= t[k,j+1]*a[k,i+1]
      if j>=fn(j) then
        a[j+1,i]   -= t[j,j+1]*a[j,i]
        a[j+1,i+1] -= t[j,j+1]*a[j,i+1]
      end if

Method BBmod(j,i,d) modifies the column block that starts from column i, using the column block that starts from column j and contains d columns. The pseudocode above is given for the block size d = 2 for brevity. In the three-dimensional problems solved here, the block size d = 3 is used. It is assumed that all columns in a block start at the same row of the matrix a. This is fulfilled automatically if the column block contains the columns related to one node of the finite element model.

3.3 PCG Solution of Equation System

The preconditioned conjugate gradient (PCG) method is an iterative procedure that does not alter the equation matrix. Because of this, only nonzero coefficients of the finite element global stiffness matrix need to be stored.
The sparse structure of the matrix should be taken into account in matrix-vector multiplications. We use the sparse row format for the equation matrix. In this format all information about the matrix is contained in three arrays:

a - array of doubles containing the nonzero elements of the matrix, row by row;
col - array of column indices for the nonzero elements of the array a;
prow - pointer array of indices of the starting elements of matrix rows in the array a, again assuming that indices start from one.

Preconditioning techniques are not the subject of this work. Simple diagonal preconditioning is used in our PCG solution procedure for finite element equations. The most time-consuming operation in the PCG solution procedure is the sparse matrix-vector product inside the iteration loop. Matrix-vector multiplication for a matrix [A] in the sparse row format is performed as follows:

    do j=1,n
      y[j] = 0
      do i=prow[j],prow[j+1]-1
        y[j] = y[j] + a[i]*x[col[i]]

Experience with tuning C codes shows that little can be done to speed up the sparse matrix-vector product. To our surprise, the following simple inner loop unrolling may improve Java code performance:

    do j=1,n
      y[j] = 0
      do i=prow[j],prow[j+1]-1,3
        y[j] = y[j] + a[i]*x[col[i]] + a[i+1]*x[col[i+1]] + a[i+2]*x[col[i+2]]

Experiments with unrolling the outer loop led to slower calculations. The speedup of the sparse matrix-vector product after inner loop unrolling, and the lack of it after outer loop unrolling, can probably be attributed to internal features of the Java compilers.

4 Experimental Results

We compared our C and Java implementations of the finite element method on a series of three-dimensional elasticity problems. The test problem is simple tension of an elastic cube. Three-dimensional meshes of E × E × E brick-type 20-node elements are used for C-Java benchmarking. The value of E varies from 4 to 14, thus providing meshes from 64 elements (1275 degrees of freedom) to 2744 elements (38475 degrees of freedom). The mesh with E = 8 is shown in Fig. 2.

A desktop computer with an Intel Pentium 4 2.8 GHz processor (533 MHz front-side bus and 512 KB L2 cache) was used for running the C and the Java finite element codes. The C code was compiled using Microsoft Visual C with maximum speed optimization.
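For reference, the sparse row matrix-vector product used in the PCG solution and its unrolled variant might look as follows in Java. This is a minimal sketch under stated assumptions, not the JFEM implementation: the class and method names are hypothetical, indices start from zero as is natural in Java, a local accumulator replaces repeated writes to y[j], and a remainder loop is added so that the unrolled version also works when a row length is not a multiple of three.

```java
import java.util.Arrays;

/** Hypothetical sketch of the sparse row product; names and zero-based indexing are assumptions. */
final class SparseRowProduct {

    /** Plain product y = A x; prow has n + 1 entries and prow[n] equals the number of nonzeros. */
    static void multiply(double[] a, int[] col, int[] prow, double[] x, double[] y) {
        int n = prow.length - 1;
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int i = prow[j]; i < prow[j + 1]; i++) {
                s += a[i] * x[col[i]];
            }
            y[j] = s;
        }
    }

    /** Same product with the inner loop unrolled by three. */
    static void multiplyUnrolled3(double[] a, int[] col, int[] prow, double[] x, double[] y) {
        int n = prow.length - 1;
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            int i = prow[j];
            int end = prow[j + 1];
            // Three multiply-adds per pass while a full triple remains in the row.
            for (; i < end - 2; i += 3) {
                s += a[i] * x[col[i]]
                   + a[i + 1] * x[col[i + 1]]
                   + a[i + 2] * x[col[i + 2]];
            }
            // Remainder loop (added for this sketch) for rows whose length is not a multiple of 3.
            for (; i < end; i++) {
                s += a[i] * x[col[i]];
            }
            y[j] = s;
        }
    }

    public static void main(String[] args) {
        // 3x3 example with rows (2 0 1), (0 3 0), (4 5 6)
        double[] a = {2.0, 1.0, 3.0, 4.0, 5.0, 6.0};
        int[] col = {0, 2, 1, 0, 1, 2};
        int[] prow = {0, 2, 3, 6};
        double[] x = {1.0, 2.0, 3.0};
        double[] y = new double[3];
        multiplyUnrolled3(a, col, prow, x, y);
        System.out.println(Arrays.toString(y));   // prints "[5.0, 6.0, 32.0]"
    }
}
```

In a three-dimensional model with three degrees of freedom per node, nonzeros appear in groups of three per row, which is presumably why the pseudocode in the paper needs no remainder loop.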
The Java code was compiled using the javac compiler developed by Sun Microsystems with the optimization option -O and run using a Java Virtual Machine (JVM). Three JVMs were used:

JVM 1.2 with the Symantec Just-In-Time compiler;
JVM 1.3, Java HotSpot Client VM, build b02;
JVM 1.4, Java HotSpot Client VM, build b06.

Figure 2. Finite element mesh of brick-type 20-node elements.

Figure 3. Ratio of the C code time to the Java code time (t_C/t_Java) for assembly of the global stiffness matrix in the profile format (Pentium 4 2.8 GHz).

Results for assembly of the global stiffness matrix in the profile format and for the LDU solution of the equation system are presented in Figures 3-4. Since it is difficult to determine a megaflops rate for the assembly phase, we present the C/Java performance comparison as the ratio of the computing time used by the C code to the computing time used by the Java code. Assembly of the stiffness matrix in the profile format is faster with JVM 1.2 than with the C code. The performance of JVMs 1.3 and 1.4 is around 75% of the C code performance.

Fig. 4 shows megaflops rates for the LDU solution of the equation system stored in the profile format. The untuned version of the Java code runs at approximately the same speed on all JVMs. Java performance of the untuned code is roughly 40% of the C performance. Tuning of the C and Java codes changes the performance ratios dramatically (Fig. 4,b). JVM 1.2 shows computing rates that are around 90% of the C code rates. JVMs 1.3 and 1.4 produce lower speeds for the tuned LDU code. Significant performance drops are observed for the tuned LDU code when using JVM 1.3. Such phenomena can be explained by data block conflicts in cache memory for certain profiles of the equation system.

Figure 4. Java and C megaflops rates for the LDU solution before tuning (a) and after tuning (b) (Pentium 4 2.8 GHz).

Figure 5. Ratio of the C code time to the Java code time (t_C/t_Java) for assembly of the stiffness matrix in the sparse row format (Pentium 4 2.8 GHz).

Fig. 5 compares C and Java speeds for the assembly of the global stiffness matrix in the sparse row format. JVM 1.2 produces the best speed: the Java code run with JVM 1.2 is faster than the C code. Lower speeds are shown by JVMs 1.3 and 1.4 (60% of the C speed).

Megaflops rates for the PCG solution of the equation system are depicted in Fig. 6. For the untuned PCG solution, Java is about two times slower than C. Tuning does not affect the speed of the C code. However, simple code tuning that unrolls only the inner loop of the sparse matrix-vector product improves Java performance considerably, making the Java speed equal to 75% of the C speed.

There is a recommendation [9] to run the JVM with the -server option in order to increase the speed of Java codes. Our attempts to do so showed that the finite element computations are 20% slower with the -server option than with the default -client option.
The data presented in Figs 3-6 show performance results for three types of computations:

1) Calculation of element stiffness matrices and assembly of the global stiffness matrix: mostly computations with scalar variables;
2) LDU solution of the equation system: mostly a triple loop of multiply-add operations on columns with consecutive access to operands;
3) PCG solution of the equation system: mostly a double loop of multiply-add operations with non-consecutive access to operands.

The experimental results show that the performance of Java is on par with C for computations involving mostly scalar variables. For multiply-add operations with consecutive access to array elements inside the triple loop, the Java performance can reach 90% of the C performance after tuning. For multiply-add operations with non-consecutive access to array elements inside double loops, the Java performance is 75% of the C performance. It should be noted that these conclusions hold only if the proper Java machine is chosen (JVM 1.2). While it is reasonable to use the latest Java SDK (Software Development Kit) for most purposes, we also recommend installing Java Runtime Environment JRE 1.2 and employing it for large finite element analyses.

Figure 6. Java and C megaflops rates for the PCG solution before tuning (a) and after tuning (b) (Pentium 4 2.8 GHz).

5 Conclusion

We have designed an object-oriented version of a three-dimensional finite element code for elasticity problems and implemented it in the Java programming language. Special attention has been devoted to the efficient implementation of computationally intensive sections of the code. The performance of the Java code has been compared with the performance of the analogous C code on the solution of three-dimensional elasticity problems using a computer with an Intel Pentium 4 processor. Java Virtual Machines 1.2, 1.3 and 1.4 were used for running the Java code.

The experimental results show that the performance of the Java finite element code is roughly equal to the performance of the C code for calculation of element stiffness matrices and assembly of the global equation system when using JVM 1.2. JVMs 1.3 and 1.4 provide lower performance.

Untuned Java code demonstrates relatively low performance for the LDU solution of the equation system in the profile format. However, tuning with the blocking technique affects the speed of the Java code more than the speed of the C code. Performance of the tuned Java code running on JVM 1.2 is about 90% of the C code performance. For the PCG iterative solution of the equation system, the tuned Java code reaches about 75% of the speed of the tuned C code.

It is possible to conclude that the Java language is quite suitable for the development of finite element software. With proper coding, the performance of the Java code is comparable to the performance of the corresponding tuned C code. JVM 1.2 is recommended for large finite element analyses.

References

[1] K.-J. Bathe, Finite Element Procedures (Englewood Cliffs: Prentice-Hall, 1996).
[2] I.M. Smith and D.V. Griffiths, Programming the Finite Element Method (Chichester: Wiley, 1998).
[3] R.I. Mackie, Using objects to handle complexity in finite element software, Engineering with Computers, 13, 1997.
[4] R.I. Mackie, Object-Oriented Methods and Finite Element Analysis (Stirling: Saxe-Coburg, 2001).
[5] Y. Dubois-Pelerin and P. Pegon, Object-oriented programming in nonlinear finite element analysis, Computers and Structures, 67, 1998.
[6] J. Gosling, B. Joy and G. Steele, The Java Language Specification (Reading, MA: Addison-Wesley, 1996).
[7] T. Lindholm and F. Yellin, The Java Virtual Machine Specification (Reading, MA: Addison-Wesley, 1996).
[8] R.F. Boisvert, J. Moreira, M. Philippsen and R. Pozo, Java and numerical computing, Computing in Science and Engineering, March/April, 2001.
[9] D. Kruger, Performance tuning in Java, Java Developers Journal, August, 2002.
[10] J.E. Moreira, S.P. Midkiff, M. Gupta, P.V. Artigas, M. Snir and R.D. Lawrence, Java programming for high-performance numerical computing, IBM Systems Journal, 39, 2000.


More information

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as

16.10 Exercises. 372 Chapter 16 Code Improvement. be translated as 372 Chapter 16 Code Improvement 16.10 Exercises 16.1 In Section 16.2 we suggested replacing the instruction r1 := r2 / 2 with the instruction r1 := r2 >> 1, and noted that the replacement may not be correct

More information

Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker)

Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM Conference on Parallel

More information

ARRAY DATA STRUCTURE

ARRAY DATA STRUCTURE ARRAY DATA STRUCTURE Isha Batra, Divya Raheja Information Technology Dronacharya College Of Engineering, Farukhnagar,Gurgaon Abstract- In computer science, an array data structure or simply an array is

More information

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science

Algorithms and Architecture. William D. Gropp Mathematics and Computer Science Algorithms and Architecture William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp Algorithms What is an algorithm? A set of instructions to perform a task How do we evaluate an algorithm?

More information

Super Matrix Solver-P-ICCG:

Super Matrix Solver-P-ICCG: Super Matrix Solver-P-ICCG: February 2011 VINAS Co., Ltd. Project Development Dept. URL: http://www.vinas.com All trademarks and trade names in this document are properties of their respective owners.

More information

Static analysis of eolicblade through finite element method and OOP C++

Static analysis of eolicblade through finite element method and OOP C++ International Conference on Control, Engineering & Information echnology (CEI 4) Proceedings Copyright IPCO-4 ISSN 56-568 Static analysis of eolicblade through finite element method and OOP C++ MateusDantas,

More information

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Cache Memories Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Next time Dynamic memory allocation and memory bugs Fabián E. Bustamante,

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing W. P. Petersen Seminar for Applied Mathematics Department of Mathematics, ETHZ, Zurich wpp@math. ethz.ch P. Arbenz Institute for Scientific Computing Department Informatik,

More information

A Parallel Implementation of the BDDC Method for Linear Elasticity

A Parallel Implementation of the BDDC Method for Linear Elasticity A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague

More information

Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads

Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java s Devrim Akgün Computer Engineering of Technology Faculty, Duzce University, Duzce,Turkey ABSTRACT Developing multi

More information

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply University of California, Berkeley Berkeley Benchmarking and Optimization Group (BeBOP) http://bebop.cs.berkeley.edu

More information

Using Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh

Using Java for Scientific Computing. Mark Bul EPCC, University of Edinburgh Using Java for Scientific Computing Mark Bul EPCC, University of Edinburgh markb@epcc.ed.ac.uk Java and Scientific Computing? Benefits of Java for Scientific Computing Portability Network centricity Software

More information

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc. Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management

More information

A Preliminary Workload Analysis of SPECjvm2008

A Preliminary Workload Analysis of SPECjvm2008 A Preliminary Workload Analysis of SPECjvm2008 Hitoshi Oi The University of Aizu, Aizu Wakamatsu, JAPAN oi@oslab.biz Abstract SPECjvm2008 is a new benchmark program suite for measuring client-side Java

More information

Intermediate Representations

Intermediate Representations Intermediate Representations Intermediate Representations (EaC Chaper 5) Source Code Front End IR Middle End IR Back End Target Code Front end - produces an intermediate representation (IR) Middle end

More information

Approaches to Parallel Implementation of the BDDC Method

Approaches to Parallel Implementation of the BDDC Method Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague

More information

Automated Finite Element Computations in the FEniCS Framework using GPUs

Automated Finite Element Computations in the FEniCS Framework using GPUs Automated Finite Element Computations in the FEniCS Framework using GPUs Florian Rathgeber (f.rathgeber10@imperial.ac.uk) Advanced Modelling and Computation Group (AMCG) Department of Earth Science & Engineering

More information

Cache-oblivious Programming

Cache-oblivious Programming Cache-oblivious Programming Story so far We have studied cache optimizations for array programs Main transformations: loop interchange, loop tiling Loop tiling converts matrix computations into block matrix

More information

Finite Element Implementation

Finite Element Implementation Chapter 8 Finite Element Implementation 8.1 Elements Elements andconditions are the main extension points of Kratos. New formulations can be introduced into Kratos by implementing a new Element and its

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

Computers in Engineering COMP 208. Computer Structure. Computer Architecture. Computer Structure Michael A. Hawker

Computers in Engineering COMP 208. Computer Structure. Computer Architecture. Computer Structure Michael A. Hawker Computers in Engineering COMP 208 Computer Structure Michael A. Hawker Computer Structure We will briefly look at the structure of a modern computer That will help us understand some of the concepts that

More information

The p-sized partitioning algorithm for fast computation of factorials of numbers

The p-sized partitioning algorithm for fast computation of factorials of numbers J Supercomput (2006) 38:73 82 DOI 10.1007/s11227-006-7285-5 The p-sized partitioning algorithm for fast computation of factorials of numbers Ahmet Ugur Henry Thompson C Science + Business Media, LLC 2006

More information

Optimization of FEM solver for heterogeneous multicore processor Cell. Noriyuki Kushida 1

Optimization of FEM solver for heterogeneous multicore processor Cell. Noriyuki Kushida 1 Optimization of FEM solver for heterogeneous multicore processor Cell Noriyuki Kushida 1 1 Center for Computational Science and e-system Japan Atomic Energy Research Agency 6-9-3 Higashi-Ueno, Taito-ku,

More information

Large-scale Structural Analysis Using General Sparse Matrix Technique

Large-scale Structural Analysis Using General Sparse Matrix Technique Large-scale Structural Analysis Using General Sparse Matrix Technique Yuan-Sen Yang 1), Shang-Hsien Hsieh 1), Kuang-Wu Chou 1), and I-Chau Tsai 1) 1) Department of Civil Engineering, National Taiwan University,

More information

TRIREME Commander: Managing Simulink Simulations And Large Datasets In Java

TRIREME Commander: Managing Simulink Simulations And Large Datasets In Java TRIREME Commander: Managing Simulink Simulations And Large Datasets In Java Andrew Newell Electronic Warfare & Radar Division, Defence Science and Technology Organisation andrew.newell@dsto.defence.gov.au

More information

2 Introduction to Java. Introduction to Programming 1 1

2 Introduction to Java. Introduction to Programming 1 1 2 Introduction to Java Introduction to Programming 1 1 Objectives At the end of the lesson, the student should be able to: Describe the features of Java technology such as the Java virtual machine, garbage

More information

Using Analytic QP and Sparseness to Speed Training of Support Vector Machines

Using Analytic QP and Sparseness to Speed Training of Support Vector Machines Using Analytic QP and Sparseness to Speed Training of Support Vector Machines John C. Platt Microsoft Research 1 Microsoft Way Redmond, WA 9805 jplatt@microsoft.com Abstract Training a Support Vector Machine

More information

UNIPROCESSOR PERFORMANCE ANALYSIS OF A REPRESENTATIVE WORKLOAD OF SANDIA NATIONAL LABORATORIES SCIENTIFIC APPLICATIONS CHARLES LAVERTY, B.S.

UNIPROCESSOR PERFORMANCE ANALYSIS OF A REPRESENTATIVE WORKLOAD OF SANDIA NATIONAL LABORATORIES SCIENTIFIC APPLICATIONS CHARLES LAVERTY, B.S. UNIPROCESSOR PERFORMANCE ANALYSIS OF A REPRESENTATIVE WORKLOAD OF SANDIA NATIONAL LABORATORIES SCIENTIFIC APPLICATIONS BY CHARLES LAVERTY, B.S. A thesis submitted to the Graduate School in partial fulfillment

More information

Lecture 1: Overview of Java

Lecture 1: Overview of Java Lecture 1: Overview of Java What is java? Developed by Sun Microsystems (James Gosling) A general-purpose object-oriented language Based on C/C++ Designed for easy Web/Internet applications Widespread

More information

Evaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li

Evaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li Evaluation of sparse LU factorization and triangular solution on multicore architectures X. Sherry Li Lawrence Berkeley National Laboratory ParLab, April 29, 28 Acknowledgement: John Shalf, LBNL Rich Vuduc,

More information

Special Topics: Programming Languages

Special Topics: Programming Languages Lecture #23 0 V22.0490.001 Special Topics: Programming Languages B. Mishra New York University. Lecture # 23 Lecture #23 1 Slide 1 Java: History Spring 1990 April 1991: Naughton, Gosling and Sheridan (

More information

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new

More information

Object-oriented programming in boundary element methods using C11

Object-oriented programming in boundary element methods using C11 Advances in Engineering Software 30 (1999) 127±132 Object-oriented programming in boundary element methods using C11 Wenqing Wang*, Xing Ji, Yuangong Wang Department of Engineering Mechanics and Technology,

More information

Assoc. Prof. Dr. Marenglen Biba. (C) 2010 Pearson Education, Inc. All rights reserved.

Assoc. Prof. Dr. Marenglen Biba. (C) 2010 Pearson Education, Inc. All rights reserved. Assoc. Prof. Dr. Marenglen Biba (C) 2010 Pearson Education, Inc. All rights reserved. Course: Object-Oriented Programming with Java Instructor : Assoc. Prof. Dr. Marenglen Biba Office : Faculty building

More information

Dense Matrix Algorithms

Dense Matrix Algorithms Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication

More information

Boca Raton Community High School AP Computer Science A - Syllabus 2009/10

Boca Raton Community High School AP Computer Science A - Syllabus 2009/10 Boca Raton Community High School AP Computer Science A - Syllabus 2009/10 Instructor: Ronald C. Persin Course Resources Java Software Solutions for AP Computer Science, A. J. Lewis, W. Loftus, and C. Cocking,

More information

211: Computer Architecture Summer 2016

211: Computer Architecture Summer 2016 211: Computer Architecture Summer 2016 Liu Liu Topic: Assembly Programming Storage - Assembly Programming: Recap - Call-chain - Factorial - Storage: - RAM - Caching - Direct - Mapping Rutgers University

More information

1. Introduction. Java. Fall 2009 Instructor: Dr. Masoud Yaghini

1. Introduction. Java. Fall 2009 Instructor: Dr. Masoud Yaghini 1. Introduction Java Fall 2009 Instructor: Dr. Masoud Yaghini Outline Introduction Introduction The Java Programming Language The Java Platform References Java technology Java is A high-level programming

More information

A substructure based parallel dynamic solution of large systems on homogeneous PC clusters

A substructure based parallel dynamic solution of large systems on homogeneous PC clusters CHALLENGE JOURNAL OF STRUCTURAL MECHANICS 1 (4) (2015) 156 160 A substructure based parallel dynamic solution of large systems on homogeneous PC clusters Semih Özmen, Tunç Bahçecioğlu, Özgür Kurç * Department

More information

Example 24 Spring-back

Example 24 Spring-back Example 24 Spring-back Summary The spring-back simulation of sheet metal bent into a hat-shape is studied. The problem is one of the famous tests from the Numisheet 93. As spring-back is generally a quasi-static

More information

ANALYZE OF PROGRAMMING TECHNIQUES FOR IMPROVEMENT OF JAVA CODE PERFORMANCE

ANALYZE OF PROGRAMMING TECHNIQUES FOR IMPROVEMENT OF JAVA CODE PERFORMANCE ANALYZE OF PROGRAMMING TECHNIQUES FOR IMPROVEMENT OF JAVA CODE PERFORMANCE Ognian Nakov, Dimiter Tenev Faculty of Computer Systems And Technologies, Technical University-Sofia, Kliment Ohridski 8, Postal

More information

Seminar report Java Submitted in partial fulfillment of the requirement for the award of degree Of CSE

Seminar report Java Submitted in partial fulfillment of the requirement for the award of degree Of CSE A Seminar report On Java Submitted in partial fulfillment of the requirement for the award of degree Of CSE SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org 1 Acknowledgement I would like

More information

INTERIOR POINT METHOD BASED CONTACT ALGORITHM FOR STRUCTURAL ANALYSIS OF ELECTRONIC DEVICE MODELS

INTERIOR POINT METHOD BASED CONTACT ALGORITHM FOR STRUCTURAL ANALYSIS OF ELECTRONIC DEVICE MODELS 11th World Congress on Computational Mechanics (WCCM XI) 5th European Conference on Computational Mechanics (ECCM V) 6th European Conference on Computational Fluid Dynamics (ECFD VI) E. Oñate, J. Oliver

More information

PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS

PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS Proceedings of FEDSM 2000: ASME Fluids Engineering Division Summer Meeting June 11-15,2000, Boston, MA FEDSM2000-11223 PARALLELIZATION OF POTENTIAL FLOW SOLVER USING PC CLUSTERS Prof. Blair.J.Perot Manjunatha.N.

More information

A Finite Element Method for Deformable Models

A Finite Element Method for Deformable Models A Finite Element Method for Deformable Models Persephoni Karaolani, G.D. Sullivan, K.D. Baker & M.J. Baines Intelligent Systems Group, Department of Computer Science University of Reading, RG6 2AX, UK,

More information

A PARALLEL IMPLEMENTATION OF A FEM SOLVER IN SCILAB

A PARALLEL IMPLEMENTATION OF A FEM SOLVER IN SCILAB powered by A PARALLEL IMPLEMENTATION OF A FEM SOLVER IN SCILAB Author: Massimiliano Margonari Keywords. Scilab; Open source software; Parallel computing; Mesh partitioning, Heat transfer equation. Abstract:

More information

C++ Spring Break Packet 11 The Java Programming Language

C++ Spring Break Packet 11 The Java Programming Language C++ Spring Break Packet 11 The Java Programming Language! Programmers write instructions in various programming languages, some directly understandable by computers and others requiring intermediate translation

More information

Hardware-Supported Pointer Detection for common Garbage Collections

Hardware-Supported Pointer Detection for common Garbage Collections 2013 First International Symposium on Computing and Networking Hardware-Supported Pointer Detection for common Garbage Collections Kei IDEUE, Yuki SATOMI, Tomoaki TSUMURA and Hiroshi MATSUO Nagoya Institute

More information

CSC D70: Compiler Optimization Memory Optimizations

CSC D70: Compiler Optimization Memory Optimizations CSC D70: Compiler Optimization Memory Optimizations Prof. Gennady Pekhimenko University of Toronto Winter 2018 The content of this lecture is adapted from the lectures of Todd Mowry, Greg Steffan, and

More information

Optimizing Data Locality for Iterative Matrix Solvers on CUDA

Optimizing Data Locality for Iterative Matrix Solvers on CUDA Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,

More information

2.2 Weighting function

2.2 Weighting function Annual Report (23) Kawahara Lab. On Shape Function of Element-Free Galerkin Method for Flow Analysis Daigo NAKAI and Mutsuto KAWAHARA Department of Civil Engineering, Chuo University, Kasuga 3 27, Bunkyo

More information

CSCE 5160 Parallel Processing. CSCE 5160 Parallel Processing

CSCE 5160 Parallel Processing. CSCE 5160 Parallel Processing HW #9 10., 10.3, 10.7 Due April 17 { } Review Completing Graph Algorithms Maximal Independent Set Johnson s shortest path algorithm using adjacency lists Q= V; for all v in Q l[v] = infinity; l[s] = 0;

More information

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,

More information

High Performance Iterative Solver for Linear System using Multi GPU )

High Performance Iterative Solver for Linear System using Multi GPU ) High Performance Iterative Solver for Linear System using Multi GPU ) Soichiro IKUNO 1), Norihisa FUJITA 1), Yuki KAWAGUCHI 1),TakuITOH 2), Susumu NAKATA 3), Kota WATANABE 4) and Hiroyuki NAKAMURA 5) 1)

More information

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary

Engineers can be significantly more productive when ANSYS Mechanical runs on CPUs with a high core count. Executive Summary white paper Computer-Aided Engineering ANSYS Mechanical on Intel Xeon Processors Engineer Productivity Boosted by Higher-Core CPUs Engineers can be significantly more productive when ANSYS Mechanical runs

More information

Introduction to Parallel. Programming

Introduction to Parallel. Programming University of Nizhni Novgorod Faculty of Computational Mathematics & Cybernetics Introduction to Parallel Section 9. Programming Parallel Methods for Solving Linear Systems Gergel V.P., Professor, D.Sc.,

More information

Accelerating Double Precision FEM Simulations with GPUs

Accelerating Double Precision FEM Simulations with GPUs Accelerating Double Precision FEM Simulations with GPUs Dominik Göddeke 1 3 Robert Strzodka 2 Stefan Turek 1 dominik.goeddeke@math.uni-dortmund.de 1 Mathematics III: Applied Mathematics and Numerics, University

More information

Java and C Performance Comparison on Palm OS. Zhi-Kai Xin

Java and C Performance Comparison on Palm OS. Zhi-Kai Xin Java and C Performance Comparison on Palm OS Zhi-Kai Xin zxin@cs.columbia.edu Abstract This paper investigates the performance comparisons of Java and C on Palm OS PDA device. The performance comparison

More information

Runtime Application Self-Protection (RASP) Performance Metrics

Runtime Application Self-Protection (RASP) Performance Metrics Product Analysis June 2016 Runtime Application Self-Protection (RASP) Performance Metrics Virtualization Provides Improved Security Without Increased Overhead Highly accurate. Easy to install. Simple to

More information

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering State of the art distributed parallel computational techniques in industrial finite element analysis Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering Ajaccio, France

More information

CS260 Intro to Java & Android 02.Java Technology

CS260 Intro to Java & Android 02.Java Technology CS260 Intro to Java & Android 02.Java Technology CS260 - Intro to Java & Android 1 Getting Started: http://docs.oracle.com/javase/tutorial/getstarted/index.html Java Technology is: (a) a programming language

More information

Coupled analysis of material flow and die deflection in direct aluminum extrusion

Coupled analysis of material flow and die deflection in direct aluminum extrusion Coupled analysis of material flow and die deflection in direct aluminum extrusion W. Assaad and H.J.M.Geijselaers Materials innovation institute, The Netherlands w.assaad@m2i.nl Faculty of Engineering

More information

Intelligent BEE Method for Matrix-vector Multiplication on Parallel Computers

Intelligent BEE Method for Matrix-vector Multiplication on Parallel Computers Intelligent BEE Method for Matrix-vector Multiplication on Parallel Computers Seiji Fujino Research Institute for Information Technology, Kyushu University, Fukuoka, Japan, 812-8581 E-mail: fujino@cc.kyushu-u.ac.jp

More information

Epetra Performance Optimization Guide

Epetra Performance Optimization Guide SAND2005-1668 Unlimited elease Printed March 2005 Updated for Trilinos 9.0 February 2009 Epetra Performance Optimization Guide Michael A. Heroux Scalable Algorithms Department Sandia National Laboratories

More information

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

Finite Element Integration and Assembly on Modern Multi and Many-core Processors Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,

More information

Advanced Numerical Techniques for Cluster Computing

Advanced Numerical Techniques for Cluster Computing Advanced Numerical Techniques for Cluster Computing Presented by Piotr Luszczek http://icl.cs.utk.edu/iter-ref/ Presentation Outline Motivation hardware Dense matrix calculations Sparse direct solvers

More information

10/26/ Solving Systems of Linear Equations Using Matrices. Objectives. Matrices

10/26/ Solving Systems of Linear Equations Using Matrices. Objectives. Matrices 6.1 Solving Systems of Linear Equations Using Matrices Objectives Write the augmented matrix for a linear system. Perform matrix row operations. Use matrices and Gaussian elimination to solve systems.

More information