Massively Parallel Computing: Unstructured Finite Element Simulations


Citation: Mathur, Kapil K., Zdenek Johan, S. Lennart Johnsson, and Thomas J.R. Hughes. Massively Parallel Computing: Unstructured Finite Element Simulations. Harvard Computer Science Group Technical Report TR.

This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material.

Kapil K. Mathur, Zdenek Johan, S. Lennart Johnsson, Thomas J.R. Hughes

TR March 1993

Parallel Computing Research Group, Center for Research in Computing Technology, Harvard University, Cambridge, Massachusetts

To appear in Proceedings of NAFEM 4th International Conference on Quality Assurance and Standards in Finite Element and Associated Technologies, May 26-28, Brighton, England.

Kapil K. Mathur, Zdenek Johan and S. Lennart Johnsson*
Thinking Machines Corporation, 245 First Street, Cambridge, MA

Thomas J.R. Hughes
Division of Applied Mechanics, Stanford University, Durand Building, Stanford, CA

* Also affiliated with the Division of Applied Sciences, Harvard University, Cambridge, MA

Abstract

Massively parallel computing holds the promise of extreme performance. Critical for achieving high performance are the ability to exploit locality of reference and effective management of the communication resources. This article describes two communication primitives and associated mapping strategies that have been used for several different unstructured, three-dimensional, finite element applications in computational fluid dynamics and structural mechanics.

1 Introduction

Most modern high-performance computing systems have memory distributed among the processors. These processors, along with their memory and communication hardware, are often referred to as processing nodes. In turn, processing nodes are interconnected by a network such as a mesh, a binary cube or a fat-tree. These computing systems hold a promise for extreme performance. However, careful attention to the data allocation, data motion in the distributed data structures, memory hierarchies and load balancing is required to achieve such extreme performance. Fundamental changes to classical algorithms are therefore necessary.

With the advent of such computing systems, it is now possible to simulate significantly more complex problems found in science and engineering. The increased complexity may be a result of advancing from coarse, structured, two-dimensional geometries to fine, unstructured, three-dimensional geometries, or may be due to the more detailed modeling techniques being used.

This paper focuses attention on the finite element method. The inherent parallelism in the finite element method is studied in the context of unstructured, three-dimensional simulations that typically arise in structural mechanics and computational fluid dynamics. As stated before, an efficient implementation of finite element techniques on distributed-memory high-performance computing systems must address issues related to load balance and to the efficient use of the network interconnecting the processing nodes.

The class of high-performance computing systems studied in this paper is programmed with a shared address space in single-program multiple-data mode. Programming languages that are based on a shared address space include High Performance Fortran (HPF) [11], Connection Machine Fortran (CMF) [4], Fortran-90 [21], Fortran-D [10], and Fortran-Y [3]. In this paper, the Connection Machine systems CM-200 and CM-5 are used as the model architectures and Connection Machine Fortran is used as the model programming language.

Several researchers have studied finite element implementations on these model architectures for a variety of different applications. Johnsson and Mathur [15, 16] discuss three-dimensional linear elastic structural applications. Belytschko et al. [2] investigate explicit crash simulations. Farhat et al. [5, 6] and Shapiro [25] have studied explicit algorithms for computational fluid dynamics. Johan et al. [12] report a fully implicit matrix-free implementation for solving the compressible Euler and Navier-Stokes equations. Mathur et al. [19, 20] discuss explicit and fully dynamic finite element simulations of ductile fracture in metals. Beaudoin et al. [1] describe implicit metal forming simulations with explicit use of polycrystalline plasticity.

The outline of this article is as follows. The next section describes a set of communication primitives that are common to all finite element simulations. These primitives have been identified by studying several applications that span the entire range of solution techniques from explicit to fully implicit solvers. Then, the issue of load balance with respect to the communication system is reported for unstructured finite element meshes. Both the mapping of the finite elements and the influence of node numbering on the communication costs are reported.
Two three-dimensional unstructured meshes, one that has been used for simulating airflow around a complete airplane [12] and the other that typically arises in crash simulations of automobiles, are used to demonstrate the importance of locality of reference and optimal selection of paths for the data motion.

2 Communication Primitives

All unstructured finite element simulations can best be based on a formulation that views the entire data set in one of the following two representations.

The first representation is called the element-by-element approach. In this data representation, the entire data set is partitioned into two groups: a group of unassembled finite elements and a group of assembled nodal points (or sometimes of assembled nodal degrees of freedom). Here the unassembled finite elements and the assembled nodal points are mapped onto the virtual processing nodes of the architecture. All computations at the element level are performed in the first group. Computations that must be performed on the assembled nodes are performed in the second group. Any interaction between data stored in the two different groups involves data motion between virtual processing nodes.

The second data representation is called the assembled stiffness matrix approach. Here the entire data set is divided into three groups. In addition to the unassembled finite elements and assembled nodal points, the third group represents the assembled global stiffness matrix. As before, computations at the element level are performed in the group representing unassembled finite elements. After the element matrices have been evaluated, a global stiffness matrix is assembled. This involves data motion between the group representing unassembled finite elements and the group representing the assembled stiffness matrix. The assembled stiffness matrix data representation is particularly useful for certain implicit calculations which involve the solution of sparse linear systems by direct methods.

In a previous article, Mathur and Johnsson [18] identify a set of communication primitives for unstructured finite element simulations on high-performance computing architectures that are programmed with a shared address space. The model architecture used for that study was a Connection Machine system CM-200. That article reported on applications based on element-by-element algorithms. Four communication primitives were identified: global gather, global scatter, all-to-all broadcast, and all-to-all reduce. The first two primitives are described here very briefly. The reader is referred to the above article for a detailed description.

The gather operation is a many-from-one mapping between the destination and source arrays. Every destination array element accumulates data values based on a pointer array which is of the same shape as the destination array. In the context of finite element simulations, one example of the gather operation is the accumulation of the assembled nodal data values to local element vectors. Since many finite elements share the same nodal point, this is indeed a many-from-one mapping.

The scatter operation is the reverse of the gather operation. Here, many source array elements combine their data values based on a pointer array which is of the same shape as the source array. This is a many-to-one mapping. Data collisions may occur at the destination, as several source array elements may be associated with the same destination array element. In this case, the colliding data values must be added to achieve the effect of the assembly operation.
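The semantics of the two primitives can be illustrated with a minimal serial sketch. It is not the data-parallel implementation described in [18]; the function names `gather` and `scatter_add` are hypothetical, and the connectivity is a small three-element, seven-node mesh modeled on the example mesh of Figures 1 and 2, with one degree of freedom per nodal point.

```python
# Serial sketch of the gather and scatter (with addition) primitives.
# Assumption: element-to-node connectivity of a three-element, seven-node
# example mesh, using 1-based nodal labels.
ELEMENT_NODES = {
    "A": [1, 5, 6, 3],
    "B": [3, 6, 7, 4],
    "C": [1, 3, 4, 2],
}
NUM_NODES = 7


def gather(nodal_values, element_nodes):
    """Many-from-one: each element slot reads the value of the node it points to."""
    return {
        elem: [nodal_values[node - 1] for node in nodes]
        for elem, nodes in element_nodes.items()
    }


def scatter_add(element_values, element_nodes, num_nodes):
    """Many-to-one: colliding element contributions are added (assembly)."""
    assembled = [0.0] * num_nodes
    for elem, nodes in element_nodes.items():
        for slot, node in enumerate(nodes):
            assembled[node - 1] += element_values[elem][slot]
    return assembled


nodal = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
local = gather(nodal, ELEMENT_NODES)

# Scattering a vector of ones counts how many elements share each node.
ones = {elem: [1.0] * len(nodes) for elem, nodes in ELEMENT_NODES.items()}
valence = scatter_add(ones, ELEMENT_NODES, NUM_NODES)
print(local["A"], valence)
```

In this sketch the pointer array is the connectivity itself: it has the shape of the element-side array in both directions, which matches the description of the pointer array above.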
The assembled stiffness matrix data representation requires no additional communication primitives. The data interaction between the group of unassembled finite elements and the assembled stiffness matrix is a scatter operation. The data interaction between the group representing the assembled stiffness matrix and the group representing the assembled nodal points can be either a gather or a scatter operation. In particular, when iterative methods are used to solve the sparse linear system, a sparse matrix-vector multiply requiring both gather and scatter operations forms the computation kernel.

Figure 1 shows a simple finite element mesh with three elements labeled A, B, and C, and seven nodal points labeled 1 through 7. This mesh is used to outline the algorithms used to formulate the gather/scatter operations (Figure 2). For simplicity, it is assumed that there is only one degree of freedom per nodal point. During the gather operation, the unassembled finite elements accumulate the nodal values. In a preprocessing phase, the group of unassembled finite elements is associated with the group labeled "Nodal - II" in Figure 2 through a one-to-one mapping. Since this one-to-one mapping is solely a function of the mesh connectivity, the preprocessing time can be amortized over several calls to the gather operation. The actual gather operation is performed by first making local copies of the data values that are requested by more than one unassembled finite element (from "Nodal - I" to "Nodal - II") and then by performing the one-to-one data motion step. The scatter operation is done in the reverse order. First, the one-to-one data motion step is performed. Then, the local data values are added together by a reduce operation.

[Figure 1 diagram: a group of unassembled finite elements (A, B, C) and a group of assembled nodal points, connected through the network interconnecting the processing nodes.]

Figure 1: The two groups of the processing nodes used to map unstructured discretizations for element-by-element algorithms. The arrows represent the direction of data motion between the two groups for the gather and scatter operations. For the simple mesh shown above, the group of unassembled finite elements is mapped onto the processing nodes as a linear array three long (representing the three unassembled elements labeled A, B, and C). Similarly, the group of nodal points is mapped onto the processing nodes as a linear array seven long (representing the seven assembled nodal points labeled 1 through 7).

3 Mesh Partitioning and Node Numbering

To make efficient use of the network interconnecting the processing nodes, the finite elements of the unstructured mesh have to be mapped onto the processing nodes of the architecture so that locality of reference is maximized and the number of routing conflicts is kept at a minimum. One useful mapping technique that has been studied extensively is the recursive spectral bisection algorithm proposed by Pothen et al. [22]. This algorithm has been used successfully by Simon [24] for mesh decomposition. Johan [13] and Johan and Hughes [14] report an efficient data-parallel implementation of the recursive spectral bisection algorithm.
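Returning to the two-step gather/scatter procedure of Section 2 (Figures 1 and 2), the following serial sketch shows one way the preprocessing and the two steps could fit together. It is an illustration under stated assumptions, not the Connection Machine implementation: the names `build_queue`, `two_step_gather` and `two_step_scatter` are hypothetical, the connectivity is that of the three-element example mesh, and the queued layout stands in for the group labeled "Nodal - II".

```python
# Sketch of the two-step gather/scatter built around a sorted reference queue.
# Assumption: elements A, B, C of the example mesh, listed in that order,
# with 1-based nodal labels and one degree of freedom per nodal point.
ELEMENT_NODES = [[1, 5, 6, 3], [3, 6, 7, 4], [1, 3, 4, 2]]
NUM_NODES = 7


def build_queue(element_nodes):
    """Preprocessing: list every (node, element, slot) reference, sorted by node.

    The sorted list plays the role of the intermediate group: all references
    to one nodal point sit next to each other. It depends only on the mesh
    connectivity, so it can be reused across calls.
    """
    refs = [
        (node, elem, slot)
        for elem, nodes in enumerate(element_nodes)
        for slot, node in enumerate(nodes)
    ]
    return sorted(refs)


def two_step_gather(nodal_values, element_nodes, queue):
    # Step 1: local copies, one per reference to each nodal point.
    queued = [nodal_values[node - 1] for node, _, _ in queue]
    # Step 2: one-to-one data motion from the queue to the element slots.
    local = [[None] * len(nodes) for nodes in element_nodes]
    for value, (_, elem, slot) in zip(queued, queue):
        local[elem][slot] = value
    return local


def two_step_scatter(element_values, num_nodes, queue):
    # Step 1: one-to-one data motion from the element slots to the queue.
    queued = [element_values[elem][slot] for _, elem, slot in queue]
    # Step 2: local reduce, adding the values queued at each nodal point.
    assembled = [0.0] * num_nodes
    for value, (node, _, _) in zip(queued, queue):
        assembled[node - 1] += value
    return assembled


queue = build_queue(ELEMENT_NODES)
nodal = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
local = two_step_gather(nodal, ELEMENT_NODES, queue)
counts = two_step_scatter([[1.0] * 4 for _ in ELEMENT_NODES], NUM_NODES, queue)
print(local, counts)
```

The point of the intermediate queue is that the only step crossing between the two groups is a one-to-one permutation; the copy and the reduce each touch data that belongs to a single nodal point.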
It is important that the implementation of the partitioning algorithm be as efficient as possible, because the mapping of the finite elements, for an optimal selection of paths for moving data, requires knowledge of the configuration of the computing platform, which may only be known at runtime. Moreover, adaptive simulations may require a new mapping whenever the mesh is refined.

[Figure 2 diagram: the finite element mesh with elements A, B, C and nodal points 1-7. Unassembled finite elements: A = (1, 5, 6, 3), B = (3, 6, 7, 4), C = (1, 3, 4, 2). Two rows of nodal data labeled "Nodal points - II" and "Nodal points - I".]

Figure 2: Data structures used in the gather/scatter operations. For the simple mesh shown above, the gather/scatter primitives generate an internal one-to-one mapping between the group of unassembled finite elements and the group of assembled nodal points. The unassembled nodal values (for example 3a, 3b, and 3c) are queued in the local memory of the processing node representing the assembled nodal point (3 in the example). A local copy or reduce operation completes the gather and scatter operations, respectively.

Briefly, the spectral partitioning algorithm is based on the smallest non-zero eigenpair of the Laplacian matrix associated with the dual mesh connectivity. The Laplacian matrix is constructed such that the smallest eigenvalue is zero and its corresponding eigenvector consists of all ones. (Note that the Laplacian matrix is defined by some authors to be negative semi-definite, in which case the partitioning is based on the second largest eigenvalue.) The eigenvector associated with the smallest non-zero eigenvalue is frequently called the Fiedler vector [7, 8, 9] and can be used to decompose the finite element mesh.

The dual mesh connectivity of a mesh is an alternate method of representing a finite element mesh. It is simply a list of elements that share a face with a given finite element. This is in contrast with the popularly used nodal connectivity representation, which is a list of nodal points making up a finite element.

The partitioning algorithm provides an efficient method for mapping the unassembled finite elements. The contention for the communication links in the network can be reduced further by an appropriate mapping of the assembled nodal points of the finite element mesh. Two different nodal renumbering schemes have been studied. The first technique is based on the results of the mapping algorithm used for the unassembled finite elements. After the finite elements have been mapped onto the processing nodes, the nodal connectivity of the finite elements on each processing node is examined to map the nodal points (or the nodal degrees of freedom) onto the processing nodes so as to further improve the locality of the gather/scatter operations. This node renumbering algorithm works very well when the computational domain is discretized by only one type of finite element.
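A single level of spectral bisection can be sketched as follows. This is a toy serial illustration, not the data-parallel implementation of [13, 14]: the dual connectivity is built from nodal connectivity by counting shared nodes, the Fiedler vector is approximated by a shifted power iteration rather than a Lanczos-type eigensolver, and the mesh is an assumed strip of six quadrilaterals (a two-dimensional stand-in in which a "face" is an edge with two nodes). The function names are hypothetical, and the dual graph is assumed to be connected.

```python
from itertools import combinations


def dual_connectivity(element_nodes, nodes_per_face):
    """Two elements are neighbors when they share at least one full face."""
    neighbors = [set() for _ in element_nodes]
    for i, j in combinations(range(len(element_nodes)), 2):
        shared = set(element_nodes[i]) & set(element_nodes[j])
        if len(shared) >= nodes_per_face:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def fiedler_vector(neighbors, iterations=2000):
    """Approximate the eigenvector of the smallest non-zero Laplacian eigenvalue.

    Power iteration on (shift*I - L), with L = D - A, kept orthogonal to the
    all-ones eigenvector. Assumes a connected dual graph.
    """
    n = len(neighbors)
    shift = 2.0 * max(len(nb) for nb in neighbors)  # upper bound on eigenvalues of L
    x = [float(i) for i in range(n)]  # deterministic starting guess
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the all-ones component
        y = [
            (shift - len(neighbors[i])) * x[i] + sum(x[j] for j in neighbors[i])
            for i in range(n)
        ]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def spectral_bisection(neighbors):
    """Split the elements at the median of the Fiedler vector."""
    fiedler = fiedler_vector(neighbors)
    order = sorted(range(len(fiedler)), key=lambda i: fiedler[i])
    half = len(order) // 2
    return set(order[:half]), set(order[half:])


# Assumed test mesh: a strip of six quadrilaterals, nodes 0-6 on the bottom
# row and 7-13 on the top row.
strip = [[i, i + 1, i + 8, i + 7] for i in range(6)]
dual = dual_connectivity(strip, nodes_per_face=2)
part_a, part_b = spectral_bisection(dual)
print(dual, part_a, part_b)
```

Recursive spectral bisection would apply the same split again to each half until the number of parts matches the number of processing nodes. For the three-dimensional tetrahedral mesh used later, `nodes_per_face` would be three.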
When the computational domain is discretized by more than one element type, each element type requires a different mapping. In this case the nodal points are mapped randomly onto the processing nodes. The random mapping is quite effective in minimizing the contention for the communication channels during the gather/scatter operations [17, 23, 26].

4 Applications

Two three-dimensional finite element meshes are used to illustrate the use of the communication primitives discussed above. The first mesh is that of a generic Falcon Jet (Figure 3). It has been used in CFD calculations to simulate the inviscid flow over an airplane [12]. The second mesh is that of a complete automobile and represents a typical mesh used in crash simulations (Figure 4). The two meshes shown in Figures 3 and 4 were used to measure the effective communications bandwidth for the primitives described above. All bandwidth data reported here were measured on a 32-processing-node CM-5 equipped with 128 vector units.

Figure 3: Generic Falcon Jet. The complete mesh has 109,914 tetrahedral elements and 97,085 nodal degrees of freedom.

Figure 4: Finite element mesh used in the crash analysis of an automobile. The complete mesh has 33,590 quadrilateral shell elements, 14,678 triangular shell elements and 270,522 degrees of freedom.

The CFD simulation [12] uses the global gather-scatter primitives only. After accumulating data in the group representing unassembled finite elements, all computations are done locally. The result of the local computations is then scattered back to the group representing assembled nodal points. The finite element mesh is made up of 109,914 tetrahedral elements and 97,085 degrees of freedom. A one-point quadrature rule was used in the elements. The mapping phase, consisting of the recursive spectral bisection algorithm and the nodal reordering scheme, took 66 seconds. In this example, the nodal reordering algorithm renumbers the nodal points based on the outcome of the partitioning algorithm. Since the mesh connectivity does not change during the course of the simulation, this mapping is done once for the entire simulation.

For this mesh, the effective data motion rates, normalized to one processing node, are 14 Mbytes/s and 9.4 Mbytes/s for the gather and scatter operations, respectively. The normalized data motion rate can also be separated into two components: the local (on-processing-node) gather/scatter rate and the off-processing-node gather/scatter rate. For this finite element mesh, the normalized local data motion rates for the gather and scatter operations were 94 Mbytes/s and 15 Mbytes/s, respectively. The off-processing-node data motion rates were 1.5 Mbytes/s for the gather operation and 2.0 Mbytes/s for the scatter operation. It should be noted that the scatter data motion rate includes the time required to perform the addition operation. On the 32-processing-node CM-5, the overall data motion bandwidth is 0.45 Gbytes/s for the gather operation and 0.30 Gbytes/s for the scatter operation. Approximately 27% of the total time is spent on data motion (9% for the gather operation and 18% for the scatter operation).

Most explicit dynamic simulations have a structure similar to the one used by the implicit CFD simulation reported above. The time-step loop of such algorithms also involves a gather-compute-scatter cycle. The typical structure of dynamic explicit element-by-element algorithms using the global gather and scatter primitives is reported in detail in Mathur et al. [19, 20]. The automobile mesh has 33,590 quadrilateral shells, 14,678 triangular shells and 45,087 nodal points.
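As a quick consistency check on the Falcon Jet figures quoted above, the aggregate bandwidths follow from the per-node rates multiplied by the number of processing nodes. The short sketch below reproduces that arithmetic; the only assumption added here is the decimal convention of 1000 Mbytes per Gbyte, which is the one that reproduces the reported values.

```python
# Consistency check of the reported Falcon Jet data motion figures.
NUM_PROCESSING_NODES = 32
MBYTES_PER_GBYTE = 1000.0  # assumption: decimal units

# Per-node effective rates in Mbytes/s, as reported in the text.
per_node_rate = {"gather": 14.0, "scatter": 9.4}

# Aggregate bandwidth in Gbytes/s over all processing nodes.
aggregate = {
    op: rate * NUM_PROCESSING_NODES / MBYTES_PER_GBYTE
    for op, rate in per_node_rate.items()
}

# Share of total run time spent in each primitive, in percent, as reported.
time_share = {"gather": 9, "scatter": 18}
total_share = sum(time_share.values())

for op in per_node_rate:
    print(f"{op}: {aggregate[op]:.2f} Gbytes/s")
print(f"data motion share of total time: {total_share}%")
```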
The spectral partitioning algorithm was unable to produce a mapping that was any better than randomly mapping the nodal points and finite elements. A detailed study of the finite element mesh reveals that there is more than one finite element connected to a face of a neighboring finite element. For this mesh, the maximum number of finite elements connected to a face of a neighboring finite element is nineteen. This property of the finite element mesh seems unique to shell elements and requires special care before the spectral partitioning algorithm can be used. This aspect is under investigation and will be reported elsewhere.

The data motion bandwidth for the automobile mesh was measured assuming that there are six degrees of freedom per nodal point (the mesh has a total of 270,522 degrees of freedom). For the stochastic mapping scheme, the data motion rate normalized to one processing node is 2.1 Mbytes/s for the gather operation and 1.9 Mbytes/s for the scatter operation. The local data motion rates normalized to a processing node are 157 Mbytes/s and 22 Mbytes/s for the gather and scatter operations, respectively. The corresponding off-processing-node data motion rates normalized to one processing node are 2.0 Mbytes/s for both the gather and scatter operations. From these data motion rates, it is clear that the stochastic mapping technique significantly reduces the conflict for the communication channels in the network interconnecting the processing nodes (the off-processing-node bandwidth is the same as that of the generic Falcon Jet mesh). However, locality is not maximized. Consequently, the overall data motion rates are close to the off-processing-node rates.
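Why a random mapping gives up locality can be seen in a small experiment: when elements and nodal points are assigned to P processing nodes independently and uniformly at random, a given element-to-node reference is local with probability 1/P. The sketch below is only an illustration under stated assumptions: the connectivity is synthetic (random four-node elements, not the automobile mesh), and the sizes, the seed and the helper name `local_fraction` are hypothetical.

```python
import random


def local_fraction(element_nodes, element_owner, node_owner):
    """Fraction of element-to-node references that stay on one processing node."""
    local = total = 0
    for elem, nodes in enumerate(element_nodes):
        for node in nodes:
            total += 1
            local += element_owner[elem] == node_owner[node]
    return local / total


rng = random.Random(0)  # fixed seed so the run is repeatable
NUM_PROCS = 32          # matches the 32-processing-node CM-5 in the text
NUM_NODES = 3000        # synthetic mesh size (assumption)
NUM_ELEMENTS = 2500     # synthetic mesh size (assumption)

# Synthetic connectivity: each element references four distinct nodal points.
element_nodes = [rng.sample(range(NUM_NODES), 4) for _ in range(NUM_ELEMENTS)]

# Stochastic mapping: owners are drawn independently and uniformly.
element_owner = [rng.randrange(NUM_PROCS) for _ in range(NUM_ELEMENTS)]
node_owner = [rng.randrange(NUM_PROCS) for _ in range(NUM_NODES)]

fraction = local_fraction(element_nodes, element_owner, node_owner)
print(f"local fraction: {fraction:.3f} (1/P = {1 / NUM_PROCS:.3f})")
```

With only a few percent of references being local, nearly all traffic crosses the network, which is consistent with the observation above that the overall rates sit close to the off-processing-node rates.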

5 Conclusions

By using appropriate mapping strategies, it is possible to achieve very high data motion bandwidths for unstructured meshes. This article describes two different mapping ideas that improve the locality of reference and minimize contention for the communication channels. The first mapping algorithm is based on the spectral properties of the Laplacian matrix associated with the dual connectivity of a finite element mesh. Locality is further improved by using a nodal renumbering scheme that maps the nodal points based on the finite element mapping. The second algorithm uses a stochastic mapping strategy, randomly assigning the nodal points of the mesh to the processing nodes. These two strategies have produced efficient implementations of the communication primitives, which work well for a variety of remarkably different finite element meshes.

It should be noted that the two communication primitives described here are completely general. They are not specific to finite element simulations. Moreover, the mapping strategies and the gather and scatter algorithms are valid for any distributed-memory computing architecture.

Acknowledgements

The mesh for the generic Falcon Jet was provided by Dassault Aviation, France. The mesh for the automobile was provided by Centric Engineering Systems, Inc., Palo Alto.

References

1. BEAUDOIN A. J., MATHUR K. K., DAWSON P. R. AND JOHNSON G. C. - Three-dimensional deformation process simulation with explicit use of polycrystalline plasticity models, Int. J. Plas., in press.
2. BELYTSCHKO T., PLASKACZ E. J., KENNEDY J. M. AND GREENWELL D. L. - Finite element analysis on the Connection Machine, Comp. Meth. Appl. Mech. and Engr., Vol. 81, 229-254.
3. CHEN M. AND WU J. J. - Optimizing Fortran-90 programs for data motion on massively parallel systems, Yale U., Tech. Rep.
4. CM Fortran reference manual, versions 2.0, Thinking Machines Corporation.
5. FARHAT C., SOBH N. AND PARK K. C. - Transient finite element computations on 65,536 processors: The Connection Machine, Int. J. Num. Meth. Engr., Vol. 30, 27-55.
6. FARHAT C., FEZOUI L. AND LANTERI S. - Two-dimensional viscous flow computations on the Connection Machine: Unstructured meshes, upwind schemes, and massively parallel computations, Comp. Meth. Appl. Mech. Engr., Vol. 102, 61-88.
7. FIEDLER M. - Algebraic connectivity of graphs, Czech. Math. J., 23, 298-305.
8. FIEDLER M. - Eigenvectors of acyclic matrices, Czech. Math. J., 25, 607-618.
9. FIEDLER M. - A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory, Czech. Math. J., 25, 619-633.
10. FOX G., HIRANANDANI S., KENNEDY K., KOELBEL C., KREMER U., TSENG C. AND WU M. - Fortran D Language Specification, Rice U., TR90-141.
11. High Performance Fortran Language Specification, version 0.4, Dept. Comp. Sci., Rice Univ.
12. JOHAN Z., HUGHES T. J. R., MATHUR K. K. AND JOHNSSON S. L. - A data parallel finite element method for computational fluid dynamics on the Connection Machine system, Comp. Meth. Appl. Mech. and Engr., 99, No. 1, 113-134.
13. JOHAN Z. - Data parallel finite element techniques for large-scale computational fluid dynamics, Ph.D. Thesis, Stanford University.
14. JOHAN Z. AND HUGHES T. J. R. - An efficient implementation of the spectral partitioning algorithm on Connection Machine systems, Int. Conf. Comp. Sci. Cont., INRIA.
15. JOHNSSON S. L. AND MATHUR K. K. - Experience with the conjugate gradient method for stress analysis on a data parallel computer, Int. J. Num. Meth. Engr., Vol. 27, 523-546.
16. JOHNSSON S. L. AND MATHUR K. K. - Data structures and algorithms for the finite element method on a data parallel supercomputer, Int. J. Numer. Meth. Engr., Vol. 29, 881-908.
17. MATHUR K. K. - On the use of randomized address maps in unstructured three-dimensional finite element simulations, Tech. Rep. Thinking Machines Corporation 37/CS90-4.
18. MATHUR K. K. AND JOHNSSON S. L. - Communication primitives for unstructured finite element simulations on data parallel architectures, Comp. Syst. Engr., 3, No. 1-4, 63-72.
19. MATHUR K. K., NEEDLEMAN A. AND TVERGAARD V. - Dynamic 3D analysis of the Charpy V-notch, Model. Sim. Mater. Sci. Engr., in press.
20. MATHUR K. K., NEEDLEMAN A. AND TVERGAARD V. - Ductile failure analyses on massively parallel computers, in preparation.
21. METCALF M. AND REID J. - Fortran 90 explained, Oxford Univ. Press.
22. POTHEN A., SIMON H. D. AND LIOU K.-P. - Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Mat. Anal. Appl., 11, 430-452.
23. RANADE A. G. - How to emulate shared memory, Proc. 28th Symp. Found. Comp. Sci., IEEE, 185-194.
24. SIMON H. D. - Partitioning of unstructured problems for parallel processing, Comp. Sys. Engr., 2, 135-148.
25. SHAPIRO R. A. - Implementation of an Euler/Navier-Stokes finite element algorithm on the Connection Machine, Proc. AIAA 29th Aero. Sci., AIAA-91-0433.
26. VALIANT L. - A scheme for fast parallel communication, SIAM J. Comp., Vol. 11, 350-361.

Mesh Decomposition and Communication Procedures for Finite Element Applications on the Connection Machine CM-5 System

Mesh Decomposition and Communication Procedures for Finite Element Applications on the Connection Machine CM-5 System Mesh Decomposition and Communication Procedures for Finite Element Applications on the Connection Machine CM-5 System The Harvard community has made this article openly available. Please share how this

More information

Scalability of Finite Element Applications on Distributed-Memory Parallel Computers

Scalability of Finite Element Applications on Distributed-Memory Parallel Computers Scalability of Finite Element Applications on Distributed-Memory Parallel Computers The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters.

More information

An Efficient Communication Strategy for Finite Element Methods on the Connection Machine CM-5 System

An Efficient Communication Strategy for Finite Element Methods on the Connection Machine CM-5 System An Efficient Communication Strategy for Finite Element Methods on the Connection Machine CM-5 System The Harvard community has made this article openly available. Please share how this access benefits

More information

Flow simulation. Frank Lohmeyer, Oliver Vornberger. University of Osnabruck, D Osnabruck.

Flow simulation. Frank Lohmeyer, Oliver Vornberger. University of Osnabruck, D Osnabruck. To be published in: Notes on Numerical Fluid Mechanics, Vieweg 1994 Flow simulation with FEM on massively parallel systems Frank Lohmeyer, Oliver Vornberger Department of Mathematics and Computer Science

More information

Improvements in Dynamic Partitioning. Aman Arora Snehal Chitnavis

Improvements in Dynamic Partitioning. Aman Arora Snehal Chitnavis Improvements in Dynamic Partitioning Aman Arora Snehal Chitnavis Introduction Partitioning - Decomposition & Assignment Break up computation into maximum number of small concurrent computations that can

More information

Optimal Matrix Transposition and Bit Reversal on. Hypercubes: All{to{All Personalized Communication. Alan Edelman. University of California

Optimal Matrix Transposition and Bit Reversal on. Hypercubes: All{to{All Personalized Communication. Alan Edelman. University of California Optimal Matrix Transposition and Bit Reversal on Hypercubes: All{to{All Personalized Communication Alan Edelman Department of Mathematics University of California Berkeley, CA 94720 Key words and phrases:

More information

Corrected/Updated References

Corrected/Updated References K. Kashiyama, H. Ito, M. Behr and T. Tezduyar, "Massively Parallel Finite Element Strategies for Large-Scale Computation of Shallow Water Flows and Contaminant Transport", Extended Abstracts of the Second

More information

Kernighan/Lin - Preliminary Definitions. Comments on Kernighan/Lin Algorithm. Partitioning Without Nodal Coordinates Kernighan/Lin

Kernighan/Lin - Preliminary Definitions. Comments on Kernighan/Lin Algorithm. Partitioning Without Nodal Coordinates Kernighan/Lin Partitioning Without Nodal Coordinates Kernighan/Lin Given G = (N,E,W E ) and a partitioning N = A U B, where A = B. T = cost(a,b) = edge cut of A and B partitions. Find subsets X of A and Y of B with

More information

PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES

PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES PARALLEL COMPUTATION OF THE SINGULAR VALUE DECOMPOSITION ON TREE ARCHITECTURES Zhou B. B. and Brent R. P. Computer Sciences Laboratory Australian National University Canberra, ACT 000 Abstract We describe

More information

F k G A S S1 3 S 2 S S V 2 V 3 V 1 P 01 P 11 P 10 P 00

F k G A S S1 3 S 2 S S V 2 V 3 V 1 P 01 P 11 P 10 P 00 PRLLEL SPRSE HOLESKY FTORIZTION J URGEN SHULZE University of Paderborn, Department of omputer Science Furstenallee, 332 Paderborn, Germany Sparse matrix factorization plays an important role in many numerical

More information

Language and Compiler Support for Out-of-Core Irregular Applications on Distributed-Memory Multiprocessors

Language and Compiler Support for Out-of-Core Irregular Applications on Distributed-Memory Multiprocessors Language and Compiler Support for Out-of-Core Irregular Applications on Distributed-Memory Multiprocessors Peter Brezany 1, Alok Choudhary 2, and Minh Dang 1 1 Institute for Software Technology and Parallel

More information

Technical Report TR , Computer and Information Sciences Department, University. Abstract

Technical Report TR , Computer and Information Sciences Department, University. Abstract An Approach for Parallelizing any General Unsymmetric Sparse Matrix Algorithm Tariq Rashid y Timothy A.Davis z Technical Report TR-94-036, Computer and Information Sciences Department, University of Florida,

More information
