Kriging in a Parallel Environment
Jason Morrison (Ph.D. Candidate, School of Computer Science, Carleton University, ON, K1S 5B6, Canada; (613) ; morrison@scs.carleton.ca)

Introduction

In spatial data modelling and analysis there are a variety of techniques for performing prediction. The goal of these techniques is to take spatially located data and to establish estimates of data values at unknown locations. Among these techniques, the attractive aspects of kriging are often overshadowed by the slow speed of the calculation. Unfortunately, the calculations necessary to perform simple, ordinary and universal kriging have a high computational complexity. Even if algorithms requiring the theoretical minimum complexity become available, kriging will still be too slow to be an interactive process. Faster paradigms are required to advance the use of this technique in modern interactive data analysis. As a part of the Parallel and Distributed Geomatics Project, the goal of this work is to provide insight into kriging.

Problem Definition

From the previous work by Kerry and Hawick [4] it is known that parallelism can be successfully applied to kriging. In that work the authors showed that it is possible to produce speedup using tightly coupled parallel machines (a CM-5 and a farm of Alpha workstations with an optical ATM switching network). Since tightly coupled machines are not always available and can be prohibitively expensive, we try to address a more practical approach to parallelism. Is it possible to attain speedup in kriging on a general network of workstations (NOW)? We further constrain ourselves by demanding that the implementation be portable to different platforms and configurations.

The term kriging is itself often used to describe an entire field of estimation techniques. In this research we restrict our attention to three types of kriging: Simple, Ordinary and Universal Kriging.
This restriction is motivated by the fact that the mathematics involved in these three types of estimation is very similar. From here on we discuss only the Simple Kriging algorithm. The interested reader should consult [5] and [6] to see how our work extends to Ordinary and Universal Kriging.

Essentially, Simple Kriging (SK) is a mathematical technique that uses the known data points to calculate a data value at whatever point the user desires. In our algorithm SK is used repeatedly to generate data values at the points of a grid of size $m$. We assume that the size $n$ of the data set is smaller than the number of output locations (i.e., $n < m$). This assumption seems reasonable given that it holds in the applications discussed in the literature [5].

Theory of Simple Kriging

Simple Kriging (SK) begins with the assumption that the original set of data is a partial realization of a random function denoted by $Z$. Each known data value $Z(x)$ has an associated spatial location $x$. It is further assumed that $Z$ has the property of second order stationarity. This means that the first and second order statistical properties of $Z$ are invariant under any translation. That is, Equations (1) and (2) must hold.

$$E[Z(x)] = m \tag{1}$$

$$E[(Z(x)-m)(Z(x+h)-m)] = \mathrm{Cov}(x, x+h) = \mathrm{Cov}(h) \tag{2}$$

where $m$ is a scalar constant, $h$ is a vector distance and $\mathrm{Cov}()$ is the covariance of the random function. The set of all data locations is defined as $X = \{x_1, x_2, \ldots, x_n\}$. In SK it is the data analyst's task to define a covariance function $\mathrm{Cov}(x_i, x_j) = \mathrm{Cov}(x_i - x_j)$ which matches the observed covariance in the data. This task is quite complex and involves statistical measurements of the data as well as knowledge of the data source and the data collection technique. More information on this can be found in [5] and [6].

The SK estimate of a data value at point $x_0$ is denoted $Z_{est}(x_0)$. Each estimate is defined by calculating a weight $\lambda(x_0, x_i)$ for each known data location $x_i$. The weights and their respective data values are then multiplied together and summed to establish the estimate. The first task is to calculate a vector $\Lambda(x_0)$ of length $n$ which contains the weight $\lambda(x_0, x_i)$ at element $\Lambda_{(i)}(x_0)$. The $n$ by $n$ matrix $C$ and the 1 by $n$ vector $c$ must be calculated with elements $C_{(i,j)} = \mathrm{Cov}(x_i, x_j)$ and $c_i = \mathrm{Cov}(x_0, x_i)$. Equation (3) is then used to calculate the weights in $\Lambda(x_0)$. The estimate can be calculated by using Equation (4), the calculated weights and the 1 by $n$ vector $Y$ with elements $Y_{(i)} = Z(x_i) - m$ (vectors are transposed as needed for the products to conform).

$$\Lambda = C^{-1} c \tag{3}$$

$$Z_{est}(x_0) = Y \Lambda(x_0) + m = Y C^{-1} c + m = q c + m \tag{4}$$

$$Z_{err}^{2}(x_0) = \mathrm{Cov}(0) - c\, C^{-1} c \tag{5}$$

Because $q = Y C^{-1}$ does not depend on $x_0$, it can be calculated in a single pre-computation of $O(n^3)$ that calculates $C^{-1}$ and multiplies it with $Y$. Each estimation then takes $O(n)$ time to form $c$ and perform the dot product with $q$.

Finally, Equation (5) defines the standard squared error of the estimate at a point $x_0$. This calculation is expensive at $O(n^2)$ for each error but is mandatory when trying to analyze data.
Fortunately, this calculation adds nothing to the pre-computational step, as its requirements are identical to those of $Z_{est}(x_0)$.

The estimation achieved by SK is commonly classified as a BLUE, or Best Linear Unbiased Estimate. It is linear because it is a linear combination of known values, it is unbiased because the average estimated error is zero, and it is best because the square of the errors is minimized (see [5] or [6] for all derivations).

Algorithms

There are three versions of our algorithm. The first version is sequential while the second and third versions are parallel. The sequential version of the algorithm follows the basic steps described in the theory and operates in two phases. In phase one the data points, their locations and $\mathrm{Cov}()$ are used to create $Y$, $C^{-1}$ and $q$. In the second phase each of the estimation locations on the grid is computed and the corresponding $Z_{est}()$ and $Z_{err}^{2}()$ are calculated.
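The two phases can be illustrated with a minimal pure-Python sketch. It is not the implementation described in this abstract: the exponential covariance `exp_cov`, its `sill` and `rng` parameters, and all function names are hypothetical, and a Cholesky factorization (valid when C is symmetric positive definite) stands in for the LAPACK routines used by the authors. The factorization is reused for every solve, so C^{-1} is never formed explicitly.

```python
import math

def exp_cov(a, b, sill=1.0, rng=10.0):
    """Hypothetical exponential covariance; depends only on the separation of a and b."""
    return sill * math.exp(-math.dist(a, b) / rng)

def cholesky(A):
    """Return lower-triangular G with G G^T = A (A symmetric positive definite)."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(G[i][k] * G[j][k] for k in range(j))
            G[i][j] = math.sqrt(s) if i == j else s / G[j][j]
    return G

def chol_solve(G, b):
    """Solve (G G^T) x = b by forward, then backward substitution: O(n^2)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(G[i][k] * y[k] for k in range(i))) / G[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(G[k][i] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x

def simple_kriging(points, values, mean, grid, cov=exp_cov):
    """Return a list of (Z_est, Z_err^2) pairs, one per grid location."""
    n = len(points)
    # Phase one: pre-computation, done once, O(n^3).
    Y = [z - mean for z in values]
    C = [[cov(points[i], points[j]) for j in range(n)] for i in range(n)]
    G = cholesky(C)
    q = chol_solve(G, Y)  # solves C q = Y; C is symmetric, so this is Y C^{-1}
    # Phase two: one estimate and one squared error per output location.
    results = []
    for x0 in grid:
        c = [cov(x0, xi) for xi in points]
        z_est = sum(qi * ci for qi, ci in zip(q, c)) + mean      # Equation (4), O(n)
        weights = chol_solve(G, c)                               # Equation (3), O(n^2)
        z_err2 = cov(x0, x0) - sum(ci * wi for ci, wi in zip(c, weights))  # Equation (5)
        results.append((z_est, z_err2))
    return results

if __name__ == "__main__":
    pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    vals = [1.0, 2.0, 3.0]
    for est, err2 in simple_kriging(pts, vals, 2.0, [(5.0, 5.0), (2.0, 8.0)]):
        print(est, err2)
```

In this sketch the per-location loop in phase two is independent for each grid point, which is the property the first parallel version relies on.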
Sequential I -- (and Parallel II)
1) Calculate $Y$, $C^{-1}$ and $q$
2) Compute all estimation locations on the output grid
3) For each location calculate $c$, $Z_{est}()$ and $Z_{err}^{2}()$

The first parallel version takes the approach of dividing the calculations for the output grid between the processors, running the sequential algorithm on each and then returning the output to a single processor. This implies that the input, the output and the intermediate structures $Y$, $C^{-1}$ and $q$ must fit on each of the processors. This establishes a requirement of more than $n^2 + (4+d)n$ storage for each processor, plus the space required to perform the matrix inversion (note that $d$ is the dimension of each location). We further make the assumption, commonly made in coarse grained multicomputer (CGM) algorithms [3], that $n > p^2$ and hence $m > p^2$, where $p$ is the number of processors. It is not assumed here that the data is already copied to every processor; rather, the copy is included as the initial step of the algorithm.

Parallel I
1) Distribute data to all processors
2) Distribute grid information to all processors
3) Each processor calculates its portion of the grid
4) Each processor runs the Sequential program on its sub-grid
5) Collect answers on original processor

The important stage of this program is the calculation of the sub-grid for each processor. This is done by applying a technique used in the division of raster data between multiple processors [7]. Suppose, without loss of generality, that the output grid has at least as many rows as columns. Then the processors are numbered as if arranged in a single column. If the number of processors divides evenly into the number of rows, then each processor is assigned an equal number of rows and the problem is solved. If, however, the division has a remainder of $r$, then the first $r$ processors are each assigned one extra row. This strategy minimizes the difference in the number of rows assigned to each processor while still assigning a contiguous block of the grid to each processor.
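The row assignment described above can be sketched in a few lines of Python. The function name `partition_rows` and its return format are hypothetical choices for illustration, not part of the original implementation.

```python
def partition_rows(m_row, p):
    """Split m_row grid rows among p processors as contiguous blocks.

    Returns a list of (first_row, row_count) pairs, one per processor,
    in processor order. The first r = m_row % p processors get one extra row.
    """
    base, r = divmod(m_row, p)
    blocks = []
    start = 0
    for rank in range(p):
        count = base + (1 if rank < r else 0)
        blocks.append((start, count))
        start += count
    return blocks

if __name__ == "__main__":
    # 10 rows over 4 processors: remainder 2, so processors 0 and 1 get 3 rows each.
    print(partition_rows(10, 4))
```

Block sizes differ by at most one row, and every block is non-empty whenever the number of rows is at least the number of processors.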
The row assignment is also guaranteed to produce a non-empty sub-grid for each processor. Since $m > p^2$ and the number of rows $m_{row}$ is at least the number of columns $m_{col}$, we have $m_{row}^2 \geq m_{row}\, m_{col} = m > p^2$ and therefore $p < m_{row}$.

In the work by Kerry and Hawick [4], the authors concentrate on a web-based interface to their parallel kriging implementation. However, the paper does provide inspiration for a second form of parallel kriging. They mention that "at present High Performance Fortran cannot express the high degree of parallelism obtained with message passing implementations of ScaLAPACK" (page 5). The high degree of parallelism that they refer to is the basis of fine-grained parallelism and the ScaLAPACK library. Instead of dividing up the output of the program, fine-grained parallelism performs the algorithm's computations in the same order as the sequential algorithm. The emphasis is instead placed on making each matrix and vector computation work in parallel. Dividing each matrix into blocks and distributing them across the processors allows each processor to work on a portion of the overall computation. Details of each operation go beyond the scope of this abstract and the reader is referred to [2] for more information.

Implementation

Proceeding in the spirit of providing a usable implementation on a variety of platforms, we used a collection of standard, freely available libraries. These are the LAM/MPI message passing library, the BLAS basic linear algebra library, the LAPACK linear algebra package, the BLACS basic linear algebra communication subroutines library, the ATLAS library for optimized BLAS routines and the ScaLAPACK scalable LAPACK library. With the exception of the LAM/MPI library, all of the above can be found at a web site currently maintained by the University of Tennessee, Knoxville.

The main implementation issues with the sequential program are the efficiency, stability and precision requirements of the matrix inversion and multiplication. To support these requirements the LAPACK, BLAS and ATLAS libraries were used. The most important aspect of the sequential algorithm is the inversion of the matrix $C$ and the subsequent calculation of $q$. First, $q$ was calculated by solving the system of equations $Cq = Y$. The LAPACK function dsysvx() solves such symmetric systems using a diagonal pivoting factorization together with iterative refinement to improve the precision of the solution (see [1] for details). The matrix $C$ was then inverted using the factorization obtained while solving for $q$. This was performed by the LAPACK function dsytri(), whose details can also be found in [1]. The remainder of the algorithm follows directly, and BLAS functionality was used for all remaining operations.

The only remaining concern of the first parallel implementation is the distribution of the data and the collection of the output. This was performed using the BLACS library, which is optimized for communication of vectors and matrices between processors.
On our system BLACS is essentially a front-end for MPI, but it was chosen because of its common use in the numerical analysis community and its consequent availability on many types of high performance computers.

The second parallel algorithm has been implemented by taking the sequential algorithm and replacing the LAPACK and BLAS calls with the equivalent parallel ScaLAPACK and PBLAS calls. The distribution of the matrices also had to be handled using BLACS, but the remainder of the code stays the same. While the code for this program is currently functional, it is clear that our implementation is restricted by the hardware we are currently using. We are reserving our final conclusions until the program has been ported to a tightly coupled machine (e.g., IBM SP2, SGI Origin 2000, or Cray T3E).

Results and Discussion

At the time of writing, testing of the algorithms has yielded very interesting results. It is clear that our first parallel version is a success and can achieve very high efficiency. For our testing we have used the sample data sets presented in the appendix of [5]. The tests were conducted on a cluster of 450 MHz Pentium III machines running Sun Solaris 2.7 and connected with a 100 Mb/s hub. This strictly non-parallel communication device is limiting and simulates somewhat average conditions in NOWs. The high speedup and parallel efficiency are especially important given the high computational speed relative to the medium communication speed. A small sampling of efficiencies is shown in Table (1). Detailed results will be presented at the conference.

Table (1) -- Parallel Speed-Up and Efficiency in Parallel Algorithm 1 (n=200, m=25,000)

# of procs    Speed-Up    Efficiency
                          96.8%
                          95.6%
                          94.5%
                          91.7%
                          93.4%

Conclusions

Previous work concentrated on programming for tightly coupled computer systems. This work shows that such a system is not necessary to achieve high performance. High levels of performance are achievable with less expensive, off-the-shelf components. We also show that this performance is achievable using easily accessed libraries and that high parallel efficiency is obtained.

References

[1] E. Anderson et al., LAPACK User's Guide, 3rd ed. SIAM, Philadelphia
[2] L.S. Blackford et al., ScaLAPACK User's Guide. SIAM, Philadelphia
[3] F. Dehne, A. Fabri and A. Rau-Chaplin, "Scalable Parallel Geometric Algorithms for Coarse Grained Multicomputers," in Proc. ACM 9th Ann. Comp. Geom., p
[4] K. Kerry and K. Hawick, "Kriging Interpolation on High Performance Computers", Proc. of High Performance Computing and Networks Europe. LNCS 1401, Springer-Verlag
[5] R. Olea, Geostatistics for Engineers and Earth Scientists. Kluwer Academic Publishers, Boston
[6] B. Ripley, Spatial Statistics. Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons, Toronto
[7] D. Roytenberg, "Developing Parallel GIS Applications on the ALEX AVX II Computer", Master's Thesis, School of Computer Science, Carleton University

Acknowledgements

A special thank you to Hossam Khalil for his work in implementing the applications described. This work was funded by OGS and GEOIDE.
Dense matrix algebra and libraries (and dealing with Fortran) CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Dense matrix algebra and libraries (and dealing with Fortran)
More informationFigure (5) Kohonen Self-Organized Map
2- KOHONEN SELF-ORGANIZING MAPS (SOM) - The self-organizing neural networks assume a topological structure among the cluster units. - There are m cluster units, arranged in a one- or two-dimensional array;
More informationDISTRIBUTION STATEMENT A Approved for public release: distribution unlimited.
AVIA Test Selection through Spatial Variance Bounding Method for Autonomy Under Test By Miles Thompson Senior Research Engineer Aerospace, Transportation, and Advanced Systems Lab DISTRIBUTION STATEMENT
More informationAdvanced Numerical Techniques for Cluster Computing
Advanced Numerical Techniques for Cluster Computing Presented by Piotr Luszczek http://icl.cs.utk.edu/iter-ref/ Presentation Outline Motivation hardware Dense matrix calculations Sparse direct solvers
More informationA Few Numerical Libraries for HPC
A Few Numerical Libraries for HPC CPS343 Parallel and High Performance Computing Spring 2016 CPS343 (Parallel and HPC) A Few Numerical Libraries for HPC Spring 2016 1 / 37 Outline 1 HPC == numerical linear
More informationA Domain Decomposition Based Algorithm For Non-linear 2D Inverse Heat Conduction Problems
Contemporary Mathematics Volume 28, 998 B -828-988--35- A Domain Decomposition Based Algorithm For Non-linear 2D Inverse Heat Conduction Problems Charaka J. Palansuriya, Choi-Hong Lai, Constantinos S.
More informationMatrix Multiplication
Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse
More informationIntel Math Kernel Library 10.3
Intel Math Kernel Library 10.3 Product Brief Intel Math Kernel Library 10.3 The Flagship High Performance Computing Math Library for Windows*, Linux*, and Mac OS* X Intel Math Kernel Library (Intel MKL)
More informationAbstract. modeling of full scale airborne systems has been ported to three networked
Distributed Computational Electromagnetics Systems Gang Cheng y Kenneth A. Hawick y Gerald Mortensen z Georey C. Fox y Abstract We describe our development of a \real world" electromagnetic application
More informationsurface but these local maxima may not be optimal to the objective function. In this paper, we propose a combination of heuristic methods: first, addi
MetaHeuristics for a Non-Linear Spatial Sampling Problem Eric M. Delmelle Department of Geography and Earth Sciences University of North Carolina at Charlotte eric.delmelle@uncc.edu 1 Introduction In spatial
More informationThe parallelization of a block-tridiagonal matrix system for an electromagnetic wave simulation in TOKAMAK(TORIC) by MPI Fortran
The parallelization of a block-tridiagonal matrix system for an electromagnetic wave simulation in TOKAMAK(TORIC) by MPI Fortran Jungpyo Lee Graduate Student in a department
More informationRobot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss
Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for
More informationGraphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General
Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for
More informationIssues In Implementing The Primal-Dual Method for SDP. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM
Issues In Implementing The Primal-Dual Method for SDP Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM 87801 borchers@nmt.edu Outline 1. Cache and shared memory parallel computing concepts.
More informationIterated Functions Systems and Fractal Coding
Qing Jun He 90121047 Math 308 Essay Iterated Functions Systems and Fractal Coding 1. Introduction Fractal coding techniques are based on the theory of Iterated Function Systems (IFS) founded by Hutchinson
More informationcomputational Fluid Dynamics - Prof. V. Esfahanian
Three boards categories: Experimental Theoretical Computational Crucial to know all three: Each has their advantages and disadvantages. Require validation and verification. School of Mechanical Engineering
More informationIntel Performance Libraries
Intel Performance Libraries Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationChapter 18. Geometric Operations
Chapter 18 Geometric Operations To this point, the image processing operations have computed the gray value (digital count) of the output image pixel based on the gray values of one or more input pixels;
More informationMAT 275 Laboratory 2 Matrix Computations and Programming in MATLAB
MATLAB sessions: Laboratory MAT 75 Laboratory Matrix Computations and Programming in MATLAB In this laboratory session we will learn how to. Create and manipulate matrices and vectors.. Write simple programs
More informationThe Application of EXCEL in Teaching Finite Element Analysis to Final Year Engineering Students.
The Application of EXCEL in Teaching Finite Element Analysis to Final Year Engineering Students. Kian Teh and Laurie Morgan Curtin University of Technology Abstract. Many commercial programs exist for
More informationAdvanced Topics UNIT 2 PERFORMANCE EVALUATIONS
Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors
More informationChapter 18 out of 37 from Discrete Mathematics for Neophytes: Number Theory, Probability, Algorithms, and Other Stuff by J. M. Cargal.
Chapter 8 out of 7 from Discrete Mathematics for Neophytes: Number Theory, Probability, Algorithms, and Other Stuff by J. M. Cargal 8 Matrices Definitions and Basic Operations Matrix algebra is also known
More information2.7 Numerical Linear Algebra Software
2.7 Numerical Linear Algebra Software In this section we will discuss three software packages for linear algebra operations: (i) (ii) (iii) Matlab, Basic Linear Algebra Subroutines (BLAS) and LAPACK. There
More informationWhat is Multigrid? They have been extended to solve a wide variety of other problems, linear and nonlinear.
AMSC 600/CMSC 760 Fall 2007 Solution of Sparse Linear Systems Multigrid, Part 1 Dianne P. O Leary c 2006, 2007 What is Multigrid? Originally, multigrid algorithms were proposed as an iterative method to
More informationCGMGRAPH/CGMLIB: IMPLEMENTING AND TESTING CGM GRAPH ALGORITHMS ON PC CLUSTERS AND SHARED MEMORY MACHINES
CGMGRAPH/CGMLIB: IMPLEMENTING AND TESTING CGM GRAPH ALGORITHMS ON PC CLUSTERS AND SHARED MEMORY MACHINES Albert Chan 1 Frank Dehne 2 Ryan Taylor 3 Abstract In this paper, we present CGMgraph, the first
More information3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:
BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General
More informationParallel Implementation of Interval Analysis for Equations Solving
Parallel Implementation of Interval Analysis for Equations Solving Yves Papegay, David Daney, and Jean-Pierre Merlet INRIA Sophia Antipolis COPRIN Team, 2004 route des Lucioles, F-06902 Sophia Antipolis,
More informationIntroduction to Parallel Computing
Introduction to Parallel Computing W. P. Petersen Seminar for Applied Mathematics Department of Mathematics, ETHZ, Zurich wpp@math. ethz.ch P. Arbenz Institute for Scientific Computing Department Informatik,
More informationLAPACK. Linear Algebra PACKage. Janice Giudice David Knezevic 1
LAPACK Linear Algebra PACKage 1 Janice Giudice David Knezevic 1 Motivating Question Recalling from last week... Level 1 BLAS: vectors ops Level 2 BLAS: matrix-vectors ops 2 2 O( n ) flops on O( n ) data
More informationParallel Performance Studies for a Clustering Algorithm
Parallel Performance Studies for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland,
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More information