A Parallel Algorithm with Embedded Load Balancing for the Computation of Autocorrelation Matrix

S.R. Subramanya
Department of Electrical Engineering and Computer Science, The George Washington University, Washington, DC

Abstract. The computation of the autocorrelation matrix is used heavily in several areas, including signal and image processing, where parallel architectures are also being increasingly used. An efficient scheme to compute the autocorrelation matrix on parallel architectures therefore has substantial benefits. In this paper, an efficient parallel algorithm for the computation of the autocorrelation matrix on a 2-D mesh is presented. The computational requirements of the elements of the autocorrelation matrix are highly skewed, and the proposed algorithm balances the computational load without requiring an external load-balancing algorithm or processor; in this sense, the load balancing is embedded within the algorithm. Communication and computation complexities are analyzed separately. The proposed algorithm is shown to provide a speedup of up to 375% over the straightforward parallel algorithm.

Keywords: Autocorrelation matrix, Parallel algorithm, 2-D mesh, Load balancing.

1 Introduction

The computation of the autocorrelation matrix is central to several applications, including signal and image processing. For example, it is used to compute the coefficients of the ARMA (autoregressive moving average) model, which is used for modeling stationary signals. Non-stationary signals are sometimes approximated by considering windows of the signal and modeling each window as a stationary signal with suitable parameters. Given a matrix X = (x_{i,j}), 0 ≤ i, j ≤ N−1, the autocorrelation matrix is A = (a_{k,l}), 0 ≤ k, l ≤ N−1, where

$$a_{k,l} = \frac{1}{(N-k)(N-l)} \sum_{i=0}^{N-1-k} \sum_{j=0}^{N-1-l} x_{i,j}\, x_{i+k,\,j+l}$$

We consider a 2-D mesh as the architecture for which parallel algorithms for autocorrelation matrix computation will be developed. The basic architecture consists of processing elements (PEs) arranged in a 2-D array with nearest-neighbor interconnections. Each PE has a simple structure capable of arithmetic operations and of basic communication operations (send and receive) to and from its directly connected neighbors, and has local memory for its exclusive use. In the subsequent discussion, the terms PE and processor are used synonymously. Since the proposed algorithm has the same asymptotic complexity as the straightforward parallel algorithm, we compute the exact number of steps required by the two algorithms, counting computation and communication steps separately. The next section gives the notation and assumptions used in the rest of the paper; Sections 3 and 4 describe and analyze the straightforward parallel algorithm and the proposed algorithm, respectively, including the speedup of the proposed algorithm; conclusions follow.
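For concreteness, the definition can be transcribed directly into code. The following Python sketch (an illustration, not part of the paper; the function name and the use of NumPy are my own) computes A serially from the definition, in O(N^4) work, which is the cost the parallel algorithms set out to distribute:

```python
import numpy as np

def autocorrelation_matrix(X):
    """Serial transcription of the definition:
    A[k, l] = (1 / ((N-k)(N-l))) * sum_{i,j} X[i, j] * X[i+k, j+l]."""
    N = X.shape[0]
    A = np.zeros((N, N))
    for k in range(N):
        for l in range(N):
            # Products over the (N-k) x (N-l) overlap region, then normalize.
            A[k, l] = np.sum(X[:N - k, :N - l] * X[k:, l:]) / ((N - k) * (N - l))
    return A
```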

2 Notations and Assumptions

In the algorithms and discussion to follow, the following notation and assumptions are used. The input matrix X is of size N × N, as are the 2-D mesh used in the computation and the matrix A that holds the resulting autocorrelation matrix. The indices of both the matrices and the PEs range from 0 to N−1. Each PE (i, j) initially contains X[i, j], and each PE has enough local memory to hold the entire matrix X. An all-to-all broadcast is first performed so that the entire matrix X is built up in each processor [2]. The computation of the autocorrelation matrix then proceeds, and at the end of the computation each PE (i, j) contains the element A[i, j].

3 The Straightforward Parallel Algorithm

We first give the straightforward parallel algorithm, to aid the understanding of the underlying parallelism, and analyze its communication and computation complexity.

Algorithm 3.1 ParAutoCorr (X, A)
1. begin
2. for k = 0 to N−1 do
3.   for l = 0 to N−1 do
4.     for i = 0 to N−1−k pardo
5.       for j = 0 to N−1−l pardo
6.         P(k+i, l+j) does: S[k+i, l+j] ← X[i, j] × X[i+k, j+l];
7.       endfor
8.     endfor
9.     for i = k to N−1 pardo
10.      Sum all S[i, j], l ≤ j ≤ N−1, using parallel reduction, and store the result in S[i, l].
11.    endfor
12.    Sum all S[i, l], k ≤ i ≤ N−1, using parallel reduction, and store the result in A[k, l].
13.  endfor
14. endfor
15. end

3.1 Analysis of the Straightforward Parallel Algorithm

The computational requirements of the autocorrelation matrix elements are easily seen to be highly skewed: a_{0,0} takes N^2 multiplications, N^2 − 1 additions, and 2(N−1) communication steps, while a_{N−1,N−1} takes just one multiplication and no additions or communication steps; the number of steps required for the remaining elements lies in between. The load on the processors is thus highly skewed, which is the motivating factor for our proposed algorithm with embedded load balancing. Since the proposed algorithm has the same asymptotic complexity as the straightforward parallel algorithm, we compute the exact numbers of computation and communication steps taken by the two algorithms and compare them.
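The parallel reductions in steps 10 and 12 are standard recursive-doubling sums. Below is a serial Python sketch of one row reduction (an illustration of the technique under the paper's synchronous-step model, not the paper's code; on the mesh, each combining step at stride s also costs s nearest-neighbor hops):

```python
def row_reduce(vals, l):
    """Sum vals[l..] into vals[l] with pairwise combining, as in step 10 of
    Algorithm 3.1. Each pass of the while-loop is one parallel addition step;
    ceil(log2(n)) passes suffice for n terms."""
    n = len(vals) - l
    stride = 1
    while stride < n:
        # PEs whose partial sums are `stride` apart combine in parallel.
        for j in range(l, len(vals) - stride, 2 * stride):
            vals[j] += vals[j + stride]
        stride *= 2
    return vals[l]

assert row_reduce(list(range(8)), 2) == sum(range(2, 8))
```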

For the computation of an element A[k, l], the processors in the rectangular block with top-left corner (k, l) and bottom-right corner (N−1, N−1) (of height H = N−k and width W = N−l) operate in parallel, and the result is stored in PE P(k, l). All the multiplications are carried out in parallel in one step. The additions require ⌈log W⌉ + ⌈log H⌉ computation steps and (N−1−l) + (N−1−k) communication steps. The exact computation time (derived in [2]) is

$$N^2 + 2N[N \log N - N + 1]$$

The communication time consists essentially of the time for the product terms to move from PE to PE for the additions. Computing a_{k,l} requires a communication time of (N−l−1) + (N−k−1), the time for the product term from the farthest PE, P(N−1, N−1), to reach P(k, l). The total communication time is shown in [2] to be N^2(N−1). The execution time of the straightforward parallel algorithm, which is the sum of its computation and communication times, is therefore

$$N^2 + 2N[N \log N - N + 1] + N^2(N-1) = N^3 + 2N[N \log N - N + 1]$$

4 Parallel Algorithm with Embedded Load Balancing

In the above algorithm, each a_{k,l} uses one block of PEs working in parallel. Since the computation and communication requirements of the elements are highly skewed, we propose an algorithm that computes several a_{k,l}'s concurrently, using different blocks of processors working in parallel for the different a_{k,l}'s, without interference between blocks. Since the attempt is to balance the load on each PE without using any external load-balancing algorithm or processor, we say that the load balancing is embedded in the algorithm. In this algorithm N is assumed to be odd, although only a slight modification is required for even N. The algorithm has four phases; Table 1 summarizes the elements computed concurrently in each phase and the corresponding blocks of processors used in their computation.

Table 1: Blocks of processors used in the computation of the elements of A

Phase  Range             Element computed   Block top-left   Height   Width
1      --                (0, 0)             (0, 0)           N        N
2      1 <= l <= N/2     (0, l)             (0, 0)           N        N-l
                         (0, N-l)           (0, N-l)         N        l
3      1 <= k <= N/2     (k, 0)             (0, 0)           N-k      N
                         (N-k, 0)           (N-k, 0)         k        N
4      1 <= k, l <= N/2  (k, l)             (0, 0)           N-k      N-l
                         (k, N-l)           (0, N-l)         N-k      l
                         (N-k, l)           (N-k, 0)         k        N-l
                         (N-k, N-l)         (N-k, N-l)       k        l

The computation pattern is easy to visualize; Figure 1 shows the pattern for the fourth phase.

[Figure 1: Pattern of computation in the proposed algorithm (fourth phase). The dark nodes indicate the elements being computed, which are also the nodes where the results are stored; the rectangles enclosing the dark nodes indicate the blocks of PEs that participate in the computation of the corresponding elements.]
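The phase-4 rows of Table 1 assert that the four blocks partition the mesh exactly, which is what makes interference-free concurrency possible. A small verification sketch (hypothetical, mine; not part of the paper):

```python
def phase4_blocks(N, k, l):
    """The four (top, left, height, width) blocks of Table 1, phase 4."""
    return [
        (0,     0,     N - k, N - l),   # computes a[k, l]
        (0,     N - l, N - k, l),       # computes a[k, N-l]
        (N - k, 0,     k,     N - l),   # computes a[N-k, l]
        (N - k, N - l, k,     l),       # computes a[N-k, N-l]
    ]

def tiles_mesh(N, k, l):
    """True iff every PE of the N x N mesh lies in exactly one block."""
    count = [[0] * N for _ in range(N)]
    for top, left, h, w in phase4_blocks(N, k, l):
        for i in range(top, top + h):
            for j in range(left, left + w):
                count[i][j] += 1
    return all(c == 1 for row in count for c in row)

# N odd, as the algorithm assumes; 1 <= k, l <= N/2.
assert all(tiles_mesh(9, k, l) for k in range(1, 5) for l in range(1, 5))
```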

With this scheme, all the multiplications for all four elements can be done in one step. The addition of all the product terms is then done by a series of communication and addition steps.

Algorithm 4.1 ParAutoCorrWithLoadBal (X, A)
1. begin
   {Phase 1}
2. Compute a_{0,0}.
   {Phase 2}
3. for l = 1 to N/2 do
4.   Do concurrently:
5.     Compute a_{0,l} and Compute a_{0,N−l}.
6.   enddo
7. endfor
   {Phase 3}
8. for k = 1 to N/2 do
9.   Do concurrently:
10.    Compute a_{k,0} and Compute a_{N−k,0}.
11.  enddo
12. endfor
   {Phase 4}
13. for k = 1 to N/2 do
14.  for l = 1 to N/2 do
15.    Do concurrently:
16.      Compute a_{k,l}, Compute a_{k,N−l}, Compute a_{N−k,l}, and Compute a_{N−k,N−l}.
17.    enddo
18.  endfor
19. endfor
20. end

Algorithm 4.2 The four Compute routines

Compute a_{k,l}
1. begin
2. MultiplyBlock (k, l, N−k, N−l, 0, 0).
3. AddBlock (k, l, N−k, N−l, 0, 0).
4. end

Compute a_{k,N−l}
1. begin
2. MultiplyBlock (k, N−l, N−k, l, 0, N−l).
3. AddBlock (k, N−l, N−k, l, 0, N−l).
4. end

Compute a_{N−k,l}
1. begin
2. MultiplyBlock (N−k, l, k, N−l, N−k, 0).
3. AddBlock (N−k, l, k, N−l, N−k, 0).
4. end

Compute a_{N−k,N−l}
1. begin
2. MultiplyBlock (N−k, N−l, k, l, N−k, N−l).
3. AddBlock (N−k, N−l, k, l, N−k, N−l).
4. end

Algorithm 4.3 MultiplyBlock (k, l, H, W, x, y)
{The block of processors with top-left corner at (x, y), of height H and width W, is used in the computation of element a_{k,l}.}
1. begin
2. for i = 0 to H−1 pardo
3.   for j = 0 to W−1 pardo
4.     P(x+i, y+j) does:
5.       prod ← X[i, j] × X[k+i, l+j];
6.   endfor
7. endfor
8. end
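A serial Python rendering of MultiplyBlock and of one phase-4 group of Algorithm 4.2 may help fix the block geometry (function names and array layout are mine; on the mesh, the products of all four calls are formed in a single parallel step):

```python
def multiply_block(X, k, l, H, W, x, y, prod):
    """Algorithm 4.3: PE (x+i, y+j) forms the product term X[i,j] * X[k+i, l+j]
    of element a[k, l]; the block has top-left (x, y), height H, width W."""
    for i in range(H):
        for j in range(W):
            prod[x + i][y + j] = X[i][j] * X[k + i][l + j]

def phase4_products(X, N, k, l, prod):
    # The four concurrent MultiplyBlock calls of Algorithm 4.2, serialized.
    multiply_block(X, k,     l,     N - k, N - l, 0,     0,     prod)  # a[k, l]
    multiply_block(X, k,     N - l, N - k, l,     0,     N - l, prod)  # a[k, N-l]
    multiply_block(X, N - k, l,     k,     N - l, N - k, 0,     prod)  # a[N-k, l]
    multiply_block(X, N - k, N - l, k,     l,     N - k, N - l, prod)  # a[N-k, N-l]
```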

Algorithm 4.4 AddBlock (k, l, H, W, x, y)
{The block of processors with top-left corner at (x, y), of height H and width W, is used in the computation of element a_{k,l}.}
1. begin
2. for i = 0 to H−1 pardo
3.   for m = 1 to ⌈log W⌉ do
4.     Sum, in parallel, the terms in nodes that are a distance 2^{m−1} apart, in such a way that the sum of any row i is available in PE (i, l) at the end.
5.   endfor
6. endfor   {The sum of row i is now available in A[i, l].}
7. for m = 1 to ⌈log H⌉ do
8.   Sum, in parallel, the terms in the nodes of column l that are a distance 2^{m−1} apart, in such a way that the sum of column l is available in PE (k, l) at the end.
9. endfor   {The sum of column l is now available in A[k, l].}
10. end

The communication time T_c required for the addition of all the elements of any row in a processor block of width W, with the result stored in column l, is given in Table 2.

Table 2: Communication steps T_c as a function of the column index l of the element being computed

   l < W/2:   T_c = W − 1 − l
   l = W/2:   T_c = l + (⌈W/2⌉ − ⌊W/2⌋)
   l > W/2:   T_c = l

A similar table holds for the communication time required for the addition of the elements of any column in a processor block of height H, with the result stored in row k. This table is central to the computation of the communication times during the additions. In phase four, during the concurrent computation of a_{k,l}, a_{k,N−l}, a_{N−k,l}, and a_{N−k,N−l}, the corresponding blocks of PEs work independently of the other blocks, and the communication time is dominated by that of the computation of a_{k,l}; hence it suffices to calculate only that.

4.1 Analysis of the Proposed Algorithm

We determine the time taken by the proposed algorithm by computing the exact number of steps required for computation and communication. Note that during the computation of any a_{k,l}, all the required multiplications are done in one step; the remaining time is taken by the additions, which require both addition and communication steps. The total number of multiplication steps over all the a_{k,l}'s being N^2, we consider only the addition times in the calculations below and add N^2 at the end to get the total time. The computation and communication steps for the four phases, derived in [2], are tabulated below.

Phase   Computation time
1       2⌈log N⌉
2, 3    (N/2)⌈log N⌉ + Σ_{i=N/2}^{N−1} ⌈log i⌉   (each)
4       N Σ_{i=N/2}^{N−1} ⌈log i⌉

Phase   Communication time
1       2(N − 1)
2, 3    (N/24)(19N − 26)   (each)
4       (7/24) N^2 (N − 2)
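Table 2 reads directly as a function. A sketch (mine, assuming integer l and the ceilings and floors exactly as tabulated):

```python
from math import ceil, floor

def comm_steps_row(W, l):
    """T_c of Table 2: communication steps to sum a row of W terms into
    column l of the block (column indices 0 .. W-1 within the block)."""
    if l < W / 2:
        return W - 1 - l          # terms arriving from the right dominate
    if l == W / 2:                # only possible when W is even,
        return l + (ceil(W / 2) - floor(W / 2))  # so the correction is 0
    return l                      # terms arriving from the left dominate
```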

The total number of computation steps is

$$2\lceil \log N \rceil + N\lceil \log N \rceil + (N+2) \sum_{i=N/2}^{N-1} \lceil \log i \rceil$$

and the total communication time is

$$2(N-1) + \frac{N}{24}(19N-26) + \frac{N}{24}(19N-26) + \frac{7}{24}N^2(N-2) = \frac{7}{24}N^3 + N^2 - \frac{N}{6} - 2$$

The execution time of the proposed algorithm is the sum of its computation and communication times. The speedup provided by the proposed algorithm over the straightforward parallel algorithm is shown in [2] to satisfy

$$\text{speedup} > \frac{N^3 + 2N^2 \log N - 2N^2 + 2N}{\frac{7}{24}N^3 + N^2 - \frac{N}{6} + (2N+4)\log N}$$

The speedup is shown in Figure 2 for various values of N (the matrix size).

[Figure 2: Speedup of the proposed algorithm over the straightforward parallel algorithm, plotted against the matrix size.]

5 Conclusions

In this paper, a parallel algorithm for the computation of the autocorrelation matrix on a 2-D mesh was presented, with the load balancing embedded in the algorithm. The computation and communication times were computed separately. The proposed algorithm was shown to provide a speedup of up to 375% over the straightforward parallel algorithm.

References

[1] Akl, S.G. Design and Analysis of Parallel Algorithms. Prentice-Hall.
[2] Subramanya, S.R. 'A Parallel Algorithm with Embedded Load Balancing for the Computation of Autocorrelation Matrix on 2-D Meshes'. Unpublished manuscript.
[3] Hedetniemi, S.M. 'A Survey of Gossiping and Broadcasting in Communication Networks'. Networks, Vol. 18, 1988.
[4] Brockwell, P.J. and Davis, R.A. Time Series: Theory and Methods. Springer-Verlag.
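As a closing numerical note on the bound of Section 4.1: assuming log means log2 and dropping the ceilings, the two closed forms can be compared directly (hypothetical script, mine; it evaluates the analytical bound, not a measured speedup):

```python
from math import log2

def t_straightforward(N):
    # N^3 + 2N(N log N - N + 1): execution time of the straightforward algorithm.
    return N**3 + 2 * N * (N * log2(N) - N + 1)

def t_proposed_bound(N):
    # Denominator of the speedup bound: (7/24)N^3 + N^2 - N/6 + (2N+4) log N.
    return 7 / 24 * N**3 + N**2 - N / 6 + (2 * N + 4) * log2(N)

for N in (15, 63, 255, 1023):
    print(N, round(t_straightforward(N) / t_proposed_bound(N), 2))
```

The leading terms of numerator and denominator are N^3 and (7/24)N^3, so this particular bound tends to 24/7 ≈ 3.43 as N grows.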
