Parallel Models RAM. Parallel RAM aka PRAM. Variants of CRCW PRAM. Advanced Algorithms

Similar documents
Divide and Conquer 1

Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms. Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms

5.4 Closest Pair of Points

Closest Pair of Points. Cormen et.al 33.4

Ch5. Divide-and-Conquer

Plan for Today. Finish recurrences. Inversion Counting. Closest Pair of Points

Complexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2

CS256 Applied Theory of Computation

EE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem.

PRAM Divide and Conquer Algorithms

CSE 421 Closest Pair of Points, Master Theorem, Integer Multiplication

Basic Communication Ops

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU.

Chapter 5. Divide and Conquer. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Algorithm Design and Analysis

: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day

Divide-and-Conquer Algorithms

Closest Pair of Points in the Plane. Closest pair of points. Closest Pair of Points. Closest Pair of Points

Lecture 4 CS781 February 3, 2011

Hypercubes. (Chapter Nine)

Divide-and-Conquer. Combine solutions to sub-problems into overall solution. Break up problem of size n into two equal parts of size!n.

Chapter 2 Parallel Computer Models & Classification Thoai Nam

The PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2

PRAM (Parallel Random Access Machine)

CSE 202 Divide-and-conquer algorithms. Fan Chung Graham UC San Diego

Chapter 1 Divide and Conquer Algorithm Theory WS 2013/14 Fabian Kuhn

A Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 3 Parallel Prefix, Pack, and Sorting

Chapter 1 Divide and Conquer Algorithm Theory WS 2015/16 Fabian Kuhn

1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors

5. DIVIDE AND CONQUER I

CSE 421 Algorithms: Divide and Conquer

Chapter 2 Abstract Machine Models. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Optimal Parallel Randomized Renaming

IE 495 Lecture 3. Septermber 5, 2000

Chapter 1 Divide and Conquer Algorithm Theory WS 2014/15 Fabian Kuhn

each processor can in one step do a RAM op or read/write to one global memory location

COMP Parallel Computing. PRAM (4) PRAM models and complexity

Introduction to Computers and Programming. Today

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting

Parallel Sorting Algorithms

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. Fall Parallel Sorting

COMP Parallel Computing. PRAM (2) PRAM algorithm design techniques

Parallel Models. Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD

Advanced Algorithm Homework 4 Results and Solutions

We can use a max-heap to sort data.

Lecture 8 Parallel Algorithms II

Sorting Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Linear Arrays. Chapter 7

CSE 202 Divide-and-conquer algorithms. Fan Chung Graham UC San Diego

The PRAM Model. Alexandre David

Chapter 1 Divide and Conquer Algorithm Theory WS 2013/14 Fabian Kuhn

Real parallel computers

CSL 730: Parallel Programming

CS1 Lecture 30 Apr. 2, 2018

Paradigms for Parallel Algorithms

Parallel Random Access Machine (PRAM)

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms

Sorting and Searching

Fundamental Algorithms

Algorithms & Data Structures 2

Merge Sort Goodrich, Tamassia Merge Sort 1

More PRAM Algorithms. Techniques Covered

Divide and Conquer Algorithms

Lecture 19 Sorting Goodrich, Tamassia

1 (15 points) LexicoSort

CSC 447: Parallel Programming for Multi- Core and Cluster Systems

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR Stamp / Signature of the Invigilator

Sorting and Searching

CSE 326: Data Structures Sorting Conclusion

CSE 421 Greedy Alg: Union Find/Dijkstra s Alg

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model.

Lecture 3. Recurrences / Heapsort

Parallel Algorithms. COMP 215 Lecture 22

l Heaps very popular abstract data structure, where each object has a key value (the priority), and the operations are:

Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

CSCE 750, Spring 2001 Notes 3 Page Symmetric Multi Processors (SMPs) (e.g., Cray vector machines, Sun Enterprise with caveats) Many processors

Sorting. Data structures and Algorithms

CSL 730: Parallel Programming. Algorithms

Fundamental Algorithms

A Sophomoric Introduction to Shared-Memory Parallelism and Concurrency Lecture 3 Parallel Prefix, Pack, and Sorting

CSE 638: Advanced Algorithms. Lectures 18 & 19 ( Cache-efficient Searching and Sorting )

Scan and its Uses. 1 Scan. 1.1 Contraction CSE341T/CSE549T 09/17/2014. Lecture 8

Problem. Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all. 1 i j n.

Lecture 9 March 15, 2012

Data Structures and Algorithms

Lecture 12 March 16, 2010

Parallel Random-Access Machines

Data Structures and Algorithms CMPSC 465

Searching a Sorted Set of Strings

Sorting. Divide-and-Conquer 1

LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS

Design and Analysis of Parallel Programs. Parallel Cost Models. Outline. Foster s Method for Design of Parallel Programs ( PCAM )

15-750: Parallel Algorithms

9/10/2018 Algorithms & Data Structures Analysis of Algorithms. Siyuan Jiang, Sept

CSE 530A. B+ Trees. Washington University Fall 2013

Computational Geometry in the Parallel External Memory Model

CPE702 Algorithm Analysis and Design Week 7 Algorithm Design Patterns

CS125 : Introduction to Computer Science. Lecture Notes #40 Advanced Sorting Analysis. c 2005, 2004 Jason Zych

10 Introduction to Distributed Computing

Lecture 3: Sorting 1

Transcription:

Parallel Models Advanced Algorithms Piyush Kumar (Lecture 10: Parallel Algorithms) An abstract description of a real world parallel machine. Attempts to capture essential features (and suppress details?) What other models have we seen so far? Courtesy Baker 05. RAM? External Memory Model? RAM Random Access Machine Model Memory is a sequence of bits/words. Each memory access takes O(1) time. Basic operations take O(1) time: Add/Mul/Xor/Sub/AND/not Instructions can not be modified. No consideration of memory hierarchies. Has been very successful in modelling real world machines. Parallel RAM aka PRAM Generalization of RAM P processors with their own programs (and unique id) MIMD processors : At each point in time the processors might be executing different instructions on different data. Shared Memory Instructions are synchronized among the processors PRAM Shared Memory EREW/ERCW/CREW/CRCW Variants of CRCW Common CRCW: CW iff processors write same value. Arbitrary CRCW Priority CRCW Combining CRCW EREW: A program isnt allowed to access the same memory location at the same time. 1

Why PRAM? Lot of literature available on algorithms for PRAM. One of the most clean models. Focuses on what communication is needed ( and ignores the cost/means to do it) PRAM Algorithm design. Problem 1: Produce the sum of an array of n numbers. RAM =? PRAM =? Problem 2: Prefix Computation Let X = {s 0, s 1,, s n-1 } be in a set S Let be a binary, associative, closed operator with respect to S (usually Q(1) time MIN, MAX, AND, +,...) The result of s 0 s 1 s k is called the k-th prefix Prefix computation Suffix computation is a similar problem. Assumes Binary op takes O(1) In RAM =? Computing all such n prefixes is the parallel prefix computation 1 st prefix 2 nd prefix 3 rd prefix... (n-1)th prefix s 0 s 0 s 1 s 0 s 1 s 2... s 0 s 1... s n-1 Prefix Computation (Akl) 2

EREW PRAM Prefix computation Assume PRAM has n processors and n is a power of 2. Input: s i for i = 0,1,..., n-1. Algorithm Steps: for j = 0 to (lg n) -1, do for i = 2 j to n-1 do h = i - 2 j s i = s h s i endfor endfor Total time in EREW PRAM? Problem 3: Array packing Assume that we have an array of n elements, X = {x 1, x 2,..., x n } Some array elements are marked (or distinguished). The requirements of this problem are to pack the marked elements in the front part of the array. place the remaining elements in the back of the array. While not a requirement, it is also desirable to maintain the original order between the marked elements maintain the original order between the unmarked elements In RAM? How would you do this? Inplace? Running time? Any ideas on how to do this in PRAM? EREW PRAM Algorithm 1. Set s i in P i to 1 if x i is marked and set s i = 0 otherwise. 2. Perform a prefix sum on S =(s 1, s 2,..., s n ) to obtain destination d i = s i for each marked x i. 3. All PEs set m = s n, the total nr of marked elements. 4. P i sets s i to 0 if x i is marked and otherwise sets s i = 1. 5. Perform a prefix sum on S and set d i = s i + m for each unmarked x i. 6. Each P i copies array element x i into address d i in X. Array Packing Assume n processors are used above. Optimal prefix sums requires O(lg n) time. The EREW broadcast of s n needed in Step 3 takes O(lg n) time using a binary tree in memory All and other steps require constant time. Runs in O(lg n) time and is cost optimal. Maintains original order in unmarked group as well Notes: Algorithm illustrates usefulness of Prefix Sums There many applications for Array Packing algorithm Problem 4: PRAM MergeSort RAM Merge Sort Recursion? PRAM Merge Sort recursion? Can we speed up the merging? Merging n elements with n processors can be done in O(log n) time. Assume all elements are distinct Rank(a, A) = number of elements in A smaller than a. For example rank(8, {1,3,5,7,9}) = 4 3

PRAM Merging A = 2,3,10,15,16 B = 1,8,12,14,19 Rank(2)=1 Rank(3)=1 Rank(10)=2 Rank(15)=4 Rank(16)=4 +1 +2 +3 +4 +5 Rank(1)=0 +1 Rank(8)=2 +2 Rank(12)=3 +3 Rank(14)=3 +4 Rank(19)=5 +5 PRAM Merge Sort T(n) = T(n/2) + O(log n) Using the idea of pipelined d&c PRAM Mergesort can be done in O(log n). D&C is one of the most powerful techniques to solve problems in parallel. 1 2 3 8 10 12 14 15 16 19 Problem 5: Closest Pair RAM Version? Closest Pair: RAM Version Closest-Pair(p 1,, p n ) { Compute separation line L such that half the points are on one side and half on the other side. 1 = Closest-Pair(left half) 2 = Closest-Pair(right half) = min( 1, 2 ) O(n log n) 2T(n / 2) L Delete all points further than from separation line L O(n) 12 7 6 4 3 5 21 = min(12, 21) } Sort remaining points by y-coordinate. Scan points in y-order and compare distance between each point and next 11 neighbors. If any of these distances is less than, update. return. O(n log n) O(n) 2 1 Closest Pair: PRAM Version? Closest-Pair(p 1,, p n) { Compute separation line L such that half the points are on one side and half on the other side. Use sorted lists } 1 = Closest-Pair(left half) 2 = Closest-Pair(right half) = min( 1, 2) Delete all points further than from separation line L Use presorting and Sort remaining points by y-coordinate. prefix computation. Scan points in y-order and compare distance between each point and next 11 neighbors. Find min of all these distances, update. return. In parallel Again use prefix computation. O(1) T(n / 2) O(log n) O(1) O(log n) Other Interesting Algorithms Recurrence : T(n) = T(n/2) + O(log n) 4

A List Approximation Algorithms Online Algorithms Learning Algorithms Network Algorithms Advanced Data Structures. Flow Algorithms. Algorithmic Game Theory Quantum Algorithms. Geometric Algorithms Interesting Classes at FSU In case you liked this class: Parallel Algorithms Computational Geometry Advanced Algorithms Next Class Practice Problem Solving for Finals. Extra Office Hours : Wednesday, I will be in office and accessible anytime for questions. 5