Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms. Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms

Similar documents
Complexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2

CS256 Applied Theory of Computation

We can use a max-heap to sort data.

EE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem.

PRAM Divide and Conquer Algorithms

Divide-and-Conquer Algorithms

Parallel Random Access Machine (PRAM)

: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day

Parallel Models RAM. Parallel RAM aka PRAM. Variants of CRCW PRAM. Advanced Algorithms

Paradigms for Parallel Algorithms

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors

Problem. Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all. 1 i j n.

Merge Sort fi fi fi 4 9. Merge Sort Goodrich, Tamassia

Real parallel computers

Chapter 1 Divide and Conquer Algorithm Theory WS 2014/15 Fabian Kuhn

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

An NC Algorithm for Sorting Real Numbers

Chapter 1 Divide and Conquer Algorithm Theory WS 2013/14 Fabian Kuhn

Chapter 1 Divide and Conquer Algorithm Theory WS 2013/14 Fabian Kuhn

CSL 730: Parallel Programming

Chapter 1 Divide and Conquer Algorithm Theory WS 2015/16 Fabian Kuhn

CS473 - Algorithms I

Question 7.11 Show how heapsort processes the input:

The PRAM Model. Alexandre David

CSL 860: Modern Parallel

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model.

CSE Introduction to Parallel Processing. Chapter 5. PRAM and Basic Algorithms

Parallel Models. Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD

II (Sorting and) Order Statistics

Merge Sort Goodrich, Tamassia Merge Sort 1

Data Structures and Algorithms CMPSC 465

7. Sorting I. 7.1 Simple Sorting. Problem. Algorithm: IsSorted(A) 1 i j n. Simple Sorting

Divide-and-Conquer. Dr. Yingwu Zhu

Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, Merge Sort & Quick Sort

Sorting Pearson Education, Inc. All rights reserved.

Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015

Parallel Random-Access Machines

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU.

Chapter 3:- Divide and Conquer. Compiled By:- Sanjay Patel Assistant Professor, SVBIT.

Sorting (Chapter 9) Alexandre David B2-206

CS302 Topic: Algorithm Analysis #2. Thursday, Sept. 21, 2006

Parallel Sorting Algorithms

Quick-Sort fi fi fi 7 9. Quick-Sort Goodrich, Tamassia

Basic Communication Ops

Parallel algorithms at ENS Lyon

Sorting (Chapter 9) Alexandre David B2-206

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms

Lecture 3: Sorting 1

Analyzing Complexity of Lists

The PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

Fundamental Algorithms

Introduction to Computers and Programming. Today

Algorithms for Data Science

each processor can in one step do a RAM op or read/write to one global memory location

Lecture 19 Sorting Goodrich, Tamassia

Approximation Algorithms

Quicksort (Weiss chapter 8.6)

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions

COMP Parallel Computing. PRAM (2) PRAM algorithm design techniques

Mergesort again. 1. Split the list into two equal parts

Divide and Conquer Algorithms

Divide-and-Conquer. Divide-and conquer is a general algorithm design paradigm:

IS 709/809: Computational Methods in IS Research. Algorithm Analysis (Sorting)

Fundamental Algorithms

CPE702 Algorithm Analysis and Design Week 7 Algorithm Design Patterns

IE 495 Lecture 3. Septermber 5, 2000

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms

Comparison Sorts. Chapter 9.4, 12.1, 12.2

Sorting and Selection

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

CSL 730: Parallel Programming. Algorithms

Network Algorithms. Distributed Sorting

CSE Winter 2015 Quiz 2 Solutions

Divide and Conquer. Design and Analysis of Algorithms Andrei Bulatov

Sorting. Data structures and Algorithms

Test 1 Review Questions with Solutions

would be included in is small: to be exact. Thus with probability1, the same partition n+1 n+1 would be produced regardless of whether p is in the inp

Chapter 4. Divide-and-Conquer. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Randomized Algorithms

Quick-Sort. Quick-Sort 1

Problem Strategies. 320 Greedy Strategies 6

More PRAM Algorithms. Techniques Covered

Quick Sort. CSE Data Structures May 15, 2002

Optimal Parallel Randomized Renaming

Lower Bound on Comparison-based Sorting

Algorithms with numbers (1) CISC4080, Computer Algorithms CIS, Fordham Univ.! Instructor: X. Zhang Spring 2017

CSE 421 Closest Pair of Points, Master Theorem, Integer Multiplication

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting

Algorithms & Data Structures 2

Sorting. Divide-and-Conquer 1

Design and Analysis of Algorithms - - Assessment

Divide and Conquer. Divide and Conquer

Unit-2 Divide and conquer 2016

Algorithms and Applications

The Limits of Sorting Divide-and-Conquer Comparison Sorts II

Sorting. Outline. Sorting with a priority queue Selection-sort Insertion-sort Heap Sort Quick Sort

CS 6463: AT Computational Geometry Spring 2006 Convex Hulls Carola Wenk

Solving NP-hard Problems on Special Instances

Transcription:

Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms Part 2 1 3 Maximum Selection Problem : Given n numbers, x 1, x 2,, x n Find the largest number Algorithm : O(1) CRCW algorithm; use n 2 CPUs; assume all numbers are distinct Step 1: For each CPU i,j (for each 1 <= i,j <= n) in parallel : M i,j = 1 if x i < x j ; otherwise, 0 Step 2: For each row, use n CPUs to compute OR of n elements Step 3: (cont from step 2) If i th row is 0, return x i Part 2 2 1

Example : input <3 1 4 5 2 > After Step 1 : Matrix M INDEX 1 2 3 4 5 1 0 0 1 1 0 2 1 0 1 1 1 3 0 0 0 1 0 4 0 0 0 0 0 5 1 0 1 1 0 After Step 2 : Row 4 return 0 After Step 3 : return maximum value 5 Part 2 3 Analysis Total running time : O(1) Total work : n 2 * O(1) = O(n 2 ) Sequential Algorithm : O(n) It is not work optimal! Part 2 4 2

Recursive Algorithm : O(log log n) CRCW algorithm; use n CPUs; Assume n is always a perfect square, ie k 2 = n (or k = n 1/2 ) If this is not true, take the smallest k such that k 2 n Step 1: If n = 1, return x 1 Step 2: Step 3: Partition n elements & n processors into k groups, say, G 1, G 2,, G k (assume k 2 = n) In parallel, call the algorithm recursively to find maximum element m i of each group G i Use previous algorithm with n CPUs to find the maximum of m 1, m 2,, m k Part 2 5 Analysis This algorithm uses divide and conquer strategy In step 2, each sub problem has size k or n 1/2 (n 1/2 processors & n 1/2 elements) In step 3, the running time is O(1) 2 q WLOG, we assume that n = 2 and T(2) = O(1) The total running time T(n) satisfy the recurrence T(n) = T(n 1/2 ) + O(1) T(2) = O(1) which solves to T(n) = (log log n) Part 2 6 3

T(n) = T(n 1/2 ) + O(1) = (T(n 1/4 ) + O(1)) + O(1) = ((T(n 1/8 ) + O(1)) + O(1) ) + O(1) = = T(n 1 / 2**i ) + i O(1) (after i steps) = 2 q Note: n = 2 = T(n 1 / 2**q ) + q O(1) q = T(2) + q O(1) 2 = log n = O(1) + q O(1) q = log log n = O(q) = O(log log n) Total work : n * O(log log n) = O(n log log n) It is still not work optimal! Part 2 7 4 Merging Problem : Given 2 sorted sequences X 1 = k 1, k 2,, k m and X 2 = k m+1, k m+2,, k 2m Assume each sequence has m distinct elements, and m is an integral power of 2 The goal is to produce a sorted sequence of 2m elements Best sequential algorithm is O(m) Part 2 8 4

For each k j X 1, we know that it is the rank #j element in X 1 We allocate a single processor to perform a binary search on X 2 and figure out q (the number of elements in X 2 that are less than k j ) Then we know that k j is the rank #(j+q) element in X 1 X 2 For each element in X 2, a similar procedure can be used to compute its rank in X 1 X 2 We can use 2m processors, one for each element An overall rank can be found for each element using binary search in O(log m) time Merging can be done in O(log m) time It is not work optimal! Theorem: Merging of two sorted sequences each of length m can be completed in O(log m) time using m CREW PRAM processors processors Part 2 9 Recursive EREW Algorithm : Odd-Even Merge using 2m processors Algorithm A Step 1: If m=1, merge two sequence with 1 comparison Step 2: Partition X 1 into their odd and even parts, ie X 1 odd =k 1, k 3, k 5, k m-1 and X 1 even = k 2, k 4,, k m Similarly, partition X 2 into X 2 odd and X 2 even Part 2 10 5

Step 3: Recursively merge X odd 1, X odd 2 (and X even 1, X even 2 ) using m processors Let L 1 = u 1, u 2,, u m (L 2 = u m+1, u m+2,, u 2m) be the result Step 4: Form a sequence L = u 1, u m+1, u 2, u m+2, u 3, u m+3,, u m, u 2m Compare every pair (u m+i, u 1+i ), ie (u m+1, u 2 ), (u m+2, u 3 ), Interchange elements if they are out of order Output the resultant sequence Part 2 11 Example : m = 4 X 1 = ( 2, 5, 8, 11 ) X 2 = ( 4, 9, 12, 18 ) Odd = ( 2, 8 ) even = ( 5, 11) odd = ( 4, 12 ) even = (9, 18) (2, 8) (4, 12) (5, 11) (9, 18) (2) (8) (4) (12) (5) (11) (9) (18) (2) (4) (8) (12) (5) (9) (11) (18) (2, 4) (8, 12) (5, 9) (11, 18) (2, 8, 4, 12) (5, 11, 9, 18) (2, 4, 8, 12) (5, 9, 11, 18) (2, 5, 4, 9, 8, 11, 12, 18) (2, 4, 5, 8, 9, 11, 12, 18) Part 2 12 6

Theorem : The previous algorithm A correctly merge two sorted sequences of arbitrary numbers (Proof: Assignment #1) Analysis This algorithm A uses divide and conquer strategy Step 1, O(1) Step 2, Partition can be done by 2m CPUs at the same time in O(1) Part 2 13 Step 3, There are two sub-problems Using m CPUs in parallel to solve each sub-problem A subproblem has 2 sorted lists and a list is with m/2 elements Step 4, Using m CPUs in parallel in O(1) The total running time is T(m) = T(m/2) + O(1) T(m) = O(log m) Note : T(m) means running time to merge 2 sorted lists, each with m elements Total work : 2m * O(log m) = O(m log m) It is not work optimal! Part 2 14 7

A work optimal CREW merging algorithm Goal : use O(m/log m) CPUs to obtain O(log m) algorithm Algorithm B Step 1: Partition X 1 in (m/log m) parts, A 1, A 2,, A z, where z = (m/log m) Note : each A i has (log m) elements Part 2 15 Step 2: Let u i be the largest element in A i, ie last element of A i Assign a cpu to each u i Use binary search, O(log m), to search the correct position of u i in X 2 This divides X 2 into z parts B 1, B 2,, B z X 1 A 1 A 2 A 3 A z X 2 B 1 B 2 B 3 B z Now : we only need to merge A i with B i for 1 <= i <= z Part 2 16 8

Step 3 : If B i = O(log m), then A i and B i can be merged in O(log m); Otherwise, partition B i in ( B i /(log m)) parts Now, use similar strategy as Step 2, ie assign a cpu to each sub-part of B i, use largest key to find correct position in A i, ie O(log log m) There are at most 2z parts in B 1, B 2,, B z, (think about WHY?) and each part has at most log m elements We need at most 2z CPUs, each pair needs O(log m) to merge Part 2 17 The total running time of Algorithm B is O(log m) Total work : 2(m/log m) * O(log m) = O(m) It is work optimal! Part 2 18 9

5 Sorting Odd Even Merge Sort : Algorithm C Step 1 : Step 2 : Step 3: Step 4: If n <= 1 return X Use n CPUs to partition input X of n elements into 2 lists, X 1 and X 2 Each with n/2 elements Use n/2 CPUs to sort X 1 recursively and n/2 CPUs to sort X 2 recursively Let X * 1 and X * 2 be the result sorted lists Use Odd-Even merge to merge two sorted lists using n CPUs Part 2 19 Analysis of Algorithm C : EREW algorithm using n CPUs T(n) = O(1) + T(n/2) + O(log n) = log(n) + log(n/2) + log(n/4) + + log(n/2 i ) ; i = log n = log(n) + [log(n) log 2] + [log(n) log 4] + + [log(n) log(n)] <= log(n) * log(n) T(n) = O(log 2 n) Total work O(n log 2 n) It is not work optimal Part 2 20 10