
CSL 860: Modern Parallel Computation

PARALLEL ALGORITHM TECHNIQUES: BALANCED BINARY TREE

Reduction: n operands => log n steps, total work = O(n). How do you map operations to processors? The balanced binary tree technique.

Reduction: n operands => log n steps. How do you map? n/2^i processors are active at step i. [Tree diagram: leaves 0..7 combine pairwise; the surviving indices per level are 0 2 4 6, then 0 4, then 0.]

Reduction: n operands => log n steps, but only p processors are available. Agglomerate and map: the processor-dependence structure is a binomial tree. [Diagram: with processors 0..3, leaves are assigned in pairs 0 0 1 1 2 2 3 3; partial results flow through 0 1 2 3, then 0 2, then 0.]
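
As a concrete illustration, here is a minimal sequential C++ sketch of balanced-tree reduction; the function name, the power-of-two size, and the use of + as the operator are assumptions of the example. Each inner loop corresponds to one parallel step with n/2^(i+1) independent operations.

    #include <cstdio>
    #include <vector>

    // Balanced-tree reduction: the step with stride 2^i combines pairs 2^i apart,
    // so all additions within one step are independent (one PRAM step each).
    long long tree_reduce(std::vector<long long> a) {
        size_t n = a.size();                     // assumed to be a power of two
        for (size_t stride = 1; stride < n; stride *= 2)
            for (size_t i = 0; i + stride < n; i += 2 * stride)
                a[i] += a[i + stride];           // left element of each pair accumulates
        return a[0];                             // after log n steps, a[0] holds the sum
    }

    int main() {
        std::printf("%lld\n", tree_reduce({3, 1, 4, 1, 5, 9, 2, 6}));  // prints 31
    }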

Binomial Tree. B_0 is a single node (the root); B_k is a root with k binomial subtrees B_0, ..., B_{k-1}. [Diagram: B_0, B_1, B_2, B_3.]

Prefix Sums (sequential):
    P[0] = A[0]
    for i = 1 to n-1
        P[i] = P[i-1] + A[i]

Recursive Prefix Sums:
    prefixsums(s, x, 0:n) {
        parallel for i in 0:n/2
            y[i] = op(x[2i], x[2i+1])
        prefixsums(z, y, 0:n/2)
        s[0] = x[0]
        parallel for i in 1:n
            if (i & 1) s[i] = z[i/2]
            else       s[i] = op(z[i/2 - 1], x[i])
    }
If op is invertible, the even case can equivalently be computed as op^{-1}(z[i/2], x[i+1]).
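
A minimal C++ rendering of this recursion, with + standing in for op; the function and variable names are mine, and the plain for-loops mark what would be single parallel steps on a PRAM.

    #include <cstdio>
    #include <vector>

    // Recursive prefix sums over +, following the slide's structure.
    // Assumes the input size is a power of two.
    void prefix_sums(std::vector<long long>& s, const std::vector<long long>& x) {
        size_t n = x.size();
        if (n == 1) { s[0] = x[0]; return; }

        std::vector<long long> y(n / 2), z(n / 2);
        for (size_t i = 0; i < n / 2; i++)        // pairwise combine (parallel step)
            y[i] = x[2 * i] + x[2 * i + 1];
        prefix_sums(z, y);                        // recurse on the half-size input

        s[0] = x[0];
        for (size_t i = 1; i < n; i++)            // fix-up (parallel step)
            s[i] = (i & 1) ? z[i / 2]             // odd i: prefix ends on a pair boundary
                           : z[i / 2 - 1] + x[i]; // even i: previous pair sum plus x[i]
    }

    int main() {
        std::vector<long long> x = {1, 2, 3, 4, 5, 6, 7, 8}, s(8);
        prefix_sums(s, x);
        for (long long v : s) std::printf("%lld ", v);  // 1 3 6 10 15 21 28 36
    }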

Prefix Sums: P[0] = A[0]; for i = 1 to n-1, P[i] = P[i-1] + A[i]. [Diagram, built up over two slides: the half-range sums S(0:n/2] and S(n/2:n] are computed independently; a prefix such as S(0:3n/4] is then obtained by combining S(0:n/2] with S(n/2:3n/4].]

Non-recursive Prefix Sums:
    parallel for i in 0:n
        B[0][i] = A[i]
    for h in 1:log n                      // upward pass
        parallel for i in 0:n/2^h
            B[h][i] = B[h-1][2i] op B[h-1][2i+1]
    for h in log n:0                      // downward pass
        C[h][0] = B[h][0]
        parallel for i in 1:n/2^h
            if i is odd: C[h][i] = C[h+1][i/2]
            else:        C[h][i] = C[h+1][i/2 - 1] op B[h][i]
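
The same two passes as a sequential C++ sketch; each inner loop would be one parallel step, and the array-of-levels layout and names are my choices.

    #include <cstdio>
    #include <vector>

    // Non-recursive prefix sums: the upward pass builds pair sums level by level,
    // the downward pass distributes prefixes. n must be a power of two.
    std::vector<long long> scan(const std::vector<long long>& A) {
        size_t n = A.size();
        int levels = 0;
        while ((1u << levels) < n) levels++;          // levels = log2 n

        std::vector<std::vector<long long>> B(levels + 1), C(levels + 1);
        B[0] = A;
        for (int h = 1; h <= levels; h++) {           // upward pass
            B[h].resize(n >> h);
            for (size_t i = 0; i < B[h].size(); i++)  // one parallel step
                B[h][i] = B[h - 1][2 * i] + B[h - 1][2 * i + 1];
        }
        for (int h = levels; h >= 0; h--) {           // downward pass
            C[h].resize(n >> h);
            C[h][0] = B[h][0];
            for (size_t i = 1; i < C[h].size(); i++)  // one parallel step
                C[h][i] = (i & 1) ? C[h + 1][i / 2]
                                  : C[h + 1][i / 2 - 1] + B[h][i];
        }
        return C[0];                                  // C[0][i] = A[0] + ... + A[i]
    }

    int main() {
        for (long long v : scan({1, 2, 3, 4, 5, 6, 7, 8}))
            std::printf("%lld ", v);                  // 1 3 6 10 15 21 28 36
    }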

Prefix Sums, data flow up. [Tree diagram: leaves B[0][0..7] combine pairwise into B[1][0..3], then B[2][0..1], then the root B[3][0].]

Prefix Sums, data flow down. [Tree diagram: C[3][0] = B[3][0] at the root; prefixes propagate down through C[2][0..1] and C[1][0..3] to the leaves C[0][0..7].]

Processor mapping. [Diagram: the tree's nodes mapped onto two processors; P0 takes the left subtrees and the root, P1 the right subtrees.]

Balanced Tree Approach: build a binary tree on the input, hierarchically dividing it into groups, groups of groups, and so on, and traverse the tree upwards and/or downwards. It is useful to think of the tree as a network topology, but only for algorithm design; sub-trees are later mapped to processors.

PARALLEL ALGORITHM TECHNIQUES: PARTITIONING

Merge Sorted Sequences (A, B): determine the rank of each element in A ∪ B. Rank(x, A ∪ B) = Rank(x, A) + Rank(x, B), and if A and B are each sorted, only the rank in the other sequence needs to be computed: find Rank(A, B) and, similarly, Rank(B, A). Each rank is found by binary search: O(log n) time, O(n log n) work.
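
A C++ sketch of merging by ranks; std::lower_bound/std::upper_bound perform the binary searches, and the asymmetric tie-breaking keeps all output positions distinct (the function name and layout are my choices).

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Merge two sorted sequences by ranking: every element's output position is
    // its index in its own sequence plus its rank in the other sequence.
    // Each loop iteration is independent, i.e., one PRAM processor's work.
    std::vector<int> merge_by_ranks(const std::vector<int>& A,
                                    const std::vector<int>& B) {
        std::vector<int> out(A.size() + B.size());
        for (size_t i = 0; i < A.size(); i++) {
            // rank of A[i] in B: elements of B strictly smaller (ties go to A)
            size_t r = std::lower_bound(B.begin(), B.end(), A[i]) - B.begin();
            out[i + r] = A[i];
        }
        for (size_t j = 0; j < B.size(); j++) {
            // rank of B[j] in A: elements of A less than or equal (ties go to A)
            size_t r = std::upper_bound(A.begin(), A.end(), B[j]) - A.begin();
            out[j + r] = B[j];
        }
        return out;
    }

    int main() {
        for (int v : merge_by_ranks({1, 4, 7, 9}, {2, 3, 8, 10}))
            std::printf("%d ", v);   // 1 2 3 4 7 8 9 10
    }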

Optimal Merge (A, B): partition A and B into blocks of size log n. Choose from B the elements i·log n, i = 0:n/log n, and rank each chosen element of B in A by binary search. Then merge the resulting pairs of sub-sequences: if |A_i| <= log n, merge sequentially in O(log n) time; otherwise partition A_i into log n-sized blocks and recursively subdivide B_i into sub-sub-sequences. Total time is O(log n); total work is O(n).

Optimal Merge (A, B), a fast building block: partition A and B into √n blocks. Choose from B the elements i·√n, i = (0:√n], and rank each chosen element of B in A by parallel search, using √n processors per search. Recursively merge pairs of sub-sequences. Total time: T(n) = O(1) + T(√n) = O(log log n). Total work: W(n) = O(n) + √n·W(√n) = O(n log log n). Fast, but the work still needs to be reduced.

Optimal Merge (A, B): use the fast, but non-optimal, algorithm on small enough subsets. Subdivide A and B into blocks of size log log n: A_1, A_2, ... and B_1, B_2, .... Select the first element of each block: A' = p_1, p_2, ... and B' = q_1, q_2, .... Now log log n-sized blocks are merged n/log log n times.

Optimal Merge (A, B): merge A' and B', i.e., find Rank(A':B') and Rank(B':A'), using the fast non-optimal algorithm: time O(log log n), work O(n). Compute Rank(A':B) and Rank(B':A): if Rank(p_i, B') is r_i, then p_i lies in block B_{r_i}, which is searched sequentially: time O(log log n), work O(n). Finally, compute the ranks of the remaining elements sequentially: time O(log log n), work O(n).

Quick Sort: choose the pivot (select the median?), subdivide into two groups whose sizes are linearly related with high probability, and sort each group independently.

QuickSort Algorithm:
    QuickSort(int A[], int first, int last) {
        Select random m in [first:last]      // A[m] is the pivot
        parallel for i in [first:last]
            flag[i] = A[i] < A[m];
        Split(A);                            // separate flag values 0 and 1 using a
                                             // prefix sum; A[m] goes to position k
        QuickSort A[first:k-1] and A[k+1:last]
    }
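
A sequential C++ sketch of the Split step: the prefix sum over the flags gives each element smaller than the pivot its output slot. The helper name, and placing the flag=0 elements with a running counter rather than a second prefix sum, are simplifications of mine.

    #include <cstdio>
    #include <vector>

    // Split for parallel quicksort: elements with flag=1 (less than the pivot)
    // are packed to the front; the prefix sum of the flags gives their slots.
    // Each loop below is a constant-time parallel step plus one prefix-sum step.
    size_t split(std::vector<int>& A, int pivot) {
        size_t n = A.size();
        std::vector<int> flag(n), pos(n);
        for (size_t i = 0; i < n; i++)              // parallel compare
            flag[i] = A[i] < pivot;
        pos[0] = flag[0];
        for (size_t i = 1; i < n; i++)              // prefix sum (parallel on a PRAM)
            pos[i] = pos[i - 1] + flag[i];
        size_t k = pos[n - 1];                      // number of "smaller" elements
        std::vector<int> out(n);
        size_t big = k;                             // next slot for flag=0 elements
        for (size_t i = 0; i < n; i++)
            if (flag[i]) out[pos[i] - 1] = A[i];    // destination from prefix sum
            else         out[big++] = A[i];         // (a second scan would do this in parallel)
        A = out;
        return k;                                   // the pivot region starts at k
    }

    int main() {
        std::vector<int> A = {5, 2, 8, 1, 9, 3};
        size_t k = split(A, 5);
        std::printf("k=%zu:", k);
        for (int v : A) std::printf(" %d", v);      // k=3: 2 1 3 5 8 9
    }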

Quick Sort analysis: since the group sizes are linearly related with high probability, there are expected O(log n) rounds of recursion; the time per round is O(log n), so the total work is O(n log n) with high probability.

Partitioning Approach: break the problem into p roughly equal-sized sub-problems and solve each sub-problem, preferably independently of the others. The focus is on subdividing into independent parts.

PARALLEL ALGORITHM TECHNIQUES: DIVIDE AND CONQUER

Merge Sort: partition the data into two halves and assign half the processors to each half. If only one processor remains, sort sequentially; otherwise sort each half recursively and merge the results. More on this later.

Convex Hull

PARALLEL ALGORITHM TECHNIQUES: ACCELERATED CASCADING

Min-find. Input: array C with n numbers. Algorithm A1, using O(n^2) processors:
    parallel for i in 0:n
        M[i] = 0
    parallel for i, j in 0:n
        if i != j && C[i] < C[j]
            M[j] = 1
    parallel for i in 0:n
        if M[i] == 0
            min = C[i]
O(1) time, but not optimal: O(n^2) work.
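
A direct C++ rendering of A1; plain loops stand in for the n^2 parallel comparisons, and on a common-CRCW PRAM the concurrent writes to M[j] are benign since every writer stores 1.

    #include <cstdio>
    #include <vector>

    // Constant-time min-find A1: mark M[j]=1 if any element beats C[j];
    // exactly the unmarked positions hold the minimum value.
    int min_find(const std::vector<int>& C) {
        size_t n = C.size();
        std::vector<int> M(n, 0);               // M[j]=0 means "still a candidate"
        for (size_t i = 0; i < n; i++)          // all pairs in parallel on the PRAM
            for (size_t j = 0; j < n; j++)
                if (i != j && C[i] < C[j])
                    M[j] = 1;                   // C[j] is not the minimum
        int result = C[0];
        for (size_t i = 0; i < n; i++)
            if (M[i] == 0) result = C[i];
        return result;
    }

    int main() {
        std::printf("%d\n", min_find({7, 3, 9, 1, 4}));   // prints 1
    }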

Optimal Min-find: the balanced binary tree gives O(log n) time and O(n) work => optimal. Using accelerated cascading, make the tree branch much faster: the number of children of node u is √n_u, where n_u is the number of leaves in u's subtree. This works if the operation at each node can be performed in O(1).

From n^2 processors to n√n. Step 1: partition into disjoint blocks of size √n. Step 2: apply A1 to each block. Step 3: apply A1 to the √n results from step 2. [Diagram: a two-level tree of A1 instances over blocks of size √n.]

From n^{1+1/2} processors to n^{1+1/4}. Step 1: partition into disjoint blocks of size n^{1/2}. Step 2: apply A2 to each block. Step 3: apply A2 to the results from step 2. [Diagram: A2 instances over n^{1/2} blocks, each using n^{3/4} processors.]

n^2 -> n^{1+1/2} -> n^{1+1/4} -> n^{1+1/8} -> n^{1+1/16} -> ... -> n^{1+1/2^k} ~ n? Algorithm A_k takes O(1) time with n^{1+ε_k} processors, where ε_k halves with each k. Algorithm A_{k+1}: 1. Partition the input array C (size n) into disjoint blocks of size n^{1/2} each. 2. Solve each block in parallel using algorithm A_k. 3. Re-apply A_k to the n/n^{1/2} block minima from step 2. This yields a doubly logarithmic-depth tree: O(n log log n) work, O(log log n) time.
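
Spelled out, the recurrences behind the doubly-logarithmic bounds (a standard derivation, not from the slides): T(n) = T(n^{1/2}) + O(1); writing n = 2^(2^k) gives n^{1/2} = 2^(2^(k-1)), so the problem bottoms out after k = log log n levels and T(n) = O(log log n). For work, W(n) = n^{1/2}·W(n^{1/2}) + O(n), and the O(n) term recurs once per level, giving W(n) = O(n log log n).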

Min-Find Review. Constant-time algorithm: O(n^2) work. O(log n) balanced-tree approach: O(n) work, optimal. O(log log n) doubly-log-depth-tree approach: O(n log log n) work; the degree is high at the root and decreases going down, with #children of node u = √(#leaves in the subtree rooted at u) and depth = O(log log n).

Accelerated Cascading: solve recursively. Start bottom-up with the optimal algorithm until the problem size is small enough, then switch to the fast (non-optimal) algorithm, so that only a few small problems are solved fast but non-work-optimally. Min-find: run the optimal algorithm for the lowest log log log n levels (leaving n/log log n values), then switch to the O(n log log n)-work doubly-logarithmic algorithm. Result: O(n) work, O(log log n) time.