Sorting (Chapter 9) Alexandre David B2-206


Sorting Problem
Arrange an unordered collection of elements into monotonically increasing (or decreasing) order. Let S = <a_1, a_2, ..., a_n>. Sort S into S' = <a'_1, a'_2, ..., a'_n> such that a'_i ≤ a'_j for 1 ≤ i ≤ j ≤ n and S' is a permutation of S.
The elements to sort (actually used for comparisons) are also called the keys.

Recall on Comparison-Based Sorting Algorithms
Bubble sort, selection sort: Θ(n²). Insertion sort, quicksort: O(n²). Merge sort, heapsort: Θ(n log n).
Lower bounds: Ω(n) for any sorting algorithm, Ω(n log n) for comparison-based sorting.
You should know these complexities from a previous course on algorithms.

Characteristics of Sorting Algorithms
In-place sorting: no need for additional memory (or only constant size).
Stable sorting: ordered elements keep their original relative position.
Internal sorting: elements fit in process memory.
External sorting: elements are on auxiliary storage.
We assume internal sorting is possible.

Fundamental Distinction
Comparison-based sorting: compare-exchange of pairs of elements. Lower bound is Ω(n log n) (proof based on decision trees). Merge sort & heapsort are optimal.
Non-comparison-based sorting: uses information about the elements to sort. Lower bound is Ω(n). Counting sort & radix sort are optimal.
We assume comparison-based sorting is used.
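To make the distinction concrete, here is a minimal counting sort in C (my own sketch, not part of the lecture). It is non-comparison based because it indexes directly with the keys, which are assumed to be integers in a known range 0..max_key; it runs in Θ(n + k) time with k = max_key + 1.

#include <stdlib.h>
#include <string.h>

/* Counting sort: stable, non-comparison based, Theta(n + k) time and
 * Theta(n + k) extra space, where k = max_key + 1.
 * Assumes every key satisfies 0 <= a[i] <= max_key. */
void counting_sort(int *a, int n, int max_key)
{
    int k = max_key + 1;
    int *count = calloc(k, sizeof *count);
    int *out   = malloc(n * sizeof *out);

    for (int i = 0; i < n; i++)          /* histogram of key values    */
        count[a[i]]++;
    for (int v = 1; v < k; v++)          /* prefix sums: final offsets */
        count[v] += count[v - 1];
    for (int i = n - 1; i >= 0; i--)     /* place elements, stably     */
        out[--count[a[i]]] = a[i];

    memcpy(a, out, n * sizeof *a);
    free(out);
    free(count);
}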

Issues in Parallel Sorting
Where to store input & output? One process or distributed? The enumeration of the processes is used to distribute the output.
How to compare? How many elements per process? Using as many processes as elements gives poor performance because of inter-process communication.

Parallel Compare-Exchange
Communication cost: t_s + t_w. The comparison itself is much cheaper, so the communication time dominates.
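A sketch of a parallel compare-exchange with MPI (illustrative only; the function name and the use of MPI_Sendrecv are my choices): each partner sends its key, receives the partner's key, and keeps either the minimum or the maximum. The single comparison is negligible next to the t_s + t_w message cost.

#include <mpi.h>

/* One parallel compare-exchange step: the calling process exchanges 'key'
 * with process 'partner' and returns the minimum if keep_min is non-zero
 * (typically the lower-ranked side in an ascending sort), the maximum
 * otherwise. Communication cost is one t_s + t_w message each way. */
int compare_exchange(int key, int partner, int keep_min, MPI_Comm comm)
{
    int other;
    MPI_Sendrecv(&key,   1, MPI_INT, partner, 0,
                 &other, 1, MPI_INT, partner, 0,
                 comm, MPI_STATUS_IGNORE);
    if (keep_min)
        return key < other ? key : other;
    return key > other ? key : other;
}

Because which side keeps the minimum is fixed in advance (by the network wiring or by the ranks), both partners reach consistent decisions without any further communication.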

Blocks of Elements Per Process
The n elements are distributed as blocks of n/p elements per process: process P_i holds block A_i, for i = 0, 1, ..., p-1.

Compare-Split
For large blocks of n/p elements per process: Θ(n/p) work per compare-split.
Exchange of blocks: Θ(t_s + t_w·n/p). Merge: Θ(n/p). Split: O(n/p).
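A serial sketch of the "keep the smaller half" side of a compare-split (hypothetical helper, not from the book): the two sorted blocks are merged, but only the first n/p outputs are needed.

#include <stdlib.h>
#include <string.h>

/* Compare-split, "keep the smaller half" side: 'mine' and 'theirs' are two
 * sorted blocks of length m; afterwards 'mine' holds the m smallest of the
 * 2m elements, still sorted. Theta(m) time, Theta(m) scratch space. */
void compare_split_keep_small(int *mine, const int *theirs, int m)
{
    int *merged = malloc(m * sizeof *merged);
    int i = 0, j = 0, k = 0;

    while (k < m) {                       /* only the first m outputs matter */
        if (j >= m || (i < m && mine[i] <= theirs[j]))
            merged[k++] = mine[i++];
        else
            merged[k++] = theirs[j++];
    }
    memcpy(mine, merged, m * sizeof *mine);
    free(merged);
}

The partner process keeps the larger half by merging from the top instead.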

Sorting Networks
Mostly of theoretical interest. Key idea: perform many comparisons in parallel.
Key elements: comparators (2 inputs, 2 outputs) and the network architecture (comparators arranged in columns, each column performing a permutation). Speed is proportional to the depth.

Comparators (figure).

Sorting Networks (figure).

Bitonic Sequence: Definition
A bitonic sequence is a sequence of elements <a_0, a_1, ..., a_{n-1}> such that
1. there exists i, 0 ≤ i ≤ n-1, such that <a_0, ..., a_i> is monotonically increasing and <a_{i+1}, ..., a_{n-1}> is monotonically decreasing,
2. or there is a cyclic shift of indices so that 1) is satisfied.
Example: <1,2,4,7,6,0> & <8,9,2,1,0,4> are bitonic sequences.
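A small helper sketch of my own (not part of the slides) that tests this definition: walking once around the sequence cyclically, a bitonic sequence changes direction at most twice.

/* Checks the definition above: walk over the n cyclic steps
 * a[i] -> a[(i+1) % n] and count how often the (non-tie) direction flips.
 * A sequence that is bitonic, possibly after a cyclic shift, flips at most
 * twice along this walk; any other sequence flips at least three times. */
int is_bitonic(const int *a, int n)
{
    int flips = 0, prev = 0;                  /* prev: last non-zero direction */

    for (int i = 0; i < n; i++) {
        int d = a[(i + 1) % n] - a[i];
        int sign = (d > 0) - (d < 0);
        if (sign == 0)
            continue;                         /* ignore equal neighbours */
        if (prev != 0 && sign != prev)
            flips++;
        prev = sign;
    }
    return flips <= 2;
}

For the two example sequences above, is_bitonic returns 1; for <1,3,2,4>, which changes direction too often, it returns 0.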

Bitonic Sort
Rearranges a bitonic sequence into sorted order. Divide & conquer type of algorithm (similar to quicksort) using bitonic splits. Sorting a bitonic sequence using bitonic splits is called a bitonic merge. But first we need a bitonic sequence...

Bitonic Split
Given a bitonic sequence <a_0, a_1, ..., a_{n/2-1}, a_{n/2}, a_{n/2+1}, ..., a_{n-1}>, define
s_1 = <min{a_0, a_{n/2}}, min{a_1, a_{n/2+1}}, ..., min{a_{n/2-1}, a_{n-1}}>
s_2 = <max{a_0, a_{n/2}}, max{a_1, a_{n/2+1}}, ..., max{a_{n/2-1}, a_{n-1}}>
Then s_1 & s_2 are both bitonic, and every element of s_1 is smaller than or equal to every element of s_2.
And in fact the procedure works even if the original sequence needs a cyclic shift to look like this particular case.
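A minimal in-place sketch (my own code) of one bitonic split, assuming n is even and a[0..n-1] is bitonic:

/* One bitonic split, in place: afterwards a[0..n/2-1] holds the pairwise
 * minima (s_1) and a[n/2..n-1] the pairwise maxima (s_2). If the input is
 * bitonic, both halves are bitonic and max(s_1) <= min(s_2). */
void bitonic_split(int *a, int n)
{
    int half = n / 2;
    for (int i = 0; i < half; i++) {
        if (a[i] > a[i + half]) {             /* compare-exchange a[i], a[i+n/2] */
            int tmp = a[i];
            a[i] = a[i + half];
            a[i + half] = tmp;
        }
    }
}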

Bitonic Merging Network
BM[n]: log n stages of n/2 comparators each. Cost: Θ(log n), obviously.

Bitonic Sort
Use the bitonic merging network to merge bitonic sequences of increasing length, starting from length 2. The bitonic merging network is used as a component of the full sorting network.
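A serial sketch of the whole construction (my own code, assuming the length is a power of two): bitonic_merge generalizes the split above with a direction flag and recurses on both halves; bitonic_sort builds a bitonic sequence by sorting the two halves in opposite directions and then merges it.

/* Bitonic merge: a[lo..lo+cnt-1] must be bitonic; afterwards it is sorted
 * ascending if dir != 0, descending otherwise. cnt is a power of two. */
static void bitonic_merge(int *a, int lo, int cnt, int dir)
{
    if (cnt <= 1)
        return;
    int half = cnt / 2;
    for (int i = lo; i < lo + half; i++) {
        /* Directed compare-exchange between the two halves. */
        if ((a[i] > a[i + half]) == (dir != 0)) {
            int tmp = a[i]; a[i] = a[i + half]; a[i + half] = tmp;
        }
    }
    bitonic_merge(a, lo, half, dir);          /* both halves are again bitonic */
    bitonic_merge(a, lo + half, half, dir);
}

/* Bitonic sort of a[lo..lo+cnt-1]: sort the first half ascending and the
 * second half descending (giving a bitonic sequence), then bitonic-merge. */
void bitonic_sort(int *a, int lo, int cnt, int dir)
{
    if (cnt <= 1)
        return;
    int half = cnt / 2;
    bitonic_sort(a, lo, half, 1);             /* ascending half  */
    bitonic_sort(a, lo + half, half, 0);      /* descending half */
    bitonic_merge(a, lo, cnt, dir);
}

Calling bitonic_sort(a, 0, n, 1) sorts a[0..n-1] in ascending order; run serially it performs Θ(n log² n) comparisons, matching the network's O(log² n) depth with n/2 comparators per level.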

Bitonic Sort
log n merging stages in total; depth (cost): O(log² n). Simulated on a serial computer: O(n log² n).
Not cost optimal compared to the optimal serial algorithm.

Mapping to Hypercubes & Meshes
Idea: the algorithm is communication intensive, so special care is needed for the mapping. How are the input wires paired?
Hypercube: paired wires have labels differing by only one bit, so the mapping to a hypercube is straightforward. But it is not efficient & not scalable, because the sequential algorithm is suboptimal.
Mesh: lower connectivity; several mappings are possible, but all are worse than the hypercube: T_P = Θ(log² n) + Θ(√n) for 1 element/process.
Blocks of elements: sort locally, Θ((n/p) log(n/p)), & use bitonic merges: cost optimal.
Hypercube: neighbors differ from each other by one bit.

Bubble Sort
procedure BUBBLE_SORT(n)
begin
    for i := n-1 downto 1 do
        for j := 1 to i do
            compare_exchange(a_j, a_{j+1});
end
Complexity: Θ(n²). Difficult to parallelize as it is, because it is inherently sequential.
It is difficult to sort n elements in time log n using n processes (cost optimal w.r.t. the best serial algorithm in n log n), but it is easy to parallelize other (less efficient) algorithms.

Odd-Even Transposition Sort
Complexity: Θ(n²). Phases alternate between compare-exchanges on the pairs (a_1,a_2),(a_3,a_4),... and on the pairs (a_2,a_3),(a_4,a_5),...
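A serial sketch of the algorithm (my own code): n phases alternating between the two pair sets; within each phase the pairs are disjoint, which is exactly what makes the algorithm easy to parallelize.

/* Odd-even transposition sort of a[0..n-1]: n phases; even-numbered phases
 * compare-exchange the pairs starting at index 0, odd-numbered phases the
 * pairs starting at index 1 (the (a_1,a_2),... and (a_2,a_3),... pairs). */
void odd_even_transposition_sort(int *a, int n)
{
    for (int phase = 0; phase < n; phase++) {
        int start = (phase % 2 == 0) ? 0 : 1;
        for (int i = start; i + 1 < n; i += 2) {   /* disjoint pairs: parallelizable */
            if (a[i] > a[i + 1]) {
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
            }
        }
    }
}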


Odd-Even Transposition Sort
Easy to parallelize! Θ(n) with 1 process per element, which is not cost optimal. Instead, use fewer processes, an optimal local sort, and compare-splits:
T_P = Θ((n/p) log(n/p)) + Θ(n) + Θ(n)
(local sort (optimal) + comparisons + communication)
Cost optimal for p = O(log n), but not scalable (few processes).
Write down the speedup & efficiency to find the bound on p, but you can also see it from T_P.
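The terms of T_P can be read off an MPI sketch of the block version (illustrative code, not the book's): a local sort, then p compare-split phases, each exchanging a whole block with the phase partner. It reuses compare_split_keep_small from the earlier sketch; compare_split_keep_large, the qsort-based local sort and the partner scheme are my assumptions.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Sketched after the compare-split slide. */
void compare_split_keep_small(int *mine, const int *theirs, int m);

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Mirror image of compare_split_keep_small: keep the m largest elements. */
static void compare_split_keep_large(int *mine, const int *theirs, int m)
{
    int *merged = malloc(m * sizeof *merged);
    int i = m - 1, j = m - 1, k = m - 1;
    while (k >= 0) {                              /* merge from the top */
        if (j < 0 || (i >= 0 && mine[i] >= theirs[j]))
            merged[k--] = mine[i--];
        else
            merged[k--] = theirs[j--];
    }
    memcpy(mine, merged, m * sizeof *mine);
    free(merged);
}

/* Block odd-even transposition: each process holds m = n/p elements in
 * 'block'; afterwards the blocks are globally sorted by process rank. */
void parallel_odd_even_sort(int *block, int m, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    qsort(block, m, sizeof *block, cmp_int);      /* local sort: Theta((n/p) log(n/p)) */
    int *other = malloc(m * sizeof *other);

    for (int phase = 0; phase < p; phase++) {
        /* Even phases pair (0,1),(2,3),...; odd phases pair (1,2),(3,4),... */
        int partner = (phase % 2 == rank % 2) ? rank + 1 : rank - 1;
        if (partner < 0 || partner >= p)
            continue;                             /* no partner: idle this phase */

        MPI_Sendrecv(block, m, MPI_INT, partner, phase,
                     other, m, MPI_INT, partner, phase,
                     comm, MPI_STATUS_IGNORE);    /* exchange: Theta(t_s + t_w n/p) */

        if (rank < partner)
            compare_split_keep_small(block, other, m);
        else
            compare_split_keep_large(block, other, m);
    }
    free(other);
}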

Improvement: Shellsort
Two phases: first move elements over longer distances, then run odd-even transposition but stop when no change occurs.
Idea: quickly put elements near their final position to reduce the number of iterations of the odd-even transposition.


Quicksort
Average complexity: O(n log n) (the worst case is quadratic), but very efficient in practice: robust on average, low overhead, and very simple.
Divide & conquer algorithm: partition A[q..r] into A[q..s] and A[s+1..r], then recursively sort the sub-arrays.
Subtlety: how to partition?

[Figure: step-by-step quicksort partitioning of the array <3,2,1,5,8,4,3,7> between indices q and r, ending with the sorted array <1,2,3,3,4,5,7,8>.]
Hoare partitioning is better. Check in your algorithm course.
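Since the note recommends Hoare partitioning, here is a minimal serial quicksort sketch using it (illustrative; this is not the textbook's pseudocode):

/* Hoare partition with pivot A[q]: returns s with q <= s < r such that every
 * element of A[q..s] is <= every element of A[s+1..r]; both parts are
 * non-empty, so the recursion always makes progress. */
static int hoare_partition(int *A, int q, int r)
{
    int pivot = A[q];
    int i = q - 1, j = r + 1;
    for (;;) {
        do { i++; } while (A[i] < pivot);
        do { j--; } while (A[j] > pivot);
        if (i >= j)
            return j;
        int tmp = A[i]; A[i] = A[j]; A[j] = tmp;  /* put both on the correct side */
    }
}

/* Quicksort of A[q..r]: average Theta(n log n), worst case Theta(n^2). */
void quicksort(int *A, int q, int r)
{
    if (q < r) {
        int s = hoare_partition(A, q, r);
        quicksort(A, q, s);
        quicksort(A, s + 1, r);
    }
}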

[Figure annotated BUG.]

Parallel Quicksort
Simple version: recursive decomposition with one process per recursive call.
Not cost optimal: the initial partitioning is a sequential lower bound of Ω(n). The best we can do with this scheme is to use O(log n) processes.
We need to parallelize the partitioning step.

Parallel Quicksort for CRCW PRAM
View the execution of quicksort as constructing a binary tree: each pivot becomes a node, smaller elements go into the left subtree, larger elements into the right subtree.
[Figure: the binary tree of pivots built for an example sequence; pivot 3 at the root splits the remaining elements into {1,2} and {7,4,5,8}, and the subtrees are built recursively.]

The textbook is inconsistent: the text & Algorithm 9.5 use A[p..s] ≤ x < A[s+1..q], while the figures & Algorithm 9.6 use A[p..s] < x ≤ A[s+1..q]. BUG.

Each element i is compared against A[parent_i]; on the concurrent writes that follow, only one process succeeds.
This algorithm does not correspond exactly to the serial version. Time for partitioning: O(1).

[Figure: the CRCW PRAM quicksort building the tree of pivots on an example array, starting from root = 1.]

[Figure: the remaining steps of the tree construction on the example.]
Each step: Θ(1). Average height of the tree: Θ(log n). This is cost optimal, but it is only a model.

Parallel Quicksort, Shared Address Space (Realistic)
Same idea, but remove the contention:
Choose the pivot & broadcast it.
Each process rearranges its block of elements locally.
Global rearrangement of the blocks.
When the blocks reach a certain size, a local sort is used.
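The global rearrangement step can be organized with prefix sums (as the cost slide notes): once every process knows how many of its elements are smaller than the pivot, exclusive prefix sums over those counts give each process its write offsets in the rearranged array. A minimal serial sketch of that bookkeeping (the names and layout are my assumptions):

/* small[i] / large[i]: how many elements of process i's block are smaller
 * than / at least the pivot. Afterwards small_off[i] and large_off[i] are
 * the positions where process i writes its two groups in the globally
 * rearranged array, and *total_small is the boundary between the groups. */
void rearrangement_offsets(const int *small, const int *large, int p,
                           int *small_off, int *large_off, int *total_small)
{
    int s = 0, l = 0;
    for (int i = 0; i < p; i++) {                 /* exclusive prefix sums */
        small_off[i] = s;  s += small[i];
        large_off[i] = l;  l += large[i];
    }
    *total_small = s;
    for (int i = 0; i < p; i++)
        large_off[i] += s;                        /* larger group starts after all smaller */
}

In the parallel algorithm the prefix sums themselves are computed in parallel, which is why the cost analysis below is driven by the pivot broadcast and the prefix-sum computations.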


Cost
Scalability is determined by the time to broadcast the pivot & to compute the prefix sums. Cost optimal.

MPI Formulation of Quicksort
Arrays must be explicitly distributed. Two phases:
Local partition into elements smaller/larger than the pivot.
Determine which process will sort which sub-array, and send the sub-arrays to the right processes.

Final Word
Pivot selection is very important: it affects performance, and a bad pivot means idle processes.