Searching, Sorting part 1

Week 3 Objectives
- Searching: binary search
- Comparison-based search: running time bound
- Sorting: bubble, selection, insertion, merge
- Sorting: Heapsort
- Comparison-based sorting time bound

Brute force / linear search
- Linear search: look through the values of the array one by one until the desired value/event/condition is found
- Running time: linear in the number of elements, call it O(n)
- Advantage: in most situations, the array does not have to be sorted
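
A minimal C sketch of linear search; the function name and signature are illustrative, not from the slides.

    #include <stddef.h>

    /* Return the index of the first element equal to v, or -1 if absent.
       The array need not be sorted; every element may be inspected: O(n). */
    int linear_search(const int A[], size_t n, int v) {
        for (size_t i = 0; i < n; i++)
            if (A[i] == v)
                return (int)i;
        return -1; /* not found */
    }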

Binary Search
- Array must be sorted
- Search array A from index b to index e for value V
- Look at the middle index m = (b+e)/2
  - Compare V with A[m]; if equal, return index m
  - If V < A[m], search the first half of the array
  - If V > A[m], search the second half of the array
Example: V=3, A = [-4, -1, 0, 0, 1, 1, 3, 19, 29, 47], b=0, e=9, m=4; A[m]=1 < V=3 => the search moves to the right half
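
The same idea as an iterative C sketch; names are illustrative.

    /* Binary search for v in sorted A[b..e]; returns its index or -1.
       Each iteration halves the search space, so O(log n) comparisons. */
    int binary_search(const int A[], int b, int e, int v) {
        while (b <= e) {
            int m = b + (e - b) / 2;  /* middle index; avoids (b+e) overflow */
            if (A[m] == v) return m;
            if (A[m] < v)  b = m + 1; /* V > A[m]: search the second half */
            else           e = m - 1; /* V < A[m]: search the first half  */
        }
        return -1; /* not found */
    }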

Binary Search Efficiency
- Every iteration/recursion either ends the procedure (value found) or reduces the problem size (search space) by half
- Worst case: the value is not found until problem size = 1
  - How many halvings have been done? n/2/2/.../2 = 1; how many 2s are needed?
  - With k halvings, n = 2^k, so k is about log(n)
  - Worst-case running time is O(log n)

Search: tree of comparisons (tree depth = 5 in the figure)
- tree of comparisons: essentially what the algorithm does
- each program execution follows a certain path
- red nodes are terminal / output
- the algorithm has to have at least n output nodes... why?
- if the tree is balanced, longest path = tree depth = log(n)

Bubble Sort
Simple idea: as long as there is an inversion, swap it away
- inversion = a pair of indices i<j with A[i] > A[j]
- swap A[i] <-> A[j]: either directly swap(A[i], A[j]), or code it yourself: aux = A[i]; A[i] = A[j]; A[j] = aux;
How long does it take?
- worst case: how many inversions have to be swapped? O(n²)
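
A C sketch of the classic adjacent-swap formulation; illustrative, not the slides' code.

    /* Bubble sort: keep sweeping, swapping adjacent inversions, until none
       remain. Worst case O(n²) swaps (one per inversion). */
    void bubble_sort(int A[], int n) {
        for (int pass = 0; pass < n - 1; pass++)
            for (int i = 0; i + 1 < n - pass; i++)
                if (A[i] > A[i + 1]) {   /* inversion between neighbors */
                    int aux = A[i];      /* the slide's three-line swap  */
                    A[i] = A[i + 1];
                    A[i + 1] = aux;
                }
    }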

Insertion Sort
- partial array is sorted: 1 5 8 20 49
- get a new element V=9
- find the correct position with binary search: i=3
- move elements to make space for the new element: 1 5 8 _ 20 49
- insert into the existing array at the correct position: 1 5 8 9 20 49

Insertion Sort - variant
- partial array is sorted: 1 5 8 20 49
- get a new element V=9; put it at the end of the array: 1 5 8 20 49 9
- move V=9 in from the back, one swap at a time, until it reaches its correct position:
  1 5 8 20 9 49
  1 5 8 9 20 49
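
The variant just shown as a C sketch (append, then walk the element back); illustrative names.

    /* Insertion sort, the 'put at the end, move in from the back' variant:
       the prefix A[0..i-1] is always sorted; A[i] is walked left into place. */
    void insertion_sort(int A[], int n) {
        for (int i = 1; i < n; i++) {
            int v = A[i];                /* the new element, e.g. V=9 */
            int j = i - 1;
            while (j >= 0 && A[j] > v) { /* shift larger elements right */
                A[j + 1] = A[j];
                j--;
            }
            A[j + 1] = v;                /* correct position reached */
        }
    }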

Insertion Sort Running Time
- Inserting one element may require moving O(n) elements (worst case Θ(n)): O(n) insertion time
- Repeating the insertion for each of the n elements gives n · O(n) = O(n²) running time

Selection Sort
Sort array A[] into a new array C[]:
while (some elements remain unused)
- find the minimum element x in A at index i, ignoring "used" elements
- write x in the next available position in C
- mark index i in A as "used" so it doesn't get picked up again
Running Time = O(n²)
Example: A = 10 -1 -5 12 -1 9; C fills up step by step:
-5 | -5 -1 | -5 -1 -1 | -5 -1 -1 9 | -5 -1 -1 9 10 | -5 -1 -1 9 10 12
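
A C sketch of the slide's version with an explicit "used" marker array; illustrative, and calloc error handling is omitted.

    #include <stdbool.h>
    #include <stdlib.h>

    /* Selection sort as on the slide: repeatedly pick the minimum unused
       element of A and append it to C. O(n²) comparisons overall. */
    void selection_sort(const int A[], int C[], int n) {
        bool *used = calloc((size_t)n, sizeof *used);
        for (int k = 0; k < n; k++) {
            int min_i = -1;
            for (int i = 0; i < n; i++)       /* scan, ignoring "used" slots */
                if (!used[i] && (min_i == -1 || A[i] < A[min_i]))
                    min_i = i;
            C[k] = A[min_i];                  /* next available position in C */
            used[min_i] = true;               /* never pick this index again  */
        }
        free(used);
    }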

Merge two sorted arrays
- two sorted arrays: A[] = {1, 5, 10, 100, 200, 300}; B[] = {2, 5, 6, 10}
- merge them into a new array C
- index i for array A[], j for B[], k for C[]
init i=j=k=0;
while (what_condition_?)
  if (A[i] <= B[j]) { C[k]=A[i]; i++ }  // advance i in A
  else { C[k]=B[j]; j++ }               // advance j in B
  advance k
end_while

Merge two sorted arrays - complete pseudocode
index i for array A[], j for B[], k for C[]
init i=j=k=0;
while (k < size(A)+size(B))
  if (i >= size(A))      { C[k]=B[j]; j++ }  // A exhausted: copy elem from B
  else if (j >= size(B)) { C[k]=A[i]; i++ }  // B exhausted: copy elem from A
  else if (A[i] <= B[j]) { C[k]=A[i]; i++ }  // advance i
  else                   { C[k]=B[j]; j++ }  // advance j
  k++  // advance k
end_while
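
The completed pseudocode translated into a minimal C function; a sketch with illustrative names.

    /* Merge sorted A (length na) and sorted B (length nb) into C.
       Linear time: each of the na+nb elements is copied exactly once. */
    void merge_arrays(const int A[], int na, const int B[], int nb, int C[]) {
        int i = 0, j = 0;
        for (int k = 0; k < na + nb; k++) {
            if (i >= na)           C[k] = B[j++]; /* A exhausted: copy from B */
            else if (j >= nb)      C[k] = A[i++]; /* B exhausted: copy from A */
            else if (A[i] <= B[j]) C[k] = A[i++]; /* take the smaller head    */
            else                   C[k] = B[j++];
        }
    }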

MergeSort: divide and conquer strategy
MergeSort array A:
- divide array A into two halves A-left, A-right
- MergeSort A-left (recursive call)
- MergeSort A-right (recursive call)
- Merge (A-left, A-right) into a fully sorted array
Running time: O(n log(n))
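
A recursive C sketch of these steps, reusing merge_arrays() from the previous sketch; the per-call temporary buffer is one possible design choice, not the slides' code.

    #include <stdlib.h>
    #include <string.h>

    /* MergeSort sketch: split, recurse on both halves, merge the results. */
    void merge_sort(int A[], int n) {
        if (n <= 1) return;              /* base case: already sorted */
        int mid = n / 2;
        merge_sort(A, mid);              /* MergeSort A-left  */
        merge_sort(A + mid, n - mid);    /* MergeSort A-right */
        int *tmp = malloc((size_t)n * sizeof *tmp);
        merge_arrays(A, mid, A + mid, n - mid, tmp);
        memcpy(A, tmp, (size_t)n * sizeof *tmp);
        free(tmp);
    }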

MergeSort running time
- T(n) = 2T(n/2) + Θ(n): 2 sub-problems of size n/2 each, plus linear time to combine the results
- Master Theorem case 2 (a=2, b=2, c=1)
- Running time T(n) = Θ(n log n)

Heap Data Structure
[figure: the same max-heap shown (a) as a binary tree and (b) as an array; positions 1..10 hold the values 16 14 10 8 7 9 3 2 4 1]
- a binary tree with the max-heap property: parent ≥ children

Max-Heap property
[figure: snapshots (a)-(c) of the value 4 floating down from index i=2 of the heap 16 4 10 14 7 9 3 2 8 1]
- Assume the left and right subtrees satisfy the max-heap property, but the top node does not
- Float the node down by consecutively swapping it with the larger of the nodes below it
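
A C sketch of this "float down" step (MAX-HEAPIFY), using the 1-based parent/child index arithmetic introduced on the next slide; illustrative, and it assumes the heap lives in A[1..n].

    /* MAX-HEAPIFY sketch, 1-based indexing: children of i sit at 2i and
       2i+1. Both subtrees of i are assumed valid max-heaps; A[i] floats
       down along one root-to-leaf path: O(log n). */
    void max_heapify(int A[], int n, int i) {
        for (;;) {
            int l = 2 * i, r = 2 * i + 1, largest = i;
            if (l <= n && A[l] > A[largest]) largest = l;
            if (r <= n && A[r] > A[largest]) largest = r;
            if (largest == i) return;  /* max-heap property restored */
            int aux = A[i]; A[i] = A[largest]; A[largest] = aux;
            i = largest;               /* keep floating the value down */
        }
    }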

Building a heap
Representing the heap as an array data structure:
- Parent(i) = i/2
- Left_child(i) = 2i
- Right_child(i) = 2i+1
The last half of the input array A holds the leaves, so:
MAX-HEAPIFY the first half of A, in reverse order:
for i = size(A)/2 downto 1
  MAX-HEAPIFY(A, i)
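
The loop above in C, reusing max_heapify() from the previous sketch; a sketch, not the slides' code.

    /* BUILD-MAX-HEAP sketch: the last half of A[1..n] are leaves, i.e.
       trivial heaps, so heapify the first half in reverse order. n calls
       to max_heapify give O(n log n); a tighter analysis yields O(n). */
    void build_max_heap(int A[], int n) {
        for (int i = n / 2; i >= 1; i--)
            max_heapify(A, n, i);
    }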

Heapsort
- Build a Max-Heap from the input array
LOOP
- swap heap_root (the max) with the last leaf
- output (take out) the max element; reduce the heap size
- MAX-HEAPIFY from the root to maintain the heap property
END LOOP
- the output is in sorted order
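
Putting it together in C, reusing build_max_heap() and max_heapify() from the sketches above; here the "output" is simply the growing sorted tail of the same array.

    /* Heapsort sketch (1-based A[1..n]): each iteration swaps the root
       (max) with the last leaf, shrinks the heap, and re-heapifies; the
       maxima accumulate in ascending order at the end. O(n log n). */
    void heap_sort(int A[], int n) {
        build_max_heap(A, n);
        for (int size = n; size > 1; size--) {
            int aux = A[1];              /* swap heap root with last leaf */
            A[1] = A[size];
            A[size] = aux;
            max_heapify(A, size - 1, 1); /* restore the heap property */
        }
    }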

HeapSort running time
- The Max-Heapify procedure time is given by the recurrence T(n) ≤ T(2n/3) + Θ(1); by the Master Theorem, T(n) = O(log n)
- Build Max-Heap: running the Max-Heapify procedure n times gives running time O(n log n)
- Extracting the values: again running the Max-Heapify procedure n times gives O(n log n)
- Total: O(n log n)

Sorting: tree of comparisons
- tree of comparisons: essentially what the algorithm does
- each program execution follows a certain path
- red nodes are terminal / output
- the algorithm has to have n! output nodes... why? (one output per possible permutation of the input)
- if the tree is balanced, longest path = tree depth = log(n!) = Θ(n log n)

QuickSort - pseudocode
QuickSort(A,b,e) // array A, sort between indices b and e
- q = Partition(A,b,e) // returns pivot q, b <= q <= e
- // Partition also rearranges A so that if i<q then A[i] <= A[q]
- //                               and if i>q then A[i] >= A[q]
- if (b < q-1) QuickSort(A,b,q-1)
- if (q+1 < e) QuickSort(A,q+1,e)
After Partition, the pivot index contains the right value:
b=0, q=3, e=9: -3 0 5 7 18 8 7 29 21 10

QuickSort Partition
TASK: rearrange A and find pivot q, such that
- all elements before q are smaller than A[q]
- all elements after q are bigger than A[q]
Partition(A, b, e)
- x = A[e] // pivot value
- i = b-1
- for j = b TO e-1
    if A[j] <= x then { i++; swap A[i] <-> A[j] }
- swap A[i+1] <-> A[e]
- q = i+1; return q

Partition Example
- set pivot value x = A[e] // here x=4
- i = index of the last value ≤ x placed so far; i+1 = index of the first value > x
- run j through array indices b to e-1
  - if A[j] <= x: swap(A[j], A[i+1]); i++ // advance i
- move the pivot into the right place: swap(pivot = A[e], A[i+1])
- return pivot index: return i+1
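
The Partition and QuickSort pseudocode above as C functions; a sketch using the same Lomuto-style partition, with illustrative names.

    /* Partition as in the pseudocode: pivot x = A[e]; returns the pivot's
       final index q, with A[i] <= A[q] for i < q and A[i] >= A[q] for
       i > q. Linear time. */
    int partition(int A[], int b, int e) {
        int x = A[e];                 /* pivot value */
        int i = b - 1;                /* last index of the <= x region */
        for (int j = b; j < e; j++)
            if (A[j] <= x) {
                i++;
                int aux = A[i]; A[i] = A[j]; A[j] = aux;
            }
        int aux = A[i + 1]; A[i + 1] = A[e]; A[e] = aux; /* place the pivot */
        return i + 1;
    }

    /* QuickSort over A[b..e]: partition, then recurse on both sides. */
    void quick_sort(int A[], int b, int e) {
        if (b >= e) return;
        int q = partition(A, b, e);
        quick_sort(A, b, q - 1);
        quick_sort(A, q + 1, e);
    }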

QuickSort time
- Partition runs in linear time
- If the pivot position is q, the QuickSort recurrence is T(n) = n + T(q) + T(n-q)
- Best case: q is always in the middle: T(n) = n + 2T(n/2), overall Θ(n log n)
- Worst case: q is always at an extreme, 1 or n: T(n) = n + T(1) + T(n-1), overall Θ(n²)

QuickSort Running Time
Depends on the Partition balance.
- Worst case: Partition produces the unbalanced split n = (1, n-1) most of the time
  - results in O(n²) running time
- Average case: most of the time the split balance is no worse than n = (cn, (1-c)n) for a fixed c
  - for example, c = 0.99 means balance no worse than (1/100·n, 99/100·n)
  - results in O(n log n) running time
  - one can prove that in expectation (on average), if the pivot value is chosen randomly, the running time is Θ(n log n); see the book

Median Stats
Task: find the k-th smallest element
- k=n is the same as finding the MAX (the highest)
- k=2 means finding the second-smallest
- k=1 is the same as finding the MIN
Naive approach, based on selection sort:
- find the first smallest (MIN)
- then find the second smallest, the third smallest, etc., until the k-th smallest element
- Running time: in the average case k = Θ(n), and each find-min takes Θ(n) time, so the total is Θ(n²)

Median Stats: find the k-th element
Better approach, based on QuickSort:
Median(A,b,e,k) // find the k-th smallest in array A, between indices b=1 and e=n
- q = Partition(A,b,e) // returns pivot index q, b <= q <= e
- // Partition also rearranges A so that if i<q then A[i] <= A[q]
- //                               and if i>q then A[i] >= A[q]
- if (q==k) return A[q] // found the k-th smallest
- if (q>k) Median(A,b,q-1,k)
- else Median(A,q+1,e,k-q)
Unlike QuickSort, the Median recursion goes on only one side, depending on the pivot.
Why does the second Median call use k(new) = k(old)-q? (the q smallest elements, pivot included, are skipped)
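
A C sketch of this selection recursion (often called QuickSelect), reusing partition() from the QuickSort sketch above; here k is taken as the rank relative to the current subarray, which keeps the index bookkeeping honest.

    /* QuickSelect sketch: return the k-th smallest of A[b..e], where k is
       the rank relative to this subarray (1 <= k <= e-b+1). Only one side
       of the pivot is recursed into. */
    int quick_select(int A[], int b, int e, int k) {
        if (b == e) return A[b];
        int q = partition(A, b, e);  /* pivot's absolute index */
        int rank = q - b + 1;        /* pivot's rank inside A[b..e] */
        if (k == rank) return A[q];
        if (k < rank)  return quick_select(A, b, q - 1, k);
        return quick_select(A, q + 1, e, k - rank); /* skip pivot's side */
    }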

Median Stats: running time of Median
- The recursive call gives T(n) = n + max(T(q), T(n-q))
  - max: assuming the recursion has to follow the longer side
- Just like QuickSort, the average case is when q is balanced, i.e. cn < q < (1-c)n for some constant 0 < c < 1
- Balanced case: T(n) = n + T(cn); the Master Theorem gives linear time Θ(n)
- The expected (average) case can be proven to be linear time (see book); worst case is Θ(n²)
- The worst case can be made to run in linear time with a rather complicated choice of the pivot value before each Partition call (see book)

Linear-time Sorting: Counting Sort
Counting Sort(A[]): count values, NO comparisons
STEP 1: build an array C that counts the values in A
- init C[] = 0
- run index i through A: value = A[i]; C[value]++ // counts each value occurrence
STEP 2: write the values out at their counted positions
- init position = 0
- for value = 0 : range
    for i = 1 : C[value]
      position = position + 1
      OUTPUT[position] = value
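
The two steps above as a C sketch for integer keys in [0, k). Note this simple version emits the key values themselves; the stable textbook variant instead uses prefix sums of C to place whole records.

    #include <stdlib.h>

    /* Counting sort sketch: Θ(n + k) time, no comparisons.
       out[] receives the n values of A[] in sorted order. */
    void counting_sort(const int A[], int n, int k, int out[]) {
        int *C = calloc((size_t)k, sizeof *C);
        for (int i = 0; i < n; i++)      /* STEP 1: count each value */
            C[A[i]]++;
        int pos = 0;
        for (int v = 0; v < k; v++)      /* STEP 2: emit values in order */
            for (int c = 0; c < C[v]; c++)
                out[pos++] = v;
        free(C);
    }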

Counting Sort
- n elements with values in a k-range {v1, v2, ..., vk}
  - for example, 100,000 people sorted by age: n = 100,000 and k = {1, 2, 3, ..., 170}, since 170 is the maximum reasonable age in years
- Linear time Θ(n+k)
  - Beats the Θ(n log n) bound? YES: linear Θ(n), if k is a constant
  - Definitely appropriate when k is constant or grows very slowly
  - Not good when k can be large. Example: sorting pictures by their size; n = 10,000 (a typical picture collection), while the size range k can be anywhere from 200 bytes to 40 MB
- Stable (equal input elements preserve their original order)