CSE373: Data Structure & Algorithms Lecture 23: More Sorting and Other Classes of Algorithms. Catie Baker Spring 2015


Admin: No class on Monday. Extra time for Homework 5.

Sorting: The Big Picture
Surprising amount of neat stuff to say about sorting:
Simple algorithms, O(n²): insertion sort, selection sort, shell sort
Fancier algorithms, O(n log n): heap sort, merge sort, quick sort
Comparison lower bound: Ω(n log n)
Specialized algorithms, O(n): bucket sort, radix sort
Handling huge data sets: external sorting

Bucket Sort (a.k.a. BinSort)
If all values to be sorted are known to be integers between 1 and K (or any small range):
Create an array of size K
Put each element in its proper bucket (a.k.a. bin)
If the data is only integer keys, no need to store more than a count of how many times each bucket has been used
Output the result via a linear pass through the array of buckets
Example: K=5, input (5,1,3,4,3,2,1,1,5,4,5)
count array: bucket 1: 3, bucket 2: 1, bucket 3: 2, bucket 4: 2, bucket 5: 3
output: 1,1,1,2,3,3,4,4,5,5,5
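A minimal Python sketch of this counting variant (assuming integer keys in the range 1..K):

```python
def bucket_sort(values, k):
    """Counting variant of bucket sort for integer keys in 1..k."""
    counts = [0] * (k + 1)          # counts[v] = occurrences of key v; index 0 unused
    for v in values:                # O(n): tally each key into its bucket
        counts[v] += 1
    result = []
    for v in range(1, k + 1):       # O(K): emit each key counts[v] times
        result.extend([v] * counts[v])
    return result

# Matches the slide's example:
# bucket_sort([5,1,3,4,3,2,1,1,5,4,5], 5) → [1,1,1,2,3,3,4,4,5,5,5]
```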

Analyzing Bucket Sort
Overall: O(n+K)
Linear in n, but also linear in K
The Ω(n log n) lower bound does not apply because this is not a comparison sort
Good when K is smaller than (or not much larger than) n: we don't spend time doing comparisons of duplicates
Bad when K is much larger than n: wasted space; wasted time during the linear O(K) pass
For data in addition to integer keys, use a list at each bucket

Bucket Sort with Data
Most real lists aren't just keys; we have data
Each bucket is a list (say, a linked list)
To add to a bucket, insert in O(1) (at the beginning, or keep a pointer to the last element)
Example: movie ratings; scale 1-5; 1=bad, 5=excellent
Input: 5: Casablanca, 3: Harry Potter movies, 5: Star Wars Original Trilogy, 1: Rocky V
Result: 1: Rocky V, 3: Harry Potter, 5: Casablanca, 5: Star Wars
Easy to keep "stable": Casablanca still before Star Wars
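A sketch of the list-per-bucket variant in Python, using the movie-rating example above (each bucket is an ordinary Python list rather than a linked list; appending preserves input order, which is what makes the sort stable):

```python
def bucket_sort_with_data(items, k):
    """Stable bucket sort of (key, data) pairs with integer keys in 1..k.
    Each bucket is a plain list; appending keeps insertion order."""
    buckets = [[] for _ in range(k + 1)]   # index 0 unused
    for key, data in items:                # one linear pass over the input
        buckets[key].append((key, data))
    return [pair for b in buckets for pair in b]

ratings = [(5, "Casablanca"), (3, "Harry Potter"),
           (5, "Star Wars"), (1, "Rocky V")]
# Stable: among the 5s, Casablanca still comes before Star Wars.
```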

Visualization: http://www.cs.usfca.edu/~galles/visualization/countingsort.html

Radix Sort
Origins go back to the 1890 U.S. census
Radix = the base of a number system
Examples will use 10 because we are used to that
Implementations use larger radices; for example, for ASCII strings, one might use 128
Idea: bucket sort on one digit at a time
Number of buckets = radix
Starting with the least significant digit
Keeping the sort stable
Do one pass per digit
Invariant: after k passes (digits), the last k digits are sorted

Example (radix = 10)
Input: 478, 537, 9, 721, 3, 38, 143, 67
First pass: bucket sort by ones digit
Buckets: 1: 721; 3: 3, 143; 7: 537, 67; 8: 478, 38; 9: 9
Order now: 721, 3, 143, 537, 67, 478, 38, 9

Example (radix = 10)
Order was: 721, 3, 143, 537, 67, 478, 38, 9
Second pass: stable bucket sort by tens digit
Buckets: 0: 3, 9; 2: 721; 3: 537, 38; 4: 143; 6: 67; 7: 478
Order now: 3, 9, 721, 537, 38, 143, 67, 478

Example (radix = 10)
Order was: 3, 9, 721, 537, 38, 143, 67, 478
Third pass: stable bucket sort by hundreds digit
Buckets: 0: 3, 9, 38, 67; 1: 143; 4: 478; 5: 537; 7: 721
Order now: 3, 9, 38, 67, 143, 478, 537, 721
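The passes above can be collected into a short LSD radix sort sketch in Python (a minimal version, assuming non-negative integer keys):

```python
def radix_sort(nums, radix=10):
    """LSD radix sort: repeated stable bucket sort, least significant digit first."""
    if not nums:
        return []
    passes = len(str(max(nums)))         # number of digits in the largest key
    place = 1                            # 1, then 10, then 100, ...
    for _ in range(passes):
        buckets = [[] for _ in range(radix)]
        for x in nums:                   # stable: equal digits keep their order
            buckets[(x // place) % radix].append(x)
        nums = [x for b in buckets for x in b]
        place *= radix
    return nums

# radix_sort([478, 537, 9, 721, 3, 38, 143, 67])
# → [3, 9, 38, 67, 143, 478, 537, 721]
```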

Analysis
Input size: n
Number of buckets = radix: B
Number of passes = "digits": P
Work per pass is one bucket sort: O(B+n)
Total work is O(P(B+n))
Compared to comparison sorts, sometimes a win, but often not
Example: strings of English letters up to length 15
Run-time proportional to: 15*(52 + n)
This is less than n log n only if n > 33,000
Of course, the cross-over point depends on the constant factors of the implementations
And radix sort can have poor locality properties

Sorting: The Big Picture
Surprising amount of neat stuff to say about sorting:
Simple algorithms, O(n²): insertion sort, selection sort, shell sort
Fancier algorithms, O(n log n): heap sort, merge sort, quick sort
Comparison lower bound: Ω(n log n)
Specialized algorithms, O(n): bucket sort, radix sort
Handling huge data sets: external sorting

Sorting Massive Data
Need sorting algorithms that minimize disk/tape access time:
Quicksort and heapsort both jump all over the array, leading to expensive random disk accesses
Merge sort scans linearly through arrays, leading to (relatively) efficient sequential disk access
Merge sort is the basis of massive sorting
Merge sort can leverage multiple disks

External Merge Sort
Sort 900 MB using 100 MB of RAM:
Read 100 MB of data into memory
Sort it using a conventional method (e.g. quicksort)
Write the sorted 100 MB to a temp file
Repeat until all data is in sorted chunks (900/100 = 9 total)
Read the first 10 MB of each sorted chunk, merge them into the remaining 10 MB of memory, writing output and reading more as necessary
Single merge pass instead of log n passes
An additional pass helps if the data is much larger than memory
Parallelism and better hardware can improve performance
Distribution sorts (similar to bucket sort) are also used
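A toy sketch of the two phases in Python, with in-memory lists standing in for the temp files a real implementation would stream to and from, and heapq.merge doing the k-way merge pass:

```python
import heapq

def external_merge_sort(data, chunk_size):
    """Toy external merge sort: sort fixed-size chunks independently
    (as if each one just fit in RAM), then do one k-way merge pass.
    Lists stand in for the sorted runs written to temp files."""
    # Phase 1: sort each chunk "in memory" and "write" it out as a run
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    # Phase 2: single merge pass over all sorted runs
    return list(heapq.merge(*runs))
```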

Last Slide on Sorting
Simple O(n²) sorts can be fastest for small n
Selection sort, insertion sort (the latter is linear for mostly-sorted input)
Good below a cut-off to help divide-and-conquer sorts
O(n log n) sorts
Heap sort: in-place, but not stable and not parallelizable
Merge sort: not in-place, but stable and works as an external sort
Quick sort: in-place, but not stable and O(n²) in the worst case; often fastest, but depends on the costs of comparisons/copies
Ω(n log n) is the worst-case and average lower bound for sorting by comparisons
Non-comparison sorts
Bucket sort: good for a small number of possible key values
Radix sort: uses fewer buckets and more phases
Best way to sort? It depends!

Done with sorting! (phew...) Moving on.
There are many, many algorithm techniques in the world
We've learned a few
What are a few other classic algorithm techniques you should at least have heard of?
And what are the main ideas behind how they work?

Algorithm Design Techniques
Greedy: shortest path, minimum spanning tree, ...
Divide and Conquer: divide the problem into smaller subproblems, solve them, and combine into the overall solution; often done recursively (quick sort and merge sort are great examples)
Dynamic Programming: brute force through all possible solutions, storing solutions to subproblems to avoid repeated computation
Backtracking: a clever form of exhaustive search

Dynamic Programming: Idea
Divide a bigger problem into many smaller subproblems
If the number of subproblems grows exponentially, a recursive solution may have an exponential running time
Dynamic programming to the rescue!
Often an individual subproblem occurs many times
Store the results of subproblems in a table and re-use them instead of recomputing them
This technique is called memoization

Fibonacci Sequence: Recursive
The Fibonacci sequence is a very famous number sequence:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
The next number is found by adding up the two numbers before it.
Recursive solution:
fib(int n) {
  if (n == 1 || n == 2) {
    return 1
  }
  return fib(n-2) + fib(n-1)
}
Exponential running time! A lot of repeated computation.

Repeated computation: the call tree for f(7) recomputes the same subproblems over and over; for example, f(4) is computed three times and f(3) five times.

Fibonacci Sequence: Memoized
fib(int n) {
  Map results = new Map()
  results.put(1, 1)
  results.put(2, 1)
  return fibhelper(n, results)
}
fibhelper(int n, Map results) {
  if (!results.contains(n)) {
    results.put(n, fibhelper(n-2, results) + fibhelper(n-1, results))
  }
  return results.get(n)
}
Now fib(x) only gets computed once for each x!

Comments
Dynamic programming relies on working from the bottom up and saving the results of solving simpler problems
These solutions to simpler problems are then used to compute the solution to more complex problems
Dynamic programming solutions can often be quite complex and tricky
Dynamic programming is used for optimization problems, especially ones that would otherwise take exponential time
Only problems that satisfy the principle of optimality are suitable for dynamic programming solutions, i.e. the subsolutions of an optimal solution of the problem are themselves optimal solutions for their subproblems
Since exponential time is unacceptable for all but the smallest problems, dynamic programming is sometimes essential
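The bottom-up style described above can be illustrated with Fibonacci again (a sketch to contrast with the top-down memoized version on the previous slide):

```python
def fib(n):
    """Bottom-up dynamic programming: solve the smallest subproblems
    first and build toward n, in O(n) time (vs. exponential recursion).
    Uses the slide's convention fib(1) = fib(2) = 1."""
    if n <= 2:
        return 1
    table = [0] * (n + 1)            # table[i] will hold fib(i)
    table[1] = table[2] = 1          # base cases
    for i in range(3, n + 1):        # each entry reuses two saved results
        table[i] = table[i - 2] + table[i - 1]
    return table[n]

# fib(7) → 13
```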

Algorithm Design Techniques
Greedy: shortest path, minimum spanning tree, ...
Divide and Conquer: divide the problem into smaller subproblems, solve them, and combine into the overall solution; often done recursively (quick sort and merge sort are great examples)
Dynamic Programming: brute force through all possible solutions, storing solutions to subproblems to avoid repeated computation
Backtracking: a clever form of exhaustive search

Backtracking: Idea
Backtracking is a technique used to solve problems with a large search space, by systematically trying and eliminating possibilities.
A standard example of backtracking would be going through a maze. At some point, you might reach a junction with two options of which direction to go: Portion A or Portion B.

Backtracking
One strategy would be to try going through Portion A of the maze. If you get stuck before you find your way out, then you "backtrack" to the junction.
At this point you know that Portion A will NOT lead you out of the maze, so you then start searching in Portion B.

Backtracking
Clearly, at a single junction you could have even more than two choices (say A, B, and C).
The backtracking strategy says to try each choice, one after the other; if you ever get stuck, "backtrack" to the junction and try the next choice.
If you try all choices and never find a way out, then there IS no solution to the maze.

Backtracking (animation): from the start, the search tries a path, hits a dead end, backtracks to the last junction, tries the next option, and repeats through several dead ends until it finally reaches success.

Backtracking
Dealing with the maze:
From your start point, you iterate through each possible starting move.
From there, you recursively move forward.
If you ever get stuck, the recursion takes you back to where you were, and you try the next possible move.
Make sure you don't try too many possibilities: mark which locations in the maze have been visited already, so that no location in the maze gets visited twice. (If a place has already been visited, there is no point in trying to reach the end of the maze from there again.)

Backtracking
The neat thing about coding up backtracking is that it can be done recursively, without having to do all the bookkeeping at once. Instead, the stack of recursive calls does most of the bookkeeping (i.e., keeps track of which locations we've tried so far).
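A concrete sketch of this recursive bookkeeping on a hypothetical grid maze (0 = open, 1 = wall, 2 = exit); returning from a call is the "backtrack":

```python
def solve_maze(maze, r, c):
    """Recursive backtracking search: try each move from (r, c);
    a False return backtracks to the previous junction.
    Visited cells are marked so no location is tried twice."""
    if r < 0 or r >= len(maze) or c < 0 or c >= len(maze[0]):
        return False                    # off the board
    if maze[r][c] == 2:
        return True                     # found the exit
    if maze[r][c] != 0:
        return False                    # wall, or already visited
    maze[r][c] = -1                     # mark this location visited
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if solve_maze(maze, r + dr, c + dc):
            return True                 # some move from here leads out
    return False                        # dead end: backtrack

maze = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 2]]
# solve_maze(maze, 0, 0) → True
```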

Backtracking: The 8 Queens Problem
Find an arrangement of 8 queens on a single chess board such that no two queens are attacking one another.
In chess, queens can move all the way down any row, column or diagonal (so long as no pieces are in the way).
Due to the first two restrictions, it's clear that each row and each column of the board will have exactly one queen.

Backtracking
The backtracking strategy is as follows:
1) Place a queen on the first available square in row 1.
2) Move on to the next row, placing a queen on the first available square there (that doesn't conflict with the previously placed queens).
3) Continue in this fashion until either:
   a) you have solved the problem, or
   b) you get stuck.
When you get stuck, remove the queens that got you there, until you get to a row where there is another valid square to try, and continue.
Animated example: http://www.hbmeyer.de/backtrack/achtdamen/eight.htm#up
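The row-by-row strategy above can be sketched in Python, generalized to N queens (a minimal illustration; names and the tuple representation are my own):

```python
def solve_queens(n, cols=()):
    """Backtracking N-queens: place one queen per row, trying each
    column in order and backtracking on conflicts.
    cols[r] is the column of row r's queen; returns one solution
    as a tuple of columns, or None if this branch is stuck."""
    row = len(cols)
    if row == n:
        return cols                     # every row filled: solved
    for col in range(n):
        # conflict if same column, or same diagonal (|Δrow| == |Δcol|)
        if all(col != c and abs(col - c) != row - r
               for r, c in enumerate(cols)):
            result = solve_queens(n, cols + (col,))
            if result is not None:
                return result           # this placement worked out
    return None                         # stuck: remove this row's queen
```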

Backtracking: 8 Queens Analysis
Another possible brute-force algorithm is to generate all permutations of the numbers 1 through 8 (there are 8! = 40,320), use the elements of each permutation as the positions in which to place a queen in each row, and reject those boards with diagonal attacking positions.
The backtracking algorithm does a bit better: it constructs the search tree by considering one row of the board at a time, eliminating most non-solution board positions at a very early stage in their construction.
Because it rejects row and diagonal attacks even on incomplete boards, it examines only 15,720 possible queen placements.
15,720 is still a lot of possibilities to consider; sometimes we have no other choice but to do the best we can.

Algorithm Design Techniques
Greedy: shortest path, minimum spanning tree, ...
Divide and Conquer: divide the problem into smaller subproblems, solve them, and combine into the overall solution; often done recursively (quick sort and merge sort are great examples)
Dynamic Programming: brute force through all possible solutions, storing solutions to subproblems to avoid repeated computation
Backtracking: a clever form of exhaustive search