CS1 Lecture 30 Apr. 2, 2018

CS1 Lecture 30, Apr. 2, 2018. HW 7 is available. It is very different than the others: you need to produce a written document based on experiments comparing sorting methods. If you are not using a Python distribution (like Anaconda) that has pylab installed, get one; you need it for HW7. Discussion sections tomorrow are important! They will cover the basics of making graphs with pylab as needed for HW7. We won't cover this in lecture except for a few minutes today! Exam 2: Thursday, April 19, 6:30-8:00pm.

Last time: sorting introduction; selection sort, insertion sort. Today: insertion sort and faster sorting methods.

HW7: compare sorting methods, use pylab to make charts/graphs of their running-time behavior, and write about the results. Making meaningful graphs is often not easy:
- You must experiment to find good sizes for data. You should test on large enough data to clearly understand differences/similarities (for some sorts, you should use lists hundreds of thousands and/or millions long).
- You probably can't put all results on one graph. At a certain size level, growth differences of some algorithms will be clear but distinctions between others not yet clear. Drop some and continue with larger sizes for the ones not yet clearly distinguished.
- You should test on data of various kinds: unsorted vs. already sorted.
This all takes time and thought.

HW7: you can use functions like this (from lec30.py; mixup and insertionsort are defined there too):

import time
import pylab

def testinsertionsort(minn=1000, maxn=20000, step=2000):
    listsizes = list(range(minn, maxn, step))
    runtimes = []
    for listsize in listsizes:
        listtosort = mixup(list(range(listsize)))   # mixup (lec30.py) shuffles the list
        starttime = time.time()
        insertionsort(listtosort)                   # insertionsort is also in lec30.py
        endtime = time.time()
        runtimes.append(endtime - starttime)
    pylab.figure(1)
    pylab.clf()
    pylab.plot(listsizes, runtimes, 'ro-')
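A possible follow-up (the axis labels and filename here are my choices, not part of the homework): after calling the test function, pylab still needs to be told to draw or save the figure, e.g.:

testinsertionsort(1000, 20000, 2000)
pylab.xlabel('list size')
pylab.ylabel('running time (seconds)')
pylab.show()    # or pylab.savefig('insertionsort.png')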

Sorting (https://www.youtube.com/watch?v=k4rri_ntqc8). It's mostly a solved problem, available as excellent built-in functions, so why study it? The variety of sorting algorithms demonstrates a variety of important computer science algorithmic design and analysis techniques. Sorting has been studied for a long time, and there are many algorithms: selection sort, insertion sort, bubble sort, radix sort, Shell sort, quicksort, heapsort, counting sort, Timsort, comb sort, bucket sort, bead sort, pancake sort, spaghetti sort (see, e.g., Wikipedia: sorting algorithm).
Why sort? Searching a sorted list is very fast, even for very large lists (log n is your friend). So if you are going to do a lot of searching...
So, should you always sort? (Python makes it so easy...) We can search an unsorted list in O(n), so the answer depends on how fast we can sort. We certainly can't sort faster than linear time (we must look at, and maybe move, each item). In fact, in general we cannot sort in O(n); the best comparison-based sorting algorithms are O(n log n).
So, when should you sort? If, for example, you have many searches to do. Suppose we have n/2 searches to do:
  n/2 linear searches -> n/2 * O(n) -> O(n^2)
  sort, followed by n/2 binary searches -> O(n log n) + n/2 * O(log n) -> O(n log n) + O(n log n) -> O(n log n)
For large n, this is much faster.
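To make the n/2-searches arithmetic concrete, here is a minimal timing sketch (the function name and default size are mine, for illustration); it uses the standard bisect module for binary search:

import bisect
import random
import time

def compare_search_strategies(n=20000):
    data = list(range(n))
    random.shuffle(data)
    queries = [random.randrange(n) for _ in range(n // 2)]

    # n/2 linear searches on the unsorted list: n/2 * O(n) -> O(n^2)
    start = time.time()
    for q in queries:
        _ = q in data                       # 'in' on a list is a linear scan
    linear_time = time.time() - start

    # one sort, then n/2 binary searches: O(n log n) + n/2 * O(log n) -> O(n log n)
    start = time.time()
    sorteddata = sorted(data)               # O(n log n)
    for q in queries:
        bisect.bisect_left(sorteddata, q)   # O(log n) each
    binary_time = time.time() - start

    print('linear searches:', linear_time, 'sort + binary searches:', binary_time)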

Sorting: selection sort. (Invariant: L[0:i] is sorted and in final position; L[i:] is unsorted. Each pass grows the sorted region by one.)

def selectionsort(L):
    for i in range(len(L)):
        # swap min item in unsorted region L[i:] with ith item
        m = min(range(i, len(L)), key=lambda j: L[j])
        L[i], L[m] = L[m], L[i]

Sorting: insertion sort. The main-step picture is slightly different than for selection sort. (Invariant: L[0:i] is sorted, but not yet in final position; L[i:] is unsorted.) Given: L[0:i] sorted (but not necessarily in final position) and L[i:] unsorted, how do we grow the solution? Move L[i] into its correct spot, shifting the larger items in L[0:i] one slot to the right.

Idea: repeatedly move the first item of the unsorted part to its proper place in the sorted part (sorted part shown left of the bar):

  5 | 23 -2 15 100 1 8 2     (initially the one-element prefix is sorted)
  5 23 | -2 15 100 1 8 2
  -2 5 23 | 15 100 1 8 2
  -2 5 15 23 | 100 1 8 2
  -2 5 15 23 100 | 1 8 2
  -2 1 5 15 23 100 | 8 2
  -2 1 5 8 15 23 100 | 2
  -2 1 2 5 8 15 23 100       (sorted)
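The course's insertionsort lives in lec30.py and isn't reproduced in these notes; here is a minimal sketch of the idea just traced above (shift larger sorted items right, then drop the item into its spot):

def insertionsort(L):
    for i in range(1, len(L)):
        item = L[i]                     # first item of the unsorted part
        j = i
        # shift larger items in the sorted part L[0:i] one slot right
        while j > 0 and L[j-1] > item:
            L[j] = L[j-1]
            j -= 1
        L[j] = item                     # drop it into its proper place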

Running time of selection sort and insertion sort. Selection sort: O(n^2) in the worst, best, and average case. It always searches the entire unsorted portion of the list to find the next min, so there is no distinction between best/worst/average cases. Insertion sort: in the best case the while loop never executes, so O(n). In the worst case the while loop moves the ith item all the way to L[0]; this yields the familiar sum 0 + 1 + 2 + ... + n once again, thus O(n^2). The average case is also O(n^2). Among the O(n^2) sorts, insertion sort is the good one to remember. In practice it works well on almost-sorted data, which is common. It is sometimes used as a "finish the job" component of hybrid sorting methods: use an O(n log n) sorting method until the list is almost sorted, then switch to insertion sort to finish.

How might we sort faster than insertion sort, selection sort, and other simple sorts? Can we do better than O(n^2)? Try a divide-and-conquer approach:
1. divide the problem into subproblems
2. solve the subproblems recursively
3. combine the results
Binary search is a special case of divide and conquer: check the middle element and create one subproblem of half the size. This yielded a speed-up from O(n) to O(log n). Many problems benefit from the divide-and-conquer approach.
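For reference, a minimal sketch of that special case: binary search on a sorted list, creating one half-sized subproblem per step:

def binarysearch(L, target):
    """Return an index of target in sorted list L, or -1 if absent."""
    lo, hi = 0, len(L) - 1
    while lo <= hi:
        mid = (lo + hi) // 2            # check the middle element
        if L[mid] == target:
            return mid
        elif L[mid] < target:
            lo = mid + 1                # discard the left half
        else:
            hi = mid - 1                # discard the right half
    return -1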

How can we use divide-and-conquer to sort?
1. divide the list into two (almost) equal halves
2. sort each half recursively!
3. combine the two sorted halves into the fully sorted result
(Diagram: an unsorted list splits into two unsorted halves, which become two sorted halves, which combine into one sorted list.) Is it clear how to implement each step?

Sorting by divide and conquer:
1. divide the list into two (almost) equal halves: O(1)
2. recursively sort each half: two recursive calls
3. combine the two sorted halves into the fully sorted result: the merge algorithm, O(n)
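A minimal sketch of that merge step (assuming both inputs are already sorted); each item is looked at once, hence O(n):

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    # repeatedly take the smaller front item
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:         # <= keeps equal items in order (stable)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])             # at most one of these is nonempty;
    result.extend(right[j:])            # the leftovers are already sorted
    return result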

Merge sort: implementation in lec30.py. Running time? In more advanced computing classes you learn about things like recurrence equations. We can express the running time via:
  T(1) = c
  T(n) = time for recursive calls + divide time + combine time = 2 * T(n/2) + O(1) + O(n)
We can also analyze it by looking at the recursion tree and adding up the times. What do we get?
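The lec30.py code isn't reproduced here; a minimal recursive sketch (using the merge function sketched above), with comments matching the recurrence:

def mergesort(L):
    """Return a new sorted list containing the items of L."""
    if len(L) <= 1:                     # base case: T(1) = c
        return L
    mid = len(L) // 2                   # divide: O(1)
    left = mergesort(L[:mid])           # two recursive calls: 2 * T(n/2)
    right = mergesort(L[mid:])
    return merge(left, right)           # combine: O(n)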

Recursion tree for analyzing merge sort running time. At each level of the tree, add up the divide time (T_div) and merge time (T_merge):
  level 0 (one list of size n):           1 * 1 + 1 * n     = O(n)
  level 1 (two lists of size n/2):        2 * 1 + 2 * n/2   = O(n)
  ...
  level with n/2 two-element lists:       n/2 * 1 + n/2 * 2 = O(n)
  level with n one-element lists (base cases): n * O(1)     = O(n)
How deep does this go? How many levels? log n. Total: O(n) per level * log n levels = O(n log n).

Mergesort. Merge sort is a very famous sorting algorithm and a classic example of the power of divide and conquer: it yields an optimal O(n log n) sort, much faster than the O(n^2) algorithms. It can be implemented non-recursively (but most people find the recursive version much easier to implement). One possible concern? Standard implementations require O(n) additional space. Python's built-in sort is (in most Python implementations) Timsort (named after Tim Peters), a hybrid drawing from mergesort and insertion sort. It has O(n log n) average- and worst-case time (like mergesort) and O(n) best-case time (like insertion sort). News in 2015?! A bug was found in the Timsort implementations in Python, Java, and Android! It had been there for a while.

Another divide-and-conquer style sorting algorithm: quicksort.
1. divide the list into two parts, one containing all elements < some number (the pivot), the other containing all elements >= that number
2. recursively sort the parts
3. combine the two sorted parts into the fully sorted result
(Diagram: an unsorted list splits into a "< pivot item" part and a ">= pivot item" part; once both are sorted, the whole list is sorted.) Is it clear this time how to implement each step?

Quicksort vs. mergesort.
Divide step. Mergesort: easy, just cut in half, O(1). Quicksort: takes work, scan through the entire list putting elements in the < or >= (vs. pivot) part, O(n). (A sketch of this step follows below.)
Combine step. Mergesort: takes work, step through the subproblem solutions, merging them, O(n). Quicksort: easy, we're done! Just put the parts together, O(1).
Subproblems. Mergesort: the two subproblems are always half the size. Quicksort: subproblem size depends on the value of the item you choose as the pivot. What pivot would yield half-sized subproblems? Can you find that pivot item easily? Poor pivot choices can yield poor sorting performance. Good practical pivot choices: 1) the median of the first, middle, and last items, 2) a random item.
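A minimal sketch of quicksort's divide step as just described (this simple version builds new lists rather than partitioning in place):

def partition(L, pivot):
    """Split L into items < pivot and items >= pivot."""
    less = [x for x in L if x < pivot]   # everything below the pivot
    geq = [x for x in L if x >= pivot]   # everything else
    return less, geq                     # two simple scans of L, still O(n) total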

Quicksort. Quicksort has a worst-case running time of O(n^2), BUT its average case is O(n log n), and if we choose the pivot properly we can make the worst case very, very unlikely. (The detailed mathematical analysis of this is not so easy.) Unlike mergesort, it is in-place (it does not require O(n) extra space). When implemented properly, it is an excellent and very commonly used sorting method in the real world. Be careful if you implement it yourself; it is easy to get slightly wrong. E.g., the code at the first (and second) result returned when I googled "quicksort python" is correct (in that it properly sorts) but a poor implementation, because it chooses the pivot badly (try it on an already sorted list of, say, 10000 or more elements, or even a list containing many, many identical elements!?). qsbad.py
A good standard pivot choice is median of three: choose as the pivot the median value among the first, middle, and last elements of the given list. E.g., for [15, 4, 2, 99, 6, 3, 25, 26, 8], choose the median of 15, 6, and 8, which is 8. In this case a good choice: 4, 2, 3, and 6 will be in the less-than partition; 15, 99, 25, and 26 will be in the greater-than partition.
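Pulling this together, a minimal quicksort sketch with median-of-three pivot choice (not in-place, for clarity; it uses a three-way split so lists with many identical elements, the pitfall mentioned above, don't cause trouble):

def quicksort(L):
    """Return a new sorted list; median-of-three pivot, not in-place."""
    if len(L) <= 1:
        return L
    # median of three: the median of the first, middle, and last items
    pivot = sorted([L[0], L[len(L) // 2], L[-1]])[1]
    less = [x for x in L if x < pivot]
    equal = [x for x in L if x == pivot]     # nonempty, so recursion always shrinks
    greater = [x for x in L if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

For [15, 4, 2, 99, 6, 3, 25, 26, 8], this picks pivot 8, matching the example above.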

Many visualizations of sorting algorithms on the web: http://www.sorting-algorithms.com, http://sorting.at, https://www.cs.usfca.edu/~galles/visualization/comparisonsort.html, https://www.youtube.com/watch?v=kpra0w1kecg, https://www.youtube.com/watch?v=roalu379l3u (dance group demonstrating sorting algorithms)

Next time: finish sorting; brief discussion of greedy algorithms and optimization before starting graphs/graph algorithms.