Computer Science 210 Data Structures Siena College Fall 2016 Topic Notes: Searching and Sorting


Searching

We all know what searching is: looking for something. In a computer program, the search could be:

- Looking in a collection of values for some specific value (e.g., where is the 17 in this array of int?).
- Looking for a value with a specific property (e.g., which object in a graphics window contains the location where I clicked the mouse?).
- Looking for a record in a database (e.g., what is the tax history for the last four years for the taxpayer with SSN 101-11-1001?).
- Searching for text in some document or collection of documents (what web pages contain the text "searching and sorting algorithms"?).
- What known amino acid sequences best match this sequence gathered from proteins in a given virus?

We have done some searching this semester. Remember the test to see which spell was specified in the SpellsArray example.

    spellNum = -1;
    for (int spellIndex = 0; spellIndex < spells.length; spellIndex++) {
        if (spellName.equals(spells[spellIndex].getKey())) {
            spellNum = spellIndex;
            break;
        }
    }
    if (spellNum >= 0) {
        System.out.println(spells[spellNum].getValue());
    } else {
        System.out.println("Your wand doesn't know that one. It explodes. Bye!");
    }

We have to search through our collection of objects (Associations) to see which one, if any, contains the matching key. How do we know that we're done searching? In this case, we need only search until we find the first matching entry. But in many cases, we keep looking until we get to the end of our array.

Let's consider how much work it takes for us to get an answer. As a rough estimate of work, we will count how many times we call the equals method of a String to compare the key. If we have n Associations in the array, how many calls to the String equals method will we have to make before we know the answer? It depends on how quickly we find the answer. If none of the Associations contains the matching key at all, we need to check all n before we know the answer. If one does contain a match for the key, we can stop as soon as we find the first one that matches it (the break statement in the loop achieves this). It might be the first, it might be the last; we just don't know. Assuming that there's an equal probability that the Association that contains the matching key is at any of the n positions, we have to compare, on average, n/2 Associations.

In this case, we can't do any better, even if we decided to check in some other order rather than always examining the first, then the second, and so on. We are searching in an array, where we have the option to look at any element directly. We will consider an array of int, though most of what we discuss applies to a wider range of searchable items. A method to do this:

    /*
     * Search for num in array. Return the index of the number, or
     * -1 if it is not found.
     */
    int getIndexOfNum(int[] array, int num) {
        for (int index = 0; index < array.length; index++) {
            if (array[index] == num) {
                return index;
            }
        }
        return -1;
    }

The procedure here is a lot like the searches we have seen. We have no way of knowing that we're done until we either find the number we're looking for, or until we get to the end of the array.
So again, if the array contains n numbers, we have to examine all n in an unsuccessful search and, on average, n/2 in a successful search. We could instead search from the end to the front, and we would have no reason to believe that we'd do any better or worse, on average. Now, suppose the array has been sorted in ascending order.

We could still do the same type of search: start at the beginning and keep looking for the number. In the case of a successful search, we still stop when we find it. But now, we can also determine that a search is unsuccessful as soon as we encounter any number larger than our search number. Assuming that our search number is, on average, going to be found near the median value of the array, our unsuccessful search is now going to require that we examine, on average, n/2 items. This sounds great, but in fact is not a really significant gain, as we will see.

These are all examples of a linear search: we examine items one at a time in some linear order until we find the search item or until we can determine that we will not find it.

To understand a better way to search for a number, think back to the children's number guessing game. I pick a number between 1 and 100 and you have to guess what it is. The game usually goes something like this:

    Me: Guess my number. You: 50. Me: Too high. You: 25. Me: Too low.
    You: 37. Me: Too high. You: 31. Me: That's right.

Why? The person trying to guess the number is trying to narrow down the possible range of values as quickly as possible. By guessing in the middle at each step, half of the range of values can be eliminated. This idea leads directly to a better way to search. If you know that there is an order, where do you start your search? In the middle, since then even if you don't find it, you can look at the value you found and see if the search item is smaller or larger. From that, you can decide to look only in the bottom half of the array or in the top half of the array. You could then do a linear search on the appropriate half, or better yet, repeat the procedure and cut the half in half, and so on. This is a binary search. It is an example of a divide and conquer algorithm, because at each step, it divides the problem in half. A Java method to do this:

    /*
     * Binary Search for num in array.
     */
    int getIndexOfNum(int[] array, int num) {
        int mid;
        int left = 0;
        int right = array.length - 1;
        while (left <= right) {
            mid = (left + right) / 2;

            if (array[mid] == num) {
                // num is same as middle number
                return mid;
            } else if (num < array[mid]) {
                // num is smaller than middle number
                right = mid - 1;
            } else {
                // num is larger than middle number
                left = mid + 1;
            }
        }
        return -1;
    }

This algorithm is expressed very naturally in recursive form, which we will see in an upcoming example.

How many steps are needed for this? Each time, we cut the part of the array we still need to search in half. How many times can we divide a number in half before we get to 1? If we start with n, we divide to get n/2, then n/4, n/8, ..., and eventually get 1. Let's suppose that n = 2^k; then we divide to get 2^(k-1), 2^(k-2), 2^(k-3), ..., 2^0 = 1; that is, we divide k times by 2. In general, we can divide n by 2 at most log_2 n times to get down to 1.

So how much better is this, really? In the case of a small array, the difference is not really significant. But as the size grows...

    Maximum Number of Comparisons
    array size       10    100    1000    1,000,000
    linear search    10    100    1000    1,000,000
    binary search     8     14      20           40

That's pretty huge. Even if you think about the search really needing, on average, n/2 steps, for the 1000-element case, the binary search is still winning 500 to 20. The logarithmic factor is really important. We can see this better by looking at graphs of log n and n. The difference is large, and gets larger and larger as n gets larger. Even if we multiply by constant factors in an attempt to make the log n graph as large as the n graph, there will always be a value of n large enough that the scaled function for n will be larger than the scaled function for log n. More on this later.
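The notes mention that binary search is naturally expressed recursively. As a sketch of what that recursive form might look like (my reconstruction, not necessarily the code in the upcoming example; the method names are mine):

```java
public class RecursiveBinSearch {
    // Recursive binary search: returns the index of num in the sorted
    // array, or -1 if it is not found. The range [left, right] is the
    // portion of the array still under consideration.
    static int search(int[] array, int num, int left, int right) {
        if (left > right) {
            return -1;   // empty range: num cannot be present
        }
        int mid = (left + right) / 2;
        if (array[mid] == num) {
            return mid;
        } else if (num < array[mid]) {
            return search(array, num, left, mid - 1);   // search lower half
        } else {
            return search(array, num, mid + 1, right);  // search upper half
        }
    }

    // Convenience wrapper that searches the whole array.
    static int search(int[] array, int num) {
        return search(array, num, 0, array.length - 1);
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 8, 13, 21, 34};
        System.out.println(search(a, 13));   // found at index 3
        System.out.println(search(a, 7));    // not present: -1
    }
}
```

Each recursive call discards half of the remaining range, so the depth of the recursion is at most log_2 n, matching the analysis above.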

See Example: BinSearch

Comparable Objects

If we are going to deal with Objects rather than primitive types for a binary search, we need a way to compare them. We need a more general analog of the equals method. We can write a method that compares an Object to another, like the compareTo() method of Strings. However, there is no compareTo method in Object. Fortunately, Java provides an interface that does exactly this, the Comparable interface. Any object that implements Comparable will have a compareTo method, so if we write our search (and next up, sorting) routines to operate on Comparables, we will be all set.

We return to our previous example and consider the generic binary search methods.

See Example: BinSearch

Note the unusual syntax in the headers of the generic methods:

    public static <T extends Comparable> int search(T[] a, T val)

Unlike when we considered generic classes like GenericPair and Vector, this program does not specify a generic type for the class, just for the methods that use the generic type. The <T extends Comparable> means that any class can be used for the type of the array and search element, as long as the array was declared and constructed as some type that implements the Comparable interface. Several standard Java classes implement the Comparable interface, including things like Integer, Double, and String. So we can write methods that expect objects that implement Comparable, and be guaranteed that an appropriate compareTo method will be provided.

Sorting

We'll now look at sorting. One of the many motivations for sorting data is that we need sorted data to be able to use a binary search. As we will see, sorting is a fairly expensive process, but if we are going to be searching a large array a lot, the savings obtained by using binary search over linear will more than make up for the cost of sorting the array once.

Suppose that our goal is to take a shuffled deck of cards and to sort it in ascending order.
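As a sketch of what such a generic method might look like (my reconstruction, not necessarily the code in the BinSearch example; I use the parameterized Comparable<T> rather than the raw Comparable of the header above, which avoids an unchecked warning but works the same way):

```java
public class GenericBinSearch {
    // Generic binary search over any Comparable type.
    // Returns the index of val in the sorted array a, or -1 if not found.
    public static <T extends Comparable<T>> int search(T[] a, T val) {
        int left = 0;
        int right = a.length - 1;
        while (left <= right) {
            int mid = (left + right) / 2;
            int cmp = val.compareTo(a[mid]);   // negative, zero, or positive
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                right = mid - 1;   // val must be in the lower half
            } else {
                left = mid + 1;    // val must be in the upper half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] words = {"ant", "bee", "cat", "dog"};
        System.out.println(search(words, "cat"));   // 2
        System.out.println(search(words, "fox"));   // -1
    }
}
```

Since String, Integer, and Double all implement Comparable, this one method works unchanged for arrays of any of those types.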
We'll ignore suits, so there is a four-way tie at each rank. Describing a sorting algorithm precisely can be difficult, but that precision is necessary if we're going to write code to do sorting. We will consider a few algorithms:

- selection sort
- insertion sort
- merge sort

Selection Sort

Let's think about the first procedure as it would work to sort a shuffled stack of 100 cards with unique numbers in ascending order.

Search for the smallest card among all 100, and move it to the front of the deck. Next, ignoring that smallest card in the first position, search for the next smallest card among the 99 remaining cards, and move it to the second position in the deck. Next, ignoring those two smallest cards in the first and second positions, search for the next smallest card among the 98 remaining, and move it to the third position in the deck. ... Next, ignoring those 98 smallest cards in the first 98 positions, search for the next smallest card of the remaining cards, and move it to the 99th position in the deck. The remaining card goes into the 100th position (it's the largest).

The procedure described is a form of a selection sort: at each step, we select the item that goes into the next position of the array, and put it there. This gets us one step closer to a solution.

    public void selectionSort(int[] array) {
        for (int i = 0; i < array.length - 1; i++) {
            int smallestPos = i;
            for (int j = i + 1; j < array.length; j++) {
                if (array[j] < array[smallestPos]) {
                    smallestPos = j;
                }
            }
            int temp = array[smallestPos];
            array[smallestPos] = array[i];
            array[i] = temp;
        }
    }

How long does this algorithm take? As we did with searching, we won't try to calculate an exact time, but we will estimate the cost by comparison counting: determining how many comparisons need to be done in sorting an array.

For an array with n > 1 elements, it takes n-1 comparisons to find the smallest element of the array (compare the first with the second, the smaller of those with the third, etc.). In general, the number of comparisons needed to find the smallest element is one less than the number of elements to be sorted. Once this element has been put into the first slot of the array, we need to sort the remaining n-1 elements of the array. By the argument above, it takes n-2 comparisons to find the smallest of these. We continue, with successive stages taking n-3, n-4, all the way down to the last pass, when there are only two elements and it takes only 1 comparison. (Once we get down to 1 element there is nothing to be done.)

Thus it takes

    S = (n-1) + (n-2) + (n-3) + ... + 3 + 2 + 1

comparisons to sort an array of n elements. In case you aren't already familiar with the summation formula that can give us that value, let's see how we can compute this sum by writing the list forwards and backwards, and then adding the columns:

     S = (n-1) + (n-2) + (n-3) + ... + 3     + 2     + 1
     S = 1     + 2     + 3     + ... + (n-3) + (n-2) + (n-1)
    --------------------------------------------------------
    2S = n     + n     + n     + ... + n     + n     + n     = (n-1)*n

Therefore, as you might already have known, S = (n^2 - n)/2. The graph of this as n increases looks like a parabola. Therefore, selection sort takes O(n^2) time, which is much worse than the cost of the searching algorithms we saw earlier.
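The count S = (n^2 - n)/2 can be checked empirically. Here is a sketch (mine, not the notes' SortingComparisons example) that instruments the selection sort shown above with a comparison counter:

```java
import java.util.Random;

public class SelectionSortCount {
    // Instrumented selection sort: sorts the array in place and returns
    // the number of element comparisons made along the way.
    static long sortCounting(int[] array) {
        long comparisons = 0;
        for (int i = 0; i < array.length - 1; i++) {
            int smallestPos = i;
            for (int j = i + 1; j < array.length; j++) {
                comparisons++;   // one comparison per inner-loop step
                if (array[j] < array[smallestPos]) {
                    smallestPos = j;
                }
            }
            int temp = array[smallestPos];
            array[smallestPos] = array[i];
            array[i] = temp;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        Random rand = new Random(42);
        for (int n : new int[]{10, 100, 1000}) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rand.nextInt();
            long count = sortCounting(a);
            // count always matches S = (n^2 - n)/2, regardless of input order
            System.out.println(n + ": " + count + " comparisons, formula gives "
                    + ((long) n * n - n) / 2);
        }
    }
}
```

Note that selection sort makes exactly this many comparisons on every input; unlike insertion sort below, it has no better best case.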
A recursive implementation of selection sort, but where each pass finds the largest element and places it at the end of the unsorted portion of the array, is found in this example:

See Example: SortingComparisons

Insertion Sort

The selection sort builds up the sorted array by finding the smallest (or largest) element and putting it into the first (or last) position, then the second smallest and putting it into the second (or second to last) position, etc., until the entire array is sorted. Insertion sort takes a different approach. It builds up a sorted array using the fact that we can build a sorted array of size n+1 by taking a sorted array of size n and inserting the (n+1)st element in its correct position.

We will not look at this algorithm in great detail here. Like selection sort, insertion sort takes O(n^2) time in most cases, though it has a best case behavior of O(n) when the input array is sorted or nearly sorted. An example recursive implementation is in the same example:

See Example: SortingComparisons
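Since the notes do not show insertion sort in detail, here is a sketch of the standard iterative version (mine, in the style of the selection sort above; it is not the recursive implementation in SortingComparisons):

```java
public class InsertionSortSketch {
    // For each i in turn, insert array[i] into the already-sorted
    // prefix array[0..i-1]; after the last pass the array is sorted.
    public static void insertionSort(int[] array) {
        for (int i = 1; i < array.length; i++) {
            int toInsert = array[i];
            int j = i - 1;
            // shift elements larger than toInsert one slot to the right
            while (j >= 0 && array[j] > toInsert) {
                array[j + 1] = array[j];
                j--;
            }
            array[j + 1] = toInsert;   // drop the element into its slot
        }
    }

    public static void main(String[] args) {
        int[] a = {34, 8, 21, 2, 13};
        insertionSort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

On an already-sorted array the inner while loop exits immediately on every pass, one comparison each, which is exactly the O(n) best case mentioned above.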

Merge Sort

Our next sorting algorithm, the merge sort, proceeds as follows:

First, our base case: if the array contains 0 or 1 elements, there is nothing to do. It is already sorted. If the array has two or more elements in it, we will break it in half, sort the two halves, and then go through and merge the elements.

A recursive implementation of this is also in the sorting comparisons example.

See Example: SortingComparisons

The method merge takes the sorted elements in array[low..middle] and array[middle+1..high] and merges them together using the array temp, and then copies them back into array.

Again we'd like to count the number of comparisons necessary in order to sort an array of n elements with merge sort. This is complicated by the fact that the mergeSortRecursive method doesn't include any element comparisons at all; all of the comparisons are in the merge method. Even without looking at the details of the code in merge, we can estimate the number of comparisons made. If we are trying to merge two sorted arrays, every time we compare the two elements at the fronts of the arrays, we put one in its correct position. When we run out of the elements in one of the arrays, we put the remaining elements into the last slots of the sorted array. As a result, merging two arrays which have a total of n elements requires at most n-1 comparisons.

Suppose we start with an array of n elements. Let T(n) be a function telling us the number of comparisons necessary to perform merge sort on an array with n elements. As we noted above, we break the array in half, merge sort each half, and then merge the two (now sorted) pieces. Thus the total number of comparisons needed is the number of comparisons to perform merge sort on each half plus the number of comparisons necessary to merge the two halves. By the remarks above, the number of comparisons to do the final merge is no more than n-1. Thus

    T(n) <= T(n/2) + T(n/2) + n - 1.
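A sketch of what such an implementation might look like (my reconstruction based on the description above; the actual SortingComparisons code may differ in details):

```java
public class MergeSortSketch {
    // Sort array[low..high] (inclusive) by recursively sorting each
    // half and merging the sorted halves through the scratch array temp.
    static void mergeSortRecursive(int[] array, int[] temp, int low, int high) {
        if (low >= high) {
            return;   // base case: 0 or 1 elements, already sorted
        }
        int middle = (low + high) / 2;
        mergeSortRecursive(array, temp, low, middle);
        mergeSortRecursive(array, temp, middle + 1, high);
        merge(array, temp, low, middle, high);
    }

    // Merge sorted array[low..middle] and array[middle+1..high] into
    // temp, then copy the merged run back into array.
    static void merge(int[] array, int[] temp, int low, int middle, int high) {
        int i = low, j = middle + 1, k = low;
        while (i <= middle && j <= high) {
            if (array[i] <= array[j]) {   // at most n-1 such comparisons
                temp[k++] = array[i++];
            } else {
                temp[k++] = array[j++];
            }
        }
        while (i <= middle) temp[k++] = array[i++];   // leftovers, left half
        while (j <= high)   temp[k++] = array[j++];   // leftovers, right half
        for (k = low; k <= high; k++) {
            array[k] = temp[k];
        }
    }

    public static void mergeSort(int[] array) {
        mergeSortRecursive(array, new int[array.length], 0, array.length - 1);
    }

    public static void main(String[] args) {
        int[] a = {34, 8, 21, 2, 13, 5};
        mergeSort(a);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

As described in the notes, mergeSortRecursive itself compares no elements; every element comparison happens in the first while loop of merge, and each one places one element, so a merge of n elements makes at most n-1 comparisons.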
For simplicity we'll replace the n-1 comparisons for the merging by the even larger n, in order to make it easier to see how to approximate this result. We have

    T(n) = 2T(n/2) + n

and if we find a function that satisfies that equation, then we have an upper bound on the number of comparisons made during a merge sort. Looking at our algorithm, no comparisons are necessary when the size of the array is 0 or 1. Thus T(0) = T(1) = 0.

Let us see if we can solve this for small values of n. Because we are constantly dividing the number of elements in half, it will be most convenient to start with values of n which are a power of two. Here we list a table of values:

           n    T(n)
      1 = 2^0   0
      2 = 2^1   2T(1) + 2   =   2 = 1*2
      4 = 2^2   2T(2) + 4   =   8 = 2*4
      8 = 2^3   2T(4) + 8   =  24 = 3*8
     16 = 2^4   2T(8) + 16  =  64 = 4*16
     32 = 2^5   2T(16) + 32 = 160 = 5*32
     ...
      n = 2^k   2T(n/2) + n = n*k

Notice that if n = 2^k then k = log_2 n. Thus T(n) = n log_2 n. In fact this works as an upper bound for the number of comparisons for merge sort even if n is not a power of 2. If we graph this, we see that it grows much, much slower than the graph for a quadratic (for example, the one corresponding to the number of comparisons for selection sort). This explains why, when we run the algorithms on larger values of n, the time for merge sort is far smaller than that of selection sort.
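The pattern in the table, T(2^k) = k * 2^k, can be checked directly by evaluating the recurrence; a small sketch (names are mine):

```java
public class MergeSortRecurrence {
    // Upper bound on merge sort comparisons from the recurrence
    //   T(0) = T(1) = 0,   T(n) = 2*T(n/2) + n   (n a power of 2).
    static long T(long n) {
        if (n <= 1) {
            return 0;
        }
        return 2 * T(n / 2) + n;
    }

    public static void main(String[] args) {
        // For n = 2^k, T(n) should equal n * k, i.e., n * log_2(n).
        for (int k = 1; k <= 20; k++) {
            long n = 1L << k;
            System.out.println(n + ": T(n) = " + T(n) + ", n*k = " + n * k);
        }
    }
}
```

Even at n = 2^20 (about a million elements), n log_2 n is roughly 21 million comparisons, versus roughly 500 billion for selection sort's (n^2 - n)/2, which makes the gap in the final paragraph concrete.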