COMP 250 Fall 2017          15 - recurrences 2          Oct. 13, 2017

Here we examine the recurrences for mergesort and quicksort.

Mergesort

Recall the mergesort algorithm: we divide the list of things to be sorted into two approximately equal size sublists, sort each of them, and then merge the result. Merging two sorted lists of size n/2 takes time proportional to n, since the merging requires iterating through the elements of each list. If n is even, then there are n/2 + n/2 = n elements in the two lists. If n is odd, then one of the lists has size one greater than the other, but there are still n steps to the merge.

Let's assume that n is a power of 2. This keeps the math simpler, since we don't have to deal with the case that the two sublists are not exactly the same length. In this case, the recurrence relation for mergesort is:

    t(n) = cn + 2 t(n/2)

[ASIDE: If we were to consider a general n, then the correct way to write the recurrence would be:

    t(n) = t(⌊n/2⌋) + t(⌈n/2⌉) + cn

where ⌊n/2⌋ means floor(n/2) and ⌈n/2⌉ means ceiling(n/2), or round down and round up, respectively. That is, rather than treating n/2 as an integer division and ignoring the remainder (always rounding down), we would be treating it as n/2.0 and either rounding up or down. In the lecture, I gave the example of n = 13, so ⌊n/2⌋ = 6 and ⌈n/2⌉ = 7. In COMP 250, we don't concern ourselves with this level of detail, since nothing particularly interesting happens in the general case and we would just be getting bogged down with notation. The interesting aspect of mergesort is most simply expressed by considering the case that n is a power of 2.]

Let's solve the mergesort recurrence using backwards substitution:

    t(n) = cn + 2 t(n/2)
         = cn + 2 (c(n/2) + 2 t(n/4))
         = cn + cn + 4 t(n/4)
         = cn + cn + 4 (c(n/4) + 2 t(n/8))
         = cn + cn + cn + 8 t(n/8)
         ...
         = cnk + 2^k t(n/2^k)
         = cn log₂ n + n t(1),    when n = 2^k,

which is O(n log₂ n), since the dominant term that depends on n is n log₂ n.

What is the intuition for why mergesort is O(n log₂ n)? Think of the merge phases. The list of size n is ultimately partitioned down into n lists of size 1. If n is a power of 2, then these n lists are merged into n/2 lists of size 2, which are merged into n/4 lists of size 4, etc. So there are log₂ n levels of merging, and each requires O(n) work. Notice that the bottleneck of the algorithm is the merging, which takes a total of cn operations at each level of the recursion, and there are log₂ n levels.
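To make the recurrence concrete, here is a minimal Java sketch of mergesort (the class and method names are my own; the notes do not give code). The comments mark where the cn and 2 t(n/2) terms of the recurrence come from.

import java.util.Arrays;

public class MergeSortDemo {

    // Sort a[lo..hi): two recursive calls on halves, then a linear-time merge.
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return;           // base case: 0 or 1 elements
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);              // contributes t(n/2)
        mergeSort(a, mid, hi);              // contributes t(n/2)
        merge(a, lo, mid, hi);              // contributes cn
    }

    // Merge the sorted runs a[lo..mid) and a[mid..hi); each element is copied
    // a constant number of times, so the cost is proportional to n = hi - lo.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 8, 1, 9, 3, 7, 4};
        mergeSort(a, 0, a.length);
        System.out.println(Arrays.toString(a));  // [1, 2, 3, 4, 5, 7, 8, 9]
    }
}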

On the base case of mergesort

In the mergesort algorithm last week, we had a base case of n = 1. What if we had stopped the recursion at a larger base case? For example, suppose that when the list size has been reduced to 4 or less, we switch to running bubble sort instead of mergesort. Since bubble sort is O(n²), one might ask whether this would cause the mergesort algorithm to increase from O(n log₂ n) to O(n²).

Let's solve the recurrence for mergesort by assuming

    t(n) = 2 t(n/2) + cn,    when n > 4,

and that some other t(n) holds for n ≤ 4. Assume n is a power of 2 (to simplify the argument). We want to stop the backsubstitution at t(4) on the right side, so we let k be such that n/2^k = 4, that is, 2^k = n/4:

    t(n) = cn + 2 t(n/2)
         = ...
         = cnk + 2^k t(n/2^k),

and substituting 2^k = n/4, i.e. k = log₂ n - 2, gives

         = cn (log₂ n - 2) + (n/4) t(4)
         = cn log₂ n - 2cn + (n/4) t(4),

which is still O(n log₂ n), since the dominant term that depends on n is n log₂ n.

The subtle part of this problem is that you may be thinking that, if we are doing say bubble sort when n ≤ 4, then we need to somehow put an n² dependence in somewhere. But you don't need to do this! Since we are only switching to bubble sort when n ≤ 4, the term t(4) is a constant: it is the time it takes to solve a problem of size 4.

Note: while this might seem like a toy problem, it makes an important point. Sometimes when we write recursive methods, we find that the base case can be tricky to encode. If there is a slower method available that can be used for small instances of the problem, and this slower method is easy to encode, then use it! A sketch of this hybrid idea follows.
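Here is a Java sketch of the hybrid idea (the names and the cutoff constant are mine; the lecture does not prescribe an implementation). Below the cutoff we run bubble sort, whose cost on at most 4 elements is a constant: exactly the t(4) in the analysis.

public class HybridMergeSort {

    static final int CUTOFF = 4;   // assumed cutoff, matching t(4) in the analysis

    // Sort a[lo..hi): recurse while n > CUTOFF, otherwise run the O(n^2) method.
    static void hybridMergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= CUTOFF) {
            bubbleSort(a, lo, hi);      // constant time: at most CUTOFF elements
            return;
        }
        int mid = (lo + hi) / 2;
        hybridMergeSort(a, lo, mid);
        hybridMergeSort(a, mid, hi);
        merge(a, lo, mid, hi);
    }

    // Bubble sort on a[lo..hi): O(n^2) in general, but trivial to get right.
    static void bubbleSort(int[] a, int lo, int hi) {
        for (int i = hi - 1; i > lo; i--)
            for (int j = lo; j < i; j++)
                if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
    }

    // Same linear-time merge as in the mergesort sketch above.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
        hybridMergeSort(a, 0, a.length);
        System.out.println(java.util.Arrays.toString(a));
    }
}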

Quicksort

Let's now turn to the recurrence for quicksort. Recall the main idea of quicksort: we remove some element called the pivot, partition the remaining elements based on whether they are smaller than or greater than the pivot, recursively quicksort these two lists, and then concatenate the two, putting the pivot in between.

In the best case, the partition produces two roughly equal sized lists. This is the best case because then one only needs about log₂ n levels of the recursion, and approximately the same recurrence as for mergesort can be written and solved.

What about the worst case performance? If the element chosen as the pivot happens to be smaller than all the elements in the list, or larger than all the elements in the list, then the two lists are of size 0 and n - 1. If this poor splitting happens at every level of the recursion, then performance degenerates to that of the O(n²) sorting algorithms we saw earlier, namely the recurrence becomes

    t(n) = cn + t(n - 1).

Solving this by backsubstitution (see last lecture) gives

    t(n) = c n(n+1)/2 + t(0),

which is O(n²).

Why is quicksort called quick when its worst case is O(n²)? In particular, it would seem that mergesort would be quicker, since mergesort is O(n log₂ n) regardless of best or worst case. There are two basic reasons why quicksort is quick.

One reason is that the worst case is easy to avoid in practice. For example, if one is a bit more clever about choosing the pivot, then one can make the worst case situation happen with very low probability. One idea for choosing a good pivot is to examine three particular elements in the list: the first element, the middle element, and the last element (list[0], list[mid], list[size-1]). One sorts these three elements and takes the middle value (the median) as the pivot. The idea is that it is very unlikely for all three of these elements to be among the three smallest (or three largest). In particular, if the list happens to be close to sorted (or sorted in the wrong direction), then the median of three will tend to be close to the median of the entire list. Note that the best split occurs if we take the pivot to be the median of the whole list. In practice, such a simple idea works very well, and partitions have close to even size. (A sketch of the median-of-three idea appears at the end of this discussion.)

The second reason that quicksort is quick is that, if one uses an array list to represent the list, then it is possible to do the partition in place, that is, without using extra space. One needs to be a bit clever to set this up, but it is straightforward to implement. I've decided not to cover the details this year, but if you feel like you are missing out, then do check it out: https://en.wikipedia.org/wiki/Quicksort

The in-place property of quicksort is a big advantage, since it reduces the number of copies one needs to do. By contrast, the straightforward implementation of mergesort requires that we use a second array and merge the elements into it. This leads to lots of copying, which tends to be slow. There are clever ways to implement mergesort that make it run faster, but the details are way beyond the scope of the course. Besides, experimental results have shown that, in practice, quicksort tends to be faster than mergesort even if the in-place mergesort method is used. Quicksort truly is very quick (if done in place).
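Here is a small Java sketch of the median-of-three idea described above (the method name is mine; the lecture gives no code). It samples list[0], list[mid], list[size-1], sorts the three sampled values, and returns the middle one.

public class MedianOfThree {

    // Return the median of list[0], list[mid], list[size-1]; using it as the
    // pivot makes an extremely uneven split unlikely in practice.
    static int medianOfThree(int[] list) {
        int a = list[0];
        int b = list[list.length / 2];
        int c = list[list.length - 1];
        // "Sort" the three sampled values with three compare-and-swaps.
        if (a > b) { int t = a; a = b; b = t; }
        if (b > c) { int t = b; b = c; c = t; }
        if (a > b) { int t = a; a = b; b = t; }
        return b;   // the middle value of the three
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        // On a sorted list, a first-element pivot gives the worst case,
        // but the median of the samples (1, 4, 7) is 4: the true median.
        System.out.println(medianOfThree(sorted));  // 4
    }
}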

Exercise

In the lecture, I asked you to solve this one:

    t(n) = t(n/2) + n

This is similar to binary search, but now we have to do n operations during each call. What do you predict? Is this O(log₂ n), or O(n), or O(n²), or what? Here is the solution, assuming n = 2^k:

    t(n) = n + t(n/2)
         = n + n/2 + t(n/4)
         = n + n/2 + n/4 + t(n/8)
         = n + n/2 + n/4 + ... + n/2^(k-1) + t(n/2^k)
         = n + n/2 + n/4 + ... + 4 + 2 + t(1),    when 2^k = n
         = n + n/2 + n/4 + ... + 4 + 2 + 1 - 1 + t(1)
         = Σ_{i=0}^{log₂ n} n/2^i + t(1) - 1
         = (2^{(log₂ n)+1} - 1)/(2 - 1) + t(1) - 1
         = 2n - 1 + t(1) - 1,

which is O(n). (In the second-to-last step, note that the terms n/2^i are exactly the powers of 2 from 1 up to n, so the sum is a geometric series.)
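As a sanity check (my addition, not from the notes), one can evaluate the recurrence directly and compare it with the closed form, taking t(1) = 1 so that 2n - 1 + t(1) - 1 = 2n - 1:

public class RecurrenceCheck {

    // t(n) = n + t(n/2), with t(1) = 1 assumed for concreteness.
    static long t(long n) {
        return (n == 1) ? 1 : n + t(n / 2);
    }

    public static void main(String[] args) {
        // For n a power of 2, t(n) should equal 2n - 1, which grows linearly.
        for (long n = 1; n <= 1024; n *= 2)
            System.out.println("n = " + n + ": t(n) = " + t(n)
                               + ", 2n - 1 = " + (2 * n - 1));
    }
}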

COMP 250 Fall 2017          20 - tree traversal          Oct. 25/26, 2017

Tree traversal

Often we wish to iterate through, or traverse, all the nodes of a tree. We generally use the term tree traversal for this. There are two aspects to traversing a tree. One is that we need to follow references from parent to child, or from a child to its sibling. The second is that we may need to do something at each node. I will use the term visit for the latter: visiting a node means doing some computation at that node. We will see some examples later.

Depth first traversal

The first two traversals that we consider are called depth first. In these traversals, a node and all its descendants are visited before the next sibling is visited. There are two ways to do depth-first traversal of a tree, depending on whether you visit a node before its descendants or after its descendants. In a pre-order traversal, you visit a node, and then visit all its children. In a post-order traversal, you visit the children of the node (and their children, recursively) and then visit the node.

depthfirst_preorder(root){
    if (root is not empty){
        visit root
        for each child of root
            depthfirst_preorder(child)
    }
}

An example is illustrated in the lecture slides. Suppose we have a file system. The directories and files define a tree whose internal nodes are directories and whose leaves are either empty directories or files. We first wish to print out the root directory's contents, namely to list the subdirectories and files in the root directory. For each subdirectory, we also print its subdirectories and files, and so on. This is all done using a pre-order traversal; the visit is the print statement. Here is an example of what the output of the print might look like. (This is similar to what you get on Windows when browsing files in the Folders panel.)

My Documents
    My Music
        Raffi
            Shake My Sillies Out (file)
            Baby Beluga (file)
        Eminem
            Lose Yourself (file)
    My Videos
        ... (file)
    Work
        COMP250
        ...
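A minimal Java version of the pre-order traversal, applied to a fragment of the directory example (the Node class and field names are mine; the notes only give pseudocode). The visit is a print, indented by depth:

import java.util.ArrayList;
import java.util.List;

public class PreorderDemo {

    // Minimal n-ary tree node: a label plus a list of children.
    static class Node {
        String label;
        List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Pre-order: visit (print) the node first, then recurse on each child.
    static void preorder(Node root, int depth) {
        if (root == null) return;
        System.out.println("    ".repeat(depth) + root.label);   // the "visit"
        for (Node child : root.children)
            preorder(child, depth + 1);
    }

    public static void main(String[] args) {
        Node root = new Node("My Documents")
            .add(new Node("My Music")
                .add(new Node("Raffi")
                    .add(new Node("Shake My Sillies Out (file)"))
                    .add(new Node("Baby Beluga (file)"))))
            .add(new Node("My Videos"));
        preorder(root, 0);   // prints an indented listing like the one above
    }
}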

For a postorder traversal, one visits a node after having visited all the children of the node.

depthfirst_postorder(root){
    if (root is not empty){
        for each child of root
            depthfirst_postorder(child)
        visit root
    }
}

Let's look at an example of post-order traversal. Suppose we want to calculate how many bytes are stored in all the files within some directory, including all its sub-directories. This is post-order because, in order to know the total bytes in some directory, we first need to know the total number of bytes in all the subdirectories. Hence, we need to visit the subdirectories first.

Here is an algorithm for computing the number of bytes. It traverses the tree in postorder in the sense that it computes the sum of bytes in each subdirectory by summing the bytes at each child node of that directory. (Frankly, in this example it is a bit vague what I mean by visit, since computing sum doesn't just happen after visiting the children, but rather involves steps that occur before, during, and after visiting the children. The way I think of it is that if we were to store sum as a field in the node, then we could only fill it in after visiting the children.)

numbytes(root){
    if (root is a leaf)
        return number of bytes at root
    else{
        sum = 0                        // local variable
        for each child of root{
            sum += numbytes(child)
        }
        return sum
    }
}
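A Java sketch of numbytes (the class, field, and factory-method names are my own; the notes give only pseudocode). A file stores its own size; a directory's total is the sum over its children, computed post-order:

import java.util.ArrayList;
import java.util.List;

public class NumBytesDemo {

    // Hypothetical file-system node: a file carries a size; a directory carries children.
    static class FsNode {
        boolean isFile;
        long bytes;                                 // used only when isFile
        List<FsNode> children = new ArrayList<>();  // used only for directories

        static FsNode file(long bytes) {
            FsNode n = new FsNode(); n.isFile = true; n.bytes = bytes; return n;
        }
        static FsNode dir(FsNode... kids) {
            FsNode n = new FsNode(); n.children.addAll(List.of(kids)); return n;
        }
    }

    // Post-order: every child's total is computed before the parent's "visit",
    // which here just adds the child sums together.
    static long numBytes(FsNode root) {
        if (root.isFile) return root.bytes;
        long sum = 0;
        for (FsNode child : root.children)
            sum += numBytes(child);
        return sum;
    }

    public static void main(String[] args) {
        FsNode root = FsNode.dir(FsNode.file(100),
                                 FsNode.dir(FsNode.file(20), FsNode.file(30)));
        System.out.println(numBytes(root));   // 150
    }
}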

Depth first traversal without recursion

As we have discussed already in this course, recursive algorithms are implemented using a call stack which keeps track of the information needed in each call. (Recall the stack lecture.) You can sometimes avoid recursion by using an explicit stack instead. Here is an algorithm for doing a depth first traversal which uses a stack rather than recursion.

treetraversalusingstack(root){
    s = empty stack
    s.push(root)
    while !s.isempty(){
        cur = s.pop()
        visit cur
        for each child of cur
            s.push(child)
        // visit cur could be put here instead
    }
}

As you can see by running an example (see lecture slides), this algorithm visits the list of children of a node in the opposite order to that defined by the for loop.

Is this algorithm preorder or postorder? The visit cur statement occurs prior to the for loop. You might think this automatically makes it preorder. However, the situation is more subtle than that. If we were to move the visit cur statement to be after the for loop, the order of the visits of the nodes would be the same. Moreover, nodes would still be visited before their children. So this would still be a preorder traversal.

Breadth first traversal

What happens if we use a queue instead of a stack in the previous algorithm?

treetraversalusingqueue(root){
    q = empty queue
    q.enqueue(root)
    while !q.isempty(){
        cur = q.dequeue()
        visit cur
        for each child of cur
            q.enqueue(child)
    }
}

As shown in the example in the lecture, this algorithm visits all the nodes at each depth before proceeding to the next depth. This is called breadth first traversal. The queue-based algorithm effectively does the following:

for i = 0 to height
    visit all nodes at level i

You should work through the example in the slides to make sure you understand why using a queue here is different from using a stack.
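Here are the two iterative traversals in Java, side by side (a sketch with my own names, reusing a minimal Node with a label and a children list as in the pre-order example). Only the container differs: a LIFO stack gives depth first, a FIFO queue gives breadth first.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeTraversals {

    static class Node {
        String label;
        List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // Depth first with an explicit stack: children are pushed in order,
    // so they are popped (and visited) in the opposite order.
    static void depthFirst(Node root) {
        Deque<Node> s = new ArrayDeque<>();
        s.push(root);
        while (!s.isEmpty()) {
            Node cur = s.pop();
            System.out.println(cur.label);    // visit
            for (Node child : cur.children)
                s.push(child);
        }
    }

    // Breadth first: identical shape, but a FIFO queue replaces the stack,
    // so each level is fully visited before the next level starts.
    static void breadthFirst(Node root) {
        Deque<Node> q = new ArrayDeque<>();
        q.addLast(root);                       // enqueue
        while (!q.isEmpty()) {
            Node cur = q.removeFirst();        // dequeue
            System.out.println(cur.label);     // visit
            for (Node child : cur.children)
                q.addLast(child);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("a");
        Node b = new Node("b"); Node c = new Node("c");
        root.children.add(b); root.children.add(c);
        b.children.add(new Node("d"));
        depthFirst(root);    // a, c, b, d  (children come off in reverse order)
        breadthFirst(root);  // a, b, c, d  (level by level)
    }
}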

A note about implementation

Recall the first-child/next-sibling data structure for representing a tree, which we saw last lecture. Using this implementation, you can replace the line "for each child of cur" with the following:

child = cur.firstchild
while (child != null){
    ...                          // maybe do something at that child
    child = child.nextsibling
}

Here we are iterating through a (singly linked) list of children.
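For completeness, here is how the first-child/next-sibling representation might look in Java (a sketch; the field names follow the pseudocode above):

public class FcnsDemo {

    // First-child / next-sibling node: each node holds the head of its own
    // child list and a link to the next node in its parent's child list.
    static class FcnsNode {
        String label;
        FcnsNode firstchild;
        FcnsNode nextsibling;
        FcnsNode(String label) { this.label = label; }
    }

    // The loop from the pseudocode above: walk the singly linked child list.
    static void visitChildren(FcnsNode cur) {
        FcnsNode child = cur.firstchild;
        while (child != null) {
            System.out.println(child.label);   // do something at that child
            child = child.nextsibling;
        }
    }

    public static void main(String[] args) {
        FcnsNode root = new FcnsNode("root");
        root.firstchild = new FcnsNode("a");
        root.firstchild.nextsibling = new FcnsNode("b");
        visitChildren(root);   // prints a then b
    }
}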