Advanced Algorithms and Data Structures

Similar documents
Advanced Algorithms and Data Structures

Lecture 2: Getting Started

6/12/2013. Introduction to Algorithms (2 nd edition) Overview. The Sorting Problem. Chapter 2: Getting Started. by Cormen, Leiserson, Rivest & Stein

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems.

II (Sorting and) Order Statistics

Introduction to Algorithms Third Edition

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Design and Analysis of Algorithms. Comp 271. Mordecai Golin. Department of Computer Science, HKUST

Decreasing a key FIB-HEAP-DECREASE-KEY(,, ) 3.. NIL. 2. error new key is greater than current key 6. CASCADING-CUT(, )

CS583 Lecture 01. Jana Kosecka. some materials here are based on Profs. E. Demaine, D. Luebke A.Shehu, J-M. Lien and Prof. Wang s past lecture notes

Worst-case running time for RANDOMIZED-SELECT

Approximation Algorithms

Data Structures and Algorithms

10/24/ Rotations. 2. // s left subtree s right subtree 3. if // link s parent to elseif == else 11. // put x on s left

V Advanced Data Structures

Sorting & Growth of Functions

CSCE 210/2201 Data Structures and Algorithms. Prof. Amr Goneid. Fall 2018

22 Elementary Graph Algorithms. There are two standard ways to represent a

V Advanced Data Structures

COMP251: Algorithms and Data Structures. Jérôme Waldispühl School of Computer Science McGill University

Approximation Algorithms

Lecture Notes for Chapter 2: Getting Started

22 Elementary Graph Algorithms. There are two standard ways to represent a

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

16 Greedy Algorithms

Virtual University of Pakistan

Algorithms and Data Structures

CSE548, AMS542: Analysis of Algorithms, Fall 2012 Date: October 16. In-Class Midterm. ( 11:35 AM 12:50 PM : 75 Minutes )

CSCE 210/2201 Data Structures and Algorithms. Prof. Amr Goneid

CSE 3101: Introduction to the Design and Analysis of Algorithms. Office hours: Wed 4-6 pm (CSEB 3043), or by appointment.

Problem Set 4 Solutions

Organisation. Assessment

Ph.D. Written Examination Syllabus

Ensures that no such path is more than twice as long as any other, so that the tree is approximately balanced

Jana Kosecka. Linear Time Sorting, Median, Order Statistics. Many slides here are based on E. Demaine, D. Luebke slides

Algorithm Analysis and Design

COE428 Lecture Notes Week 1 (Week of January 9, 2017)

9/24/ Hash functions

Sorting. Sorting in Arrays. SelectionSort. SelectionSort. Binary search works great, but how do we create a sorted array in the first place?

18.3 Deleting a key from a B-tree

Lecture 1. Introduction / Insertion Sort / Merge Sort

Lecture 5: Sorting Part A

We assume uniform hashing (UH):

Sorting. There exist sorting algorithms which have shown to be more efficient in practice.

Introduction to Algorithms 6.046J/18.401J/SMA5503

Data Structures and Algorithms

CS 445: Data Structures Final Examination: Study Guide

COMP Analysis of Algorithms & Data Structures

III Data Structures. Dynamic sets

We augment RBTs to support operations on dynamic sets of intervals A closed interval is an ordered pair of real

CSC 273 Data Structures

School of Computing and Information Sciences. Course Title: Data Structures Date: 3/30/2010 Course Number: COP 3530 Number of Credits: 3

CSCE 321/3201 Analysis and Design of Algorithms. Prof. Amr Goneid. Fall 2016

Searching Algorithms/Time Analysis

15.4 Longest common subsequence

Lecture 3. Recurrences / Heapsort

( ) 1 B. 1. Suppose f x

( ) ( ) C. " 1 n. ( ) $ f n. ( ) B. " log( n! ) ( ) and that you already know ( ) ( ) " % g( n) ( ) " #&

Analysis of Algorithms - Introduction -

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

9/10/2018 Algorithms & Data Structures Analysis of Algorithms. Siyuan Jiang, Sept

Using Templates to Introduce Time Efficiency Analysis in an Algorithms Course

n 2 ( ) ( ) + n is in Θ n logn

Multiple Choice. Write your answer to the LEFT of each problem. 3 points each

Basic Data Structures (Version 7) Name:

Partha Sarathi Mandal

Sankalchand Patel College of Engineering - Visnagar Department of Computer Engineering and Information Technology. Assignment

ECE250: Algorithms and Data Structures Midterm Review

CSC Design and Analysis of Algorithms

CSci 231 Final Review

ECE250: Algorithms and Data Structures Final Review Course

CSE 373: Data Structures and Algorithms

Algorithms Interview Questions

Let the dynamic table support the operations TABLE-INSERT and TABLE-DELETE It is convenient to use the load factor ( )

COS 226 Algorithms and Data Structures Fall Midterm

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Data Structures and Algorithms CMPSC 465

Algorithms and Data Structures, or

( ) + n. ( ) = n "1) + n. ( ) = T n 2. ( ) = 2T n 2. ( ) = T( n 2 ) +1

Chapter 8 Sorting in Linear Time

Introduction to Algorithms October 12, 2005 Massachusetts Institute of Technology Professors Erik D. Demaine and Charles E. Leiserson Quiz 1.

n 2 C. Θ n ( ) Ο f ( n) B. n 2 Ω( n logn)

CS302 Topic: Algorithm Analysis. Thursday, Sept. 22, 2005

Solutions to Exam Data structures (X and NV)

Algorithms and Data Structures (INF1) Lecture 7/15 Hua Lu

Course Review for Finals. Cpt S 223 Fall 2008

Introduction to Algorithms 6.046J/18.401J

( D. Θ n. ( ) f n ( ) D. Ο%

Computer Algorithms. Introduction to Algorithm

University of Toronto Department of Electrical and Computer Engineering. Midterm Examination. ECE 345 Algorithms and Data Structures Fall 2012

Lecture: Analysis of Algorithms (CS )

Data Structures and Algorithms. Roberto Sebastiani

D. Θ nlogn ( ) D. Ο. ). Which of the following is not necessarily true? . Which of the following cannot be shown as an improvement? D.

Binary Search Trees Treesort

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

CS1800 Discrete Structures Fall 2016 Profs. Aslam, Gold, Ossowski, Pavlu, & Sprague December 16, CS1800 Discrete Structures Final

Lecture 1 (Part 1) Introduction/Overview

Outline for Today. How can we speed up operations that work on integer data? A simple data structure for ordered dictionaries.

Transcription:

Advanced Algorithms and Data Structures Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Prerequisites A seven credit unit course Replaced OHJ-2156 Analysis of Algorithms We take things a bit further than OHJ-2156 We will assume familiarity with Necessary mathematics Elementary data structures Programming MAT-72006 AADS, Fall 2015 27-Aug-15 2 1

Course Basics There will be 4 hours of lectures per week Weekly exercises start in two weeks time We will not have a programming exercise this year (unless you demand to have one) We might consider organizing a seminar with voluntary presentations (yielding extra points) at the end of the course MAT-72006 AADS, Fall 2015 27-Aug-15 3 Organization & Timetable Lectures: Prof. Tapio Elomaa Tue & Thu 2 4 PM in TB223 (& TB219 in PII) Aug. 25 Dec. 3, 2015 Period break Oct. 12 18, 2015 Exercises: M.Sc. Juho Lauri & M.Sc. Teemu Heinimäki, Mon 10 12 TC315, Start: Sept. 9 Exam: Fri Dec. 18, 2015 MAT-72006 AADS, Fall 2015 27-Aug-15 4 2

Guest Lecture Dr. Timo Aho from Solita In October Big data Definitions of data analytics Methods Case studies MAT-72006 AADS, Fall 2015 27-Aug-15 5 Course Grading Exam: Maximum of 30 points Weekly exercises yield extra points 40% of questions answered: 1 point 80% answered: 6 points In between: linear scale (so that decimals are possible) Final grading depends on what we agree as course components MAT-72006 AADS, Fall 2015 27-Aug-15 6 3

Material The textbook of the course is Cormen, Leiserson, Rivest, Stein: Introduction to Algorithms, 3rd ed., MIT Press, 2009 There is no prepared material, the slides appear in the web as the lectures proceed http://www.cs.tut.fi/~elomaa/teach/72006.html The exam is based on the lectures (i.e., not on the slides only) MAT-72006 AADS, Fall 2015 27-Aug-15 7 Content (Plan) I Foundations II (Sorting and) Order Statistics III Data Structures IV Advanced Design and Analysis Techniques V Advanced Data Structures VI Graph Algorithms VII Selected Topics MAT-72006 AADS, Fall 2015 27-Aug-15 8 4

I Foundations The Role of Algorithms in Computing Getting Started Growth of Functions Recurrences Probabilistic Analysis and Randomized Algorithms II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 5

III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees IV Advanced Design and Analysis Techniques Dynamic Programming Greedy Algorithms Amortized Analysis 6

V Advanced Data Structures B-Trees Binomial Heaps Fibonacci Heaps VI Graph Algorithms MAT-62756 GRAPH THEORY Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths Maximum Flow 7

VII Selected Topics Matrix Operations Linear Programming Number-Theoretic Algorithms Approximation Algorithms Part of these could also be student presentation topics Online ChiMerge In machine learning and data mining algorithms one often needs to discretize numerical attributes There are initial intervals that cannot be divided (examples that have the same value for the attribute in question) CHIMERGE is intended as a preprocessing technique for learning algorithms It combines adjacent initial intervals bottom-up to come up with the final discretization of the value range of the numerical attribute in question MAT-72006 AADS, Fall 2015 27-Aug-15 16 8

Combine? Combine? Combine? Combine? Combine? Combine? Combine? Combine? Combine? MAT-72006 AADS, Fall 2015 27-Aug-15 17 The worst-case time complexity of the batch algorithm is lg ) The main requirement for the online version of CHIMERGE (OCM) is a processing time that is, per each received training example, logarithmic in the number of intervals Thus, we altogether match the lg ) time requirement of the batch version of CHIMERGE MAT-72006 AADS, Fall 2015 27-Aug-15 18 9

If each new example drawn from DATASOURCE would update the current discretization immediately, this would yield a slowing down of OCM by a factor as compared to the batch version Therefore, we update the active discretization only periodically, freezing it for the intermediate period Examples received during this time can be handled at a logarithmic time The price to be paid is maintaining some other data structures in addition to the BST MAT-72006 AADS, Fall 2015 27-Aug-15 19 OCM uses a balanced binary search tree into which each example drawn from DATASOURCE is (eventually) submitted The algorithm continuously takes in new examples from DATASOURCE, but instead of just storing them to, it updates the required data structures simultaneously We do not halt the ow of the data stream because of extra processing of the data structures, but amortize the required work to normal example handling BST plus a queue, two doubly linked lists, and a priority queue (heap) are maintained MAT-72006 AADS, Fall 2015 27-Aug-15 20 10

MAT-72006 AADS, Fall 2015 27-Aug-15 21 The discretization process works in three phases In Phase 1 (of length time steps), the examples from DATASOURCE are cached for temporary storage to the queue Meanwhile the initial intervals are collected one at a time from to a list that holds the intervals in their numerical order For each pair of two adjacent intervals, the value is computed and inserted into the priority queue The queue is needed to freeze for the duration of the Phase 1 MAT-72006 AADS, Fall 2015 27-Aug-15 22 11

Procedure OCM ) Input: Integer giving the number of examples on which initial discretization is based on and real which is the threshold for interval merging Data structures: A BST, a queue, two doubly linked lists and, and a priority queue 1. Initialize,,, and as empty; 2. Read training examples into tree ; 3. phase 1; TREEMINIM UM); 4. Make an empty priority queue of size 1; 5. while DATASOURCE() nil do MAT-72006 AADS, Fall 2015 27-Aug-15 23 6. if phase = 1 then ENQUEUE(,) 7. else TREEINSERT fi; 8. if phase = 1 then ---------------------- 9. Insert a copy of as the last item in ; 10. if >1then 11. Compute the -value of the two last intervals in ; 12. Using the obtained value as a key, add a pointer to the two intervals in into ; 13. fi 14. TREESUCCESSOR ); 15. if = nil then phase 2 fi MAT-72006 AADS, Fall 2015 27-Aug-15 24 12

After seeing examples, all the initial intervals in have been inserted to and all the values have been computed and added to At this point holds (unprocessed) examples At the end of Phase 1 all required preparations for interval merging have been done Phase 2 implements the merging as in batch CHIMERGE, but without stopping the processing of DATASOURCE As a result of each merging, the intervals in as well as the related -values in are updated MAT-72006 AADS, Fall 2015 27-Aug-15 25 In Phase 2 the example received from DATASOURCE is submitted directly to along with another, previously received example that is cached in The lowest -value,, corresponding to some adjacent pair of intervals and, is drawn from If it exceeds the merging threshold, Phase 2 is complete Otherwise, the intervals and are merged to obtain a new interval, and the -values on both sides of are updated into MAT-72006 AADS, Fall 2015 27-Aug-15 26 13

16. else if phase = 2 then ------------------- 17. DEQUEUE); TREEINSERT ); 18., EXTRACTMIN); 19. if nil and then 20. Remove from P the D-values of I and J; 21. LIST-DELETE ); LISTDELETE ); 22. MERGE ); 23. Compute the -values of with its neighbors in ; 24. Update the -values into priority queue ; 25. else 26. phase 3; fi MAT-72006 AADS, Fall 2015 27-Aug-15 27 27. else -------------Phase = 3 ------------ 28. DEQUEUE); TREE-INSERT ); 29. LIST-DELETE,HEAD ); 30. LIST-INSERT ); 31. if EMPTY) then 32. phase 1; TREE-MINIM UM); 33. Make an empty priority queue, size 1; 34. fi 35. fi 36. od MAT-72006 AADS, Fall 2015 27-Aug-15 28 14

Time Complexity of OCM The insertion of the received example to clearly takes constant time, as well as the insertion of an interval from to Also computing the -value can be considered a constant-time operation Inserting the obtained value to requires time (lg) Finding a successor in also takes time (lg) Thus, the total time consumption of one iteration of Phase 1 is (lg) MAT-72006 AADS, Fall 2015 27-Aug-15 29 Insertion of the new example and one from the queue to the BST both take time (lg) Extracting the lowest -value from the priority queue is also an lg time operation As contains pointers to the items holding intervals and in, removing them can be implemented in constant time Merging of and to obtain can be considered a constant-time operation as well as computing the new values for and its neighbors in Updating each of the new -values into takes lg time Thus, the total time consumption of one iteration of Phase 2 is also lg MAT-72006 AADS, Fall 2015 27-Aug-15 30 15

The sorting problem Input: A sequence of numbers,, Output: A permutation (reordering),, of the input sequence such that The numbers that we wish to sort are also known as keys MAT-72006 AADS, Fall 2015 27-Aug-15 31 INSERTION-SORT) 1. for 2to 2. 3. // Insert ] into the sorted sequence [1.. 1] 4. 5. while >0and ] > 6. + 1] ] 7. 1 8. [ + 1] MAT-72006 AADS, Fall 2015 27-Aug-15 32 16

5 2 4 6 1 3 2 5 4 6 1 3 2 4 5 6 1 3 2 4 5 6 1 3 1 2 4 5 6 3 1 2 3 4 5 6 MAT-72006 AADS, Fall 2015 27-Aug-15 33 Correctness of the Algorithm The following loop invariant helps us understand why the algorithm is correct: At the start of each iteration of the for loop of lines 1 8, the subarray [1.. 1] consists of the elements originally in [1.. 1] but in sorted order MAT-72006 AADS, Fall 2015 27-Aug-15 34 17

Initialization The loop invariant holds before the first loop iteration, when =2: The subarray, therefore, consists of just the single element [1] It is in fact the original element in [1] This subarray is trivially sorted Therefore, the loop invariant holds prior to the first iteration of the loop MAT-72006 AADS, Fall 2015 27-Aug-15 35 Maintenance Each iteration maintains the loop invariant: The body of the for loop works by moving 1], 2], 3], by one position to the right until the proper position for is found (lines 4 7) At this point the value of ] is inserted (line 8) The subarray [1.. ] then consists of the elements originally in [1..], but in sorted order MAT-72006 AADS, Fall 2015 27-Aug-15 36 18

Termination The condition causing the for loop to terminate is that Because each loop iteration increases by 1, we must have +1at that time Substituting +1for j in the wording of loop invariant, we have that the subarray [1.. ] consists of the elements originally in [1..], but in sorted order [1.. ]is the entire array MAT-72006 AADS, Fall 2015 27-Aug-15 37 Analysis of insertion sort The time taken by the INSERTION-SORT depends on the input: sorting a thousand numbers takes longer than sorting three numbers Moreover, the procedure can take different amounts of time to sort two input sequences of the same size depending on how nearly sorted they already are MAT-72006 AADS, Fall 2015 27-Aug-15 38 19

Input size The time taken by an algorithm grows with the size of the input Traditional to describe the running time of a program as a function of the size of its input For many problems, such as sorting most natural measure for input size is the number of items in the input i.e., the array size MAT-72006 AADS, Fall 2015 27-Aug-15 39 For, e.g., multiplying two integers, the best measure is the total number of bits needed to represent the input in binary notation Sometimes, more appropriate to describe the size with two numbers rather than one E.g., if the input to an algorithm is a graph, the input size can be described by the numbers of vertices and edges in it MAT-72006 AADS, Fall 2015 27-Aug-15 40 20

Running time Running time of an algorithm on an input: The number of primitive operations ( steps ) executed Step as machine-independent as possible For the moment: Constant amount of time to execute each line of pseudocode We assume that each execution of the th line takes time, where is a constant MAT-72006 AADS, Fall 2015 27-Aug-15 41 INSERTION-SORT) cost times 1 for 2to 1 2 ] 2 1 3 // Insert into the sorted sequence [1..] 0 1 4 4 1 5 while >0and ] > 5 6 + 1] ] 6 1) 7 7 1) 8 + 1] 8 1 MAT-72006 AADS, Fall 2015 27-Aug-15 42 21

denotes the number of times the while loop test in line 5 is executed for that value of When a for or while loop exits in the usual way, the test is executed one time more than the loop body Comments are not executable statements, so they take no time Running time of the algorithm is the sum of those for each statement executed MAT-72006 AADS, Fall 2015 27-Aug-15 43 To compute ), the running time of INSERTION-SORT on an input of values, we sum the products of the cost and times columns, obtaining 1 + 1 + 1) 1 + 1) MAT-72006 AADS, Fall 2015 27-Aug-15 44 22

Best case The best case occurs if the array is already sorted For each = 2,3,,, we then nd that ] < in line 5 when has its initial value of 1 Thus =1for = 2,3,,, and the best-case running time is 1 + 1 + 1 + 1 ) MAT-72006 AADS, Fall 2015 27-Aug-15 45 Worst case We can express this as for constants and that depend on the statement costs It is a linear function of The worst case results when the array is in reverse sorted order in decreasing order We must compare each element ] with each element in the entire sorted subarray [1.. 1], and so for = 2,3,, MAT-72006 AADS, Fall 2015 27-Aug-15 46 23

Note that and = 1 = + 1) 2 1 1) 2 by the summation of an arithmetic series + 1) = 2 MAT-72006 AADS, Fall 2015 27-Aug-15 47 The worst-case running time of INSERTION- SORT is 1 + 1 + 1) 1) 1 + 2 2 1) 2 1) = 2 + 2 + 2 + ( ++ ) MAT-72006 AADS, Fall 2015 27-Aug-15 48 24

We can express this worst-case running time as 2 for constants,, and that depend on the statement costs It is a quadratic function of The rate of growth, or order of growth, of the running time really interests us We consider only the leading term of a formula ( 2 ); the lower-order terms are relatively insignicant for large values of MAT-72006 AADS, Fall 2015 27-Aug-15 49 We also ignore the leading term s coefcient, constant factors are less signicant than the rate of growth in determining computational efciency for large inputs For insertion sort, we are left with the factor of 2 from the leading term We write that insertion sort has a worst-case running time of 2 ) ( theta of -squared ) MAT-72006 AADS, Fall 2015 27-Aug-15 50 25

2.3 Designing algorithms Insertion sort is an incremental approach: having sorted [1.. 1], we insert ] into its proper place, yielding sorted subarray [1.. ] Let us examine an alternative design approach, known as divide-and-conquer We design a sorting algorithm whose worst-case running time is much lower The running times of divide-and-conquer algorithms are often easily determined MAT-72006 AADS, Fall 2015 27-Aug-15 51 The divide-and-conquer approach Many useful algorithms are recursive: to solve a problem, they call themselves to deal with closely related subproblems These algorithms typically follow a divideand-conquer approach: Break the problem into subproblems that resemble the original problem but are smaller, Solve the subproblems recursively, Combine these solutions to create a solution to the original problem MAT-72006 AADS, Fall 2015 27-Aug-15 52 26

The paradigm involves three steps at each level of the recursion: 1. Divide the problem into a number of subproblems that are smaller instances of the same problem 2. Conquer the subproblems by solving them recursively If the sizes are small enough, just solve the subproblems in a straightforward manner 3. Combine the solutions to the subproblems into the solution for the original problem MAT-72006 AADS, Fall 2015 27-Aug-15 53 The merge sort algorithm Divide: Divide the -element sequence into two subsequences of /2 elements each Conquer: Sort the two subsequences recursively using merge sort Combine: Merge the two sorted subsequences to produce the sorted answer Recursion bottoms out when the sequence to be sorted has length 1: a sequence of length 1 is already in sorted order MAT-72006 AADS, Fall 2015 27-Aug-15 54 27

The key operation is the merging of two sorted sequences in the combine step We call auxiliary procedure MERGE), where is an array and,, and are indices such that The procedure assumes that the subarrays.. and + 1.. ] are in sorted order and merges them to form a single sorted subarray that replaces the current subarray..] MAT-72006 AADS, Fall 2015 27-Aug-15 55 MERGE) 1. 1 +1 2. 2 3. Let [1.. 1 + 1] and [1.. 2 + 1]be new arrays 4. for 1to 1 5. 1] 6. for 1to 2 7. 8. 1 +1] 9. 2 +1] 10. 1 11. 1 12. for to 13. if 14. ] 15. +1 16. else ] 17. +1 MAT-72006 AADS, Fall 2015 27-Aug-15 56 28

Line 1 computes the length 1 of the subarray..];similarly for 2 and +1..]on line 2 Line 3 creates arrays (left) and (right), of lengths 1 +1and 2 +1, respectively the extra position will hold the sentinel The for loop of lines 4 5 copies.. into [1.. ]; Lines 6 7 copy + 1.. into [1.. 2 ] Lines 8 9 put the sentinels at the ends of and MAT-72006 AADS, Fall 2015 27-Aug-15 57 Lines 10 17 perform the +1basic steps by maintaining the following loop invariant: At the start of each iteration of the for loop of lines 12 17,.. 1]contains the smallest elements of [1.. 1 + 1]and [1.. 2 + 1], in sorted order Moreover, ] and ] are the smallest elements of their arrays that have not been copied back into MAT-72006 AADS, Fall 2015 27-Aug-15 58 29

MERGE(, 9, 12, 16) 8 9 10 11 12 13 14 15 16 17 2 4 5 7 1 2 3 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 8 9 10 11 12 13 14 15 16 17 1 4 5 7 1 2 3 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 MAT-72006 AADS, Fall 2015 27-Aug-15 59 8 9 10 11 12 13 14 15 16 17 1 2 5 7 1 2 3 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 8 9 10 11 12 13 14 15 16 17 1 2 2 7 1 2 3 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 MAT-72006 AADS, Fall 2015 27-Aug-15 60 30

8 9 10 11 12 13 14 15 16 17 1 2 2 3 1 2 3 6 1 2 3 4 5 2 4 5 7 i 1 2 3 4 5 1 2 3 6 8 9 10 11 12 13 14 15 16 17 1 2 2 3 4 2 3 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 MAT-72006 AADS, Fall 2015 27-Aug-15 61 8 9 10 11 12 13 14 15 16 17 1 2 2 3 4 5 3 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 8 9 10 11 12 13 14 15 16 17 1 2 2 3 4 5 6 6 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 MAT-72006 AADS, Fall 2015 27-Aug-15 62 31

8 9 10 11 12 13 14 15 16 17 1 2 2 3 4 5 6 7 1 2 3 4 5 2 4 5 7 1 2 3 4 5 1 2 3 6 The needed +1iterations of the last for loop have been executed: [9.. 16] is sorted, and the two sentinels in and are the only two elements in these arrays that have not been copied into MAT-72006 AADS, Fall 2015 27-Aug-15 63 MERGE procedure runs in ) time, where +1 each of lines 1 3 and 8 11 takes constant time the for loops of lines 4 7 take 1 2 = ) time there are iterations of the for loop of lines 12 17, each of which takes constant time MAT-72006 AADS, Fall 2015 27-Aug-15 64 32

Merge sort The procedure MERGE-SORT sorts the elements in.. If, the subarray has at most one element and is therefore already sorted Otherwise, the divide step computes an index that partitions.. into two subarrays:..], containing /2elements +1..],containing /2elements MAT-72006 AADS, Fall 2015 27-Aug-15 65 MERGE-SORT) 1. if 2. )/2 3. MERGE-SORT) 4. MERGE-SORT+1,) 5. MERGE) MAT-72006 AADS, Fall 2015 27-Aug-15 66 33

Analysis of merge sort Our analysis assumes that the original problem size is a power of 2 Each divide step then yields two subsequences of size exactly /2 We set up the recurrence for ), the worstcase running time of merge sort on numbers Merge sort on just one element takes constant time MAT-72006 AADS, Fall 2015 27-Aug-15 67 When we have >1elements, we break down the running time as follows: Divide: The step just computes the middle of the subarray, which takes constant time: ) = (1) Conquer: We recursively solve two subproblems, each of size /2, which contributes /2) to the running time Combine: the MERGE procedure on an element array takes time ), and so ) = ) MAT-72006 AADS, Fall 2015 27-Aug-15 68 34

When we add the ) and ), we are adding functions that are ) and (1) This sum is a linear function of Adding it to the /2) term from the conquer step gives the recurrence for ): (1) = 2) + ) if=1 if>1 MAT-72006 AADS, Fall 2015 27-Aug-15 69 To intuitively see that the solution to the recurrence is ) = lg),where lg stands for log 2, let us rewrite it as if=1 = 2) + if>1 where constant represents time required to solve problems of size 1 and that per array element of the divide and combine steps MAT-72006 AADS, Fall 2015 27-Aug-15 70 35

MAT-72006 AADS, Fall 2015 27-Aug-15 71 36