CSCI 104
Sorting Algorithms
Mark Redekopp
David Kempe

Algorithm Efficiency: SORTING
Sorting
If we have an unordered list, sequential search becomes our only choice.
If we will perform a lot of searches, it may be beneficial to sort the list once and then use binary search.
There are many sorting algorithms of differing complexity (i.e., faster or slower).
Sorting provides a "classical" study of algorithm analysis because there are many implementations with different pros and cons.
[Figure: an original (unsorted) list and the same list sorted, each shown by list index 0-5]
Sorting Stability
A sort is stable if the order of equal items in the original list is maintained in the sorted list.
  Good for searching with multiple criteria.
  Example: spreadsheet search. Start with a list of students in alphabetical order, then sort by test score. Students with the same test score should still appear in alphabetical order.
As we introduce each sort algorithm, consider whether it is stable or not.

List index:         0    1    2    3    4
Original:          7,a  3,b  5,e  8,c  5,d
Stable sorting:    3,b  5,e  5,d  7,a  8,c
Unstable sorting:  3,b  5,d  5,e  7,a  8,c
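The stable row of the table above can be reproduced with the standard library. This is a minimal sketch, assuming the records are (score, name) pairs; the `Record` alias and the function name `sort_by_score_stable` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// (score, name) records; the single-letter names mirror the slide's figure.
using Record = std::pair<int, std::string>;

// Sort by score only. std::stable_sort guarantees that records with equal
// scores keep their original relative order; std::sort gives no such promise.
std::vector<Record> sort_by_score_stable(std::vector<Record> records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) { return a.first < b.first; });
    return records;
}
```

With the original list 7,a 3,b 5,e 8,c 5,d, the two records with score 5 come out as 5,e then 5,d, the same order they had on input.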
Bubble Sorting
Main idea: find and move the largest number to the top of the list, then repeat on a list of size n-1.
Have one loop count each pass (a.k.a. i) to identify which index we need to stop at.
Have an inner loop start at the lowest index and count up to the stopping location, finding the maximum as we go.

Original:      3 7 6 5 1 8
After Pass 1:  3 6 5 1 7 8
After Pass 2:  3 5 1 6 7 8
After Pass 3:  3 1 5 6 7 8
After Pass 4:  1 3 5 6 7 8
After Pass 5:  1 3 5 6 7 8
Bubble Sort Algorithm
// passed by reference so the caller's list is the one that gets sorted
void bsort(vector<int>& mylist)
{
  for(int i = mylist.size()-1; i > 0; i--){
    for(int j = 0; j < i; j++){
      if(mylist[j] > mylist[j+1]) {
        swap(mylist[j], mylist[j+1]);
      }
    }
  }
}

Pass 1:     3 7 8 6 5 1 -> (no swap) -> 3 7 6 8 5 1 -> 3 7 6 5 8 1 -> 3 7 6 5 1 8
Pass 2:     3 7 6 5 1 8 -> (no swap) -> 3 6 7 5 1 8 -> 3 6 5 7 1 8 -> 3 6 5 1 7 8
Last pass:  3 1 5 6 7 8 -> 1 3 5 6 7 8
[Figure: bubble sort visualization, value vs. list index. Courtesy of wikipedia.org]
Bubble Sort Analysis
Best Case Complexity: when the list is already sorted (no swaps), we still have to do all the compares: $O(n^2)$
Worst Case Complexity: when the list is sorted in descending order: $O(n^2)$
(Code: bsort, as on the Bubble Sort Algorithm slide.)
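The best case is quadratic only because bsort never notices that a pass made no swaps. The sketch below adds a `swapped` flag, which is a common variant and not part of the slide code, and returns the number of comparisons so the difference can be observed. The name `bsort_early_exit` is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Bubble sort with an early exit; returns the number of comparisons made.
// The `swapped` flag is an addition for illustration, not the slide's code.
long bsort_early_exit(std::vector<int>& mylist) {
    long compares = 0;
    for (int i = static_cast<int>(mylist.size()) - 1; i > 0; i--) {
        bool swapped = false;
        for (int j = 0; j < i; j++) {
            compares++;
            if (mylist[j] > mylist[j + 1]) {
                std::swap(mylist[j], mylist[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;  // a full pass with no swaps means the list is sorted
    }
    return compares;
}
```

On an already sorted list of n items this variant stops after one pass of n-1 comparisons, while a descending list still costs n(n-1)/2 comparisons.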
Loop Invariants
A loop invariant is a statement about what is true either before an iteration begins or after one ends.
Consider bubble sort and look at the data after each iteration (pass).
What can we say about the patterns of data after the k-th iteration?
(Code and pass-by-pass trace: bsort, as on the Bubble Sort Algorithm slide.)
Loop Invariants (Bubble Sort)
What is true after the k-th iteration?
All data at indices n-k and above are sorted: $\forall i,\ n-k \le i < n-1:\ a_i \le a_{i+1}$
All data at indices below n-k are no greater than the value at n-k: $\forall i < n-k:\ a_i \le a_{n-k}$
(Code and pass-by-pass trace: bsort, as on the Bubble Sort Algorithm slide.)
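One way to make the invariant concrete is to check it with an assertion after every pass. This is a sketch under the assumption that passes are numbered k = 1, 2, ... as on the slide; the helper names `bubble_invariant_holds` and `bsort_checked` are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns true if the bubble sort invariant holds after k passes:
// indices n-k..n-1 are sorted, and everything below n-k is <= mylist[n-k].
bool bubble_invariant_holds(const std::vector<int>& mylist, int k) {
    int n = static_cast<int>(mylist.size());
    if (k <= 0 || k > n) return true;  // nothing is claimed outside 1..n
    bool top_sorted = std::is_sorted(mylist.begin() + (n - k), mylist.end());
    bool below_smaller = std::all_of(mylist.begin(), mylist.begin() + (n - k),
                                     [&](int x) { return x <= mylist[n - k]; });
    return top_sorted && below_smaller;
}

// Bubble sort that asserts the invariant at the end of every pass.
void bsort_checked(std::vector<int>& mylist) {
    int n = static_cast<int>(mylist.size());
    for (int i = n - 1; i > 0; i--) {
        for (int j = 0; j < i; j++) {
            if (mylist[j] > mylist[j + 1]) std::swap(mylist[j], mylist[j + 1]);
        }
        assert(bubble_invariant_holds(mylist, n - i));  // pass number k = n - i
    }
}
```

For example, 3 7 6 5 1 8 satisfies the invariant for k = 1 (8 is in place) but not for k = 2, because 1 at index 4 is smaller than values below it.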
Selection Sort
Selection sort does away with the many swaps: it just records where the min or max value is and performs one swap at the end of each pass.
The list/array can again be thought of in two parts:
  Sorted
  Unsorted
The problem starts with the whole array unsorted, and slowly the sorted portion grows.
Selection Sort Algorithm
// passed by reference so the caller's list is the one that gets sorted
void ssort(vector<int>& mylist)
{
  int n = mylist.size();
  for(int i = 0; i < n-1; i++){
    int min = i;
    for(int j = i+1; j < n; j++){
      if(mylist[j] < mylist[min]) {
        min = j;
      }
    }
    swap(mylist[i], mylist[min]);
  }
}

Pass 1:  7 3 8 6 5 1, min index goes 0 -> 1 -> 5; swap indices 0 and 5: 1 3 8 6 5 7
Pass 2:  min stays at index 1; no change: 1 3 8 6 5 7
Final result:  1 3 5 6 7 8
[Figure: selection sort visualization, value vs. list index. Courtesy of wikipedia.org]
Selection Sort Analysis
Best Case Complexity: when the list is sorted already: $O(n^2)$
Worst Case Complexity: when the list is sorted in descending order: $O(n^2)$
(Code: ssort, as on the Selection Sort Algorithm slide.)
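Both cases are quadratic because the inner loop always scans the whole unsorted part, whatever the data looks like. The sketch below instruments selection sort with a comparison counter (an addition for illustration; `ssort_count` is a hypothetical name) to show that the count depends only on n.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Selection sort that also counts element comparisons.
// Returns the number of comparisons performed.
long ssort_count(std::vector<int>& mylist) {
    long compares = 0;
    int n = static_cast<int>(mylist.size());
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++) {
            compares++;
            if (mylist[j] < mylist[min]) min = j;
        }
        std::swap(mylist[i], mylist[min]);
    }
    return compares;
}
```

For n = 6 the count is 5 + 4 + 3 + 2 + 1 = 15 on sorted and descending inputs alike, i.e. n(n-1)/2.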
Loop Invariant (Selection Sort)
What is true after the k-th iteration (the pass with i = k)?
All data at indices k and below are sorted: $\forall i < k:\ a_i \le a_{i+1}$
All data at indices k and above are at least the value at k: $\forall i \ge k:\ a_k \le a_i$
(Code and pass-by-pass trace: ssort, as on the Selection Sort Algorithm slide.)
Insertion Sort Algorithm
Imagine we pick up one element of the array at a time and then just insert it into the right position.
Similar to how you sort a hand of cards in a card game:
  You pick up the first card (it is by nature sorted).
  You pick up the second and insert it at the right position, etc.

            Picked up     After insertion
Start       ? ? ? ? ?
1st card    7 ? ? ? ?
2nd card    7 3 ? ? ?     3 7 ? ? ?
3rd card    3 7 8 ? ?     3 7 8 ? ?
4th card    3 7 8 6 ?     3 6 7 8 ?
5th card    3 6 7 8 5     3 5 6 7 8
Insertion Sort Algorithm
// passed by reference so the caller's list is the one that gets sorted
void isort(vector<int>& mylist)
{
  int n = mylist.size();
  // i must reach the last index, n-1, or the last element is never inserted
  for(int i = 1; i < n; i++){
    int val = mylist[i];
    int hole = i;
    while(hole > 0 && val < mylist[hole-1]){
      mylist[hole] = mylist[hole-1];
      hole--;
    }
    mylist[hole] = val;
  }
}

Pass 1 (val=3):  7 7 8 6 5 1 -> 3 7 8 6 5 1
Pass 2 (val=8):  3 7 8 6 5 1 (no shift)
Pass 3 (val=6):  3 7 8 8 5 1 -> 3 7 7 8 5 1 -> 3 6 7 8 5 1
Pass 4 (val=5):  3 6 7 8 8 1 -> 3 6 7 7 8 1 -> 3 6 6 7 8 1 -> 3 5 6 7 8 1
[Figure: insertion sort visualization, value vs. list index. Courtesy of wikipedia.org]
Insertion Sort Analysis
Best Case Complexity: when the list is sorted already: $O(n)$
Worst Case Complexity: when the list is sorted in descending order: $O(n^2)$
Consider insertion sort on an array-based or link-based list implementation: what are the pros and cons?
  The same pros and cons as running insert() on an array-based or link-based list.
(Code: isort, as on the Insertion Sort Algorithm slide.)
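To make the array versus linked list trade-off concrete, here is a sketch of insertion sort on `std::list`, where an element is relinked with `splice()` in constant time instead of shifting its neighbors. The function name `isort_list` is hypothetical, and scanning the sorted prefix from the front is a simplification chosen for this example.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Insertion sort on a doubly linked list: elements are relinked with splice()
// instead of being shifted one slot at a time.
void isort_list(std::list<int>& mylist) {
    if (mylist.empty()) return;
    auto it = std::next(mylist.begin());  // the first element is already sorted
    while (it != mylist.end()) {
        auto next = std::next(it);        // remember the successor before moving
        // Walk the sorted prefix to find the first element greater than *it.
        auto pos = mylist.begin();
        while (pos != it && !(*it < *pos)) ++pos;
        if (pos != it) mylist.splice(pos, mylist, it);  // O(1) relink, no copies
        it = next;
    }
}
```

The relink is cheap, but finding the position is still a linear walk. Because this sketch scans from the front, it does not get the O(n) best case that the array version achieves by scanning back from the hole.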
Loop Invariant (Insertion Sort)
What is true after the k-th iteration?
All data at indices less than k+1 are sorted: $\forall i < k:\ a_i \le a_{i+1}$
Can we make a claim about data at k+1 and beyond?
  No, it is not guaranteed to be smaller or larger than what is in the sorted list.
(Code and pass-by-pass trace: isort, as on the Insertion Sort Algorithm slide.)
MERGESORT
Merge Two Sorted Lists
Consider the problem of merging two sorted lists into a new combined sorted list.
Can be done in $O(n)$.
Can we merge in place, or do we need an output array?

Input lists:    3 7    and    6 8
Merged result:  3 6 7 8
[Figure: step-by-step merge; read pointers r1 and r2 advance through the two inputs while write pointer w fills the output 3 6 7 8]
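The pointer walk in the figure can be sketched as follows, assuming a separate output vector is acceptable. The function name `merge_sorted` is made up for this example; `r1` and `r2` follow the figure's labels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge two sorted vectors into a new sorted vector in O(n) total work.
// r1 and r2 are read positions; push_back plays the role of the write pointer w.
std::vector<int> merge_sorted(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t r1 = 0, r2 = 0;
    while (r1 < a.size() && r2 < b.size()) {
        // Taking from `a` on ties keeps the merge stable.
        if (b[r2] < a[r1]) out.push_back(b[r2++]);
        else               out.push_back(a[r1++]);
    }
    while (r1 < a.size()) out.push_back(a[r1++]);  // leftover of a
    while (r2 < b.size()) out.push_back(b[r2++]);  // leftover of b
    return out;
}
```

Each element is copied exactly once, which is where the linear bound comes from.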
Recursive Sort (MergeSort)
Break the sorting problem into smaller sorting problems and merge the results at the end.

Mergesort(0..n)
  If list is size 1, return
  Else
    Mergesort(0..n/2)
    Mergesort(n/2..n)
    Combine each sorted list of n/2 elements into a sorted n-element list

Ranges are half-open [start, end), matching the calls below.

Mergesort(0,8)
  Mergesort(0,4)
    Mergesort(0,2)
    Mergesort(2,4)
  Mergesort(4,8)
    Mergesort(4,6)
    Mergesort(6,8)

Index:           0 1 2 3 4 5  6 7
Input:           3 7 6 8 5 10 2 4
Halves sorted:   3 6 7 8 | 2 4 5 10
Merged:          2 3 4 5 6 7 8 10
Recursive Sort (MergeSort)
Run-time analysis:
  Number of recursion levels = $\log_2(n)$
  Total operations to merge each level = n operations in total to merge the lists over all recursive calls at a particular level
  Mergesort = $O(n \log_2 n)$
Usually has high constant factors due to the extra array needed for the merge.
(Call tree and data: as on the previous Recursive Sort slide.)
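MergeSort Run Time
Let's prove this more formally:

\[
T(1) = \Theta(1), \qquad T(n) = 2\,T(n/2) + \Theta(n)
\]

Substituting $T(n/2) = 2\,T(n/4) + \Theta(n/2)$ and repeating:

\[
\begin{aligned}
T(n) &= 2\,T(n/2) + \Theta(n) && (k=1) \\
     &= 2 \cdot 2\,T(n/4) + 2\,\Theta(n) && (k=2) \\
     &= 8\,T(n/8) + 3\,\Theta(n) && (k=3) \\
     &= 2^k\,T(n/2^k) + k\,\Theta(n)
\end{aligned}
\]

Stop at $T(1)$, i.e. when $n = 2^k$, so $k = \log_2 n$:

\[
\begin{aligned}
T(n) &= 2^{\log_2 n}\,\Theta(1) + \log_2 n \cdot \Theta(n) \\
     &= n + \log_2 n \cdot \Theta(n) \\
     &= \Theta(n \log_2 n)
\end{aligned}
\]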
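The unrolled recurrence can be sanity-checked numerically. The sketch below assumes the Θ terms are replaced by exact costs, T(1) = 1 and a merge cost of exactly n, which is a simplification made only so the numbers can be compared; under that assumption the closed form for powers of two is n·log2(n) + n.

```cpp
#include <cassert>

// T(1) = 1, T(n) = 2*T(n/2) + n, with the Theta terms replaced by exact costs
// (an assumption made so the recurrence can be evaluated numerically).
long long T(long long n) {
    if (n <= 1) return 1;
    return 2 * T(n / 2) + n;
}

// Closed form for powers of two: n*log2(n) + n.
long long closed_form(long long n) {
    long long log2n = 0;
    for (long long m = n; m > 1; m /= 2) log2n++;
    return n * log2n + n;
}
```

For example, T(8) = 2·T(4) + 8 = 2·12 + 8 = 32, and the closed form gives 8·3 + 8 = 32.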
[Figure: merge sort visualization, value vs. list index. Courtesy of wikipedia.org]
Recursive Sort (MergeSort)
void mergesort(vector<int>& mylist)
{
  msort(mylist, 0, mylist.size());
}

void msort(vector<int>& mylist, int start, int end)
{
  // base case: a range [start, end) of 0 or 1 elements is already sorted
  // (testing start >= end alone would recurse forever on a 1-element range)
  if(end - start <= 1) return;
  // recursive calls
  int mid = (start+end)/2;
  msort(mylist, start, mid);
  msort(mylist, mid, end);
  // merge
  merge(mylist, start, mid, mid, end);
}

void merge(vector<int>& mylist, int s1, int e1, int s2, int e2)
{
  ...
}
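The slide leaves the body of merge() elided. The following is one possible way to fill it in, not necessarily the one used in the course: it assumes half-open ranges [s1, e1) and [s2, e2) that are adjacent (e1 == s2) and merges through a temporary vector.

```cpp
#include <cassert>
#include <vector>

// One possible body for merge(): combines the sorted ranges [s1, e1) and
// [s2, e2) of mylist (assumed adjacent, e1 == s2) through a temporary vector.
void merge(std::vector<int>& mylist, int s1, int e1, int s2, int e2) {
    std::vector<int> tmp;
    tmp.reserve(e2 - s1);
    int r1 = s1, r2 = s2;
    while (r1 < e1 && r2 < e2) {
        if (mylist[r2] < mylist[r1]) tmp.push_back(mylist[r2++]);
        else                         tmp.push_back(mylist[r1++]);
    }
    while (r1 < e1) tmp.push_back(mylist[r1++]);
    while (r2 < e2) tmp.push_back(mylist[r2++]);
    // Copy the merged run back over the two input ranges.
    for (int i = 0; i < static_cast<int>(tmp.size()); i++) mylist[s1 + i] = tmp[i];
}

void msort(std::vector<int>& mylist, int start, int end) {
    if (end - start <= 1) return;  // zero or one element: already sorted
    int mid = (start + end) / 2;
    msort(mylist, start, mid);
    msort(mylist, mid, end);
    merge(mylist, start, mid, mid, end);
}

void mergesort(std::vector<int>& mylist) {
    msort(mylist, 0, static_cast<int>(mylist.size()));
}
```

The temporary vector is the "extra array" that gives merge sort its higher constant factors.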
Divide & Conquer Strategy
Mergesort is a good example of a strategy known as "divide and conquer".
3 steps:
  Divide: split the problem into smaller versions (usually partition the data somehow)
  Recurse: solve each of the smaller problems
  Combine: put the solutions of the smaller problems together to form the larger solution
Another example of divide and conquer? Binary search.
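Binary search fits the same three-step pattern, with a trivial combine step. A minimal recursive sketch, assuming a sorted vector and a half-open range [start, end); the name `bsearch_rec` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Recursive binary search over the half-open range [start, end) of a sorted
// vector. Returns the index of target, or -1 if it is not present.
int bsearch_rec(const std::vector<int>& sorted, int target, int start, int end) {
    if (start >= end) return -1;               // empty range: not found
    int mid = start + (end - start) / 2;       // divide
    if (sorted[mid] == target) return mid;
    if (target < sorted[mid])
        return bsearch_rec(sorted, target, start, mid);    // recurse on left half
    return bsearch_rec(sorted, target, mid + 1, end);      // recurse on right half
}
```

Unlike mergesort, only one of the two halves is ever visited, so the cost is logarithmic rather than n log n.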
QUICKSORT
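Partition & QuickSort
The partition algorithm (arbitrarily) picks one number as the 'pivot' and puts it into the 'correct' location.
[Figure: before, left and right pointers close in on the unsorted numbers with the pivot p parked at the end; after, the list is laid out as < pivot | p | > pivot]

// end is the index of the last element (inclusive); the list is passed by
// reference so the caller's data is rearranged
int partition(vector<int>& mylist, int start, int end, int p)
{
  int pivot = mylist[p];
  swap(mylist[p], mylist[end]);  // move pivot out of the way for now
  int left = start;
  int right = end-1;
  while(left < right){
    while(mylist[left] <= pivot && left < right) left++;
    while(mylist[right] >= pivot && left < right) right--;
    if(left < right) swap(mylist[left], mylist[right]);
  }
  // put pivot back, unless everything else is <= pivot and it already
  // sits in its final location at end
  if(mylist[right] > pivot) {
    swap(mylist[right], mylist[end]);
    return right;
  }
  return end;
}

partition(mylist,0,5,5):
  3 6 8 1 5 7   l=0, r=4, p=5 (pivot 7)
  3 6 8 1 5 7   l=2, r=4
  3 6 5 1 8 7   after swap
  3 6 5 1 8 7   l=r=4
  3 6 5 1 7 8   pivot placed at index 4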
QuickSort
Use the partition algorithm as the basis of a sort algorithm.
Partition on some number and then recursively call on both sides: < pivot | p | > pivot

// end is the index of the last element (inclusive)
void qsort(vector<int>& mylist, int start, int end)
{
  // base case
  if(start >= end) return;
  // pick a random pivot location in [start..end]
  int p = start + rand() % (end - start + 1);
  // partition
  int loc = partition(mylist, start, end, p);
  // recurse on both sides; the pivot at loc is already in place
  qsort(mylist, start, loc-1);
  qsort(mylist, loc+1, end);
}

(Partition trace: as on the Partition & QuickSort slide.)
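Putting partition and quicksort together gives a complete, compilable sketch. The names `partition_range` and `quick_sort` are hypothetical and were chosen to avoid clashing with `std::partition` and the C library's `qsort`; `end` is assumed to be the inclusive index of the last element, as in the slide's trace.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Partition mylist[start..end] (end inclusive) around mylist[p].
// Returns the pivot's final index.
int partition_range(std::vector<int>& mylist, int start, int end, int p) {
    int pivot = mylist[p];
    std::swap(mylist[p], mylist[end]);  // park the pivot at the end
    int left = start, right = end - 1;
    while (left < right) {
        while (mylist[left] <= pivot && left < right) left++;
        while (mylist[right] >= pivot && left < right) right--;
        if (left < right) std::swap(mylist[left], mylist[right]);
    }
    if (mylist[right] > pivot) {        // pivot belongs at `right`
        std::swap(mylist[right], mylist[end]);
        return right;
    }
    return end;                         // everything else is <= pivot
}

// Sort mylist[start..end] (end inclusive) with a randomly chosen pivot.
void quick_sort(std::vector<int>& mylist, int start, int end) {
    if (start >= end) return;
    int p = start + std::rand() % (end - start + 1);  // random index in [start, end]
    int loc = partition_range(mylist, start, end, p);
    quick_sort(mylist, start, loc - 1);  // pivot at loc is already in place
    quick_sort(mylist, loc + 1, end);
}
```

Calling `partition_range(v, 0, 5, 5)` on 3 6 8 1 5 7 reproduces the slide's trace: the list becomes 3 6 5 1 7 8 and the pivot 7 lands at index 4.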
[Figure: quick sort visualization, value vs. list index. Courtesy of wikipedia.org]
QuickSort Analysis
Worst Case Complexity: when the pivot chosen ends up being the min or max item.
  Runtime: $T(n) = T(n-1) + T(1) + \Theta(n)$
  Example (pivot 8): 3 6 8 1 5 7 -> 3 6 1 5 7 8
Best Case Complexity: when the pivot chosen ends up being the median item.
  Runtime: similar to MergeSort, $T(n) = 2\,T(n/2) + \Theta(n)$
  Example (pivot 6): 3 6 8 1 5 7 -> 3 1 5 6 8 7
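The worst-case recurrence can be unrolled the same way as the MergeSort one. A short worked derivation, treating $T(1)$ as $\Theta(1)$ so that it is absorbed into the $\Theta(n)$ term:

```latex
\[
\begin{aligned}
T(n) &= T(n-1) + T(1) + \Theta(n) = T(n-1) + \Theta(n) \\
     &= T(n-2) + \Theta(n-1) + \Theta(n) \\
     &= \dots \\
     &= T(1) + \sum_{j=2}^{n} \Theta(j) \\
     &= \Theta(1) + \Theta\!\left(\frac{n(n+1)}{2} - 1\right) = \Theta(n^2)
\end{aligned}
\]
```

Each level removes only one element, so there are about n levels of linear work instead of $\log_2 n$.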
QuickSort Analysis
Worst Case Complexity: when the pivot chosen ends up being the max or min of each list: $O(n^2)$
Best Case Complexity: when the pivot chosen ends up being the middle item: $O(n \lg n)$
Average Case Complexity: $O(n \log n)$
  Randomly choose a pivot.
Pivoting and quicksort can be slower on small lists than something like insertion sort.
  Many quicksort implementations pivot and quicksort recursively until the lists reach a certain size, and then use insertion sort on the small pieces.
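The cutoff idea in the last point can be sketched as below. This is an illustration under assumptions, not any particular library's implementation: the threshold `kCutoff` is an arbitrary value, the function names are made up, and for brevity the partition step uses the middle element as pivot with a Lomuto-style scan rather than the two-pointer scheme from the earlier slide.

```cpp
#include <cassert>
#include <utility>
#include <vector>

const int kCutoff = 16;  // hypothetical threshold; real libraries tune this value

// Insertion sort on mylist[start..end] (end inclusive).
void isort_range(std::vector<int>& mylist, int start, int end) {
    for (int i = start + 1; i <= end; i++) {
        int val = mylist[i];
        int hole = i;
        while (hole > start && val < mylist[hole - 1]) {
            mylist[hole] = mylist[hole - 1];
            hole--;
        }
        mylist[hole] = val;
    }
}

// Quicksort that hands small ranges to insertion sort. Uses the middle
// element as pivot and a Lomuto-style partition, not the slide's scheme.
void hybrid_qsort(std::vector<int>& mylist, int start, int end) {
    if (end - start + 1 <= kCutoff) {
        isort_range(mylist, start, end);
        return;
    }
    std::swap(mylist[start + (end - start) / 2], mylist[end]);  // pivot to the end
    int pivot = mylist[end];
    int store = start;
    for (int i = start; i < end; i++) {
        if (mylist[i] < pivot) std::swap(mylist[i], mylist[store++]);
    }
    std::swap(mylist[store], mylist[end]);  // pivot into its final slot
    hybrid_qsort(mylist, start, store - 1);
    hybrid_qsort(mylist, store + 1, end);
}
```

The recursion bottoms out on ranges of at most `kCutoff` elements, where insertion sort's low overhead tends to win despite its quadratic worst case.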
Comparison Sorts
Big O of comparison sorts:
  It is mathematically provable that comparison-based sorts need on the order of n*log(n) comparisons in the worst case, so they can never perform better than $O(n \log n)$.
So can we ever have a sorting algorithm that performs better than $O(n \log n)$?
  Yes, but only if we can make some meaningful assumptions about the input.
OTHER SORTS
Sorting in Linear Time
Radix Sort
  Sort numbers one digit at a time, starting with the least significant digit and moving to the most significant.
Bucket Sort
  Assume the input is generated by a random process that distributes elements uniformly over the interval [0, 1).
Counting Sort
  Assume the input consists of an array of size N with integers in a small range from 0 to k.
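As an illustration of how an assumption about the input avoids comparisons, here is a minimal counting sort sketch. It assumes every value lies in [0, k], as stated above; the function name `counting_sort` and its signature are chosen for this example.

```cpp
#include <cassert>
#include <vector>

// Counting sort for integers known to lie in [0, k]. Runs in O(N + k) time
// and never compares two input elements with each other.
std::vector<int> counting_sort(const std::vector<int>& input, int k) {
    std::vector<int> count(k + 1, 0);
    for (int x : input) count[x]++;          // tally each value
    std::vector<int> out;
    out.reserve(input.size());
    for (int v = 0; v <= k; v++) {           // emit values in increasing order
        for (int c = 0; c < count[v]; c++) out.push_back(v);
    }
    return out;
}
```

The running time is linear only while k stays small relative to N; a large value range makes the count array dominate.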
Other Resources
http://www.youtube.com/watch?v=vxenklcs2tw
http://flowingdata.com/2010/09/01/what-different-sorting-algorithms-sound-like/
http://www.math.ucla.edu/~rcompton/musical_sorting_algorithms/musical_sorting_algorithms.html
http://sorting.at/
Awesome musical accompaniment: https://www.youtube.com/watch?v=epfmtym8cw