Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

Similar documents
Chapter 10. Sorting and Searching Algorithms. Fall 2017 CISC2200 Yanjun Li 1. Sorting. Given a set (container) of n elements

Searching and Sorting

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

Today: Finish up hashing Sorted Dictionary ADT: Binary search, divide-and-conquer Recursive function and recurrence relation

Programming II (CS300)

CS 137 Part 8. Merge Sort, Quick Sort, Binary Search. November 20th, 2017

UNIT III BALANCED SEARCH TREES AND INDEXING

COMP2012H Spring 2014 Dekai Wu. Sorting. (more on sorting algorithms: mergesort, quicksort, heapsort)

Algorithm Efficiency & Sorting. Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms

Programming II (CS300)

Cpt S 122 Data Structures. Sorting

About this exam review

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

algorithm evaluation Performance

Hash Tables. Hashing Probing Separate Chaining Hash Function

Data Structures. Topic #6

Sorting Goodrich, Tamassia Sorting 1

Data Structures - CSCI 102. CS102 Hash Tables. Prof. Tejada. Copyright Sheila Tejada

CSE 332: Data Structures & Parallelism Lecture 12: Comparison Sorting. Ruth Anderson Winter 2019

ECE242 Data Structures and Algorithms Fall 2008

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

Sorting. Task Description. Selection Sort. Should we worry about speed?

DIVIDE & CONQUER. Problem of size n. Solution to sub problem 1

Algorithmic Analysis and Sorting, Part Two

5. Hashing. 5.1 General Idea. 5.2 Hash Function. 5.3 Separate Chaining. 5.4 Open Addressing. 5.5 Rehashing. 5.6 Extendible Hashing. 5.

We can use a max-heap to sort data.

CS301 - Data Structures Glossary By

Lecture 2: Divide&Conquer Paradigm, Merge sort and Quicksort

CS 151 Name Final Exam December 17, 2014

SAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 6. Sorting Algorithms

Mergesort again. 1. Split the list into two equal parts

Implementation with ruby features. Sorting, Searching and Haching. Quick Sort. Algorithm of Quick Sort

Question Bank Subject: Advanced Data Structures Class: SE Computer

CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting. Dan Grossman Fall 2013

Open Addressing: Linear Probing (cont.)

CS61B, Spring 2003 Discussion #15 Amir Kamil UC Berkeley 4/28/03

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

CSE 214 Computer Science II Searching

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

QuickSort

Quick Sort. CSE Data Structures May 15, 2002

Lecture 19 Sorting Goodrich, Tamassia

CMSC 341 Hashing (Continued) Based on slides from previous iterations of this course

IS 709/809: Computational Methods in IS Research. Algorithm Analysis (Sorting)

Chapter 10 Sorting and Searching Algorithms

On my honor I affirm that I have neither given nor received inappropriate aid in the completion of this exercise.

Module 2: Classical Algorithm Design Techniques

Data Structures And Algorithms

CSE 373: Data Structures and Algorithms

Key question: how do we pick a good pivot (and what makes a good pivot in the first place)?

Module 5: Hashing. CS Data Structures and Data Management. Reza Dorrigiv, Daniel Roche. School of Computer Science, University of Waterloo

Heaps and Priority Queues

CS134 Spring 2005 Final Exam Mon. June. 20, 2005 Signature: Question # Out Of Marks Marker Total

BM267 - Introduction to Data Structures

Module Contact: Dr Geoff McKeown, CMP Copyright of the University of East Anglia Version 1

Algorithmic Analysis and Sorting, Part Two. CS106B Winter

MPATE-GE 2618: C Programming for Music Technology. Unit 4.2

CS 2412 Data Structures. Chapter 10 Sorting and Searching

Priority Queue. 03/09/04 Lecture 17 1

COMP Data Structures

Mergesort again. 1. Split the list into two equal parts

106B Final Review Session. Slides by Sierra Kaplan-Nelson and Kensen Shi Livestream managed by Jeffrey Barratt

Binary Search Trees Treesort

Quicksort. Repeat the process recursively for the left- and rightsub-blocks.

Sorting. Order in the court! sorting 1

Sorting Pearson Education, Inc. All rights reserved.

Introduction to Computers and Programming. Today

Introducing Hashing. Chapter 21. Copyright 2012 by Pearson Education, Inc. All rights reserved

Problem. Input: An array A = (A[1],..., A[n]) with length n. Output: a permutation A of A, that is sorted: A [i] A [j] for all. 1 i j n.

Topic HashTable and Table ADT

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

Tables, Priority Queues, Heaps

Sorting. Two types of sort internal - all done in memory external - secondary storage may be used

Unit Outline. Comparing Sorting Algorithms Heapsort Mergesort Quicksort More Comparisons Complexity of Sorting 2 / 33

Sorting. Sorting in Arrays. SelectionSort. SelectionSort. Binary search works great, but how do we create a sorted array in the first place?

Hash Open Indexing. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

L14 Quicksort and Performance Optimization

DATA STRUCTURES/UNIT 3

Chapter 7 Sorting. Terminology. Selection Sort

Overview of Sorting Algorithms

CSE 373 NOVEMBER 8 TH COMPARISON SORTS

Princeton University Computer Science COS226: Data Structures and Algorithms. Midterm, Fall 2013

Topics Recursive Sorting Algorithms Divide and Conquer technique An O(NlogN) Sorting Alg. using a Heap making use of the heap properties STL Sorting F

Sorting. Order in the court! sorting 1

Merge Sort Goodrich, Tamassia Merge Sort 1

COS 226 Midterm Exam, Spring 2009

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Sorting. CPSC 259: Data Structures and Algorithms for Electrical Engineers. Hassan Khosravi

Quick-Sort. Quick-Sort 1

Standard ADTs. Lecture 19 CS2110 Summer 2009

Sorting. Sorting. Stable Sorting. In-place Sort. Bubble Sort. Bubble Sort. Selection (Tournament) Heapsort (Smoothsort) Mergesort Quicksort Bogosort

Searching in General

CS 311 Data Structures and Algorithms, Spring 2009 Midterm Exam Solutions. The Midterm Exam was given in class on Wednesday, March 18, 2009.

SORTING. Comparison of Quadratic Sorts

CSE100. Advanced Data Structures. Lecture 21. (Based on Paul Kube course materials)

Priority Queue Sorting

Computer Science 302 Spring 2007 Practice Final Examination: Part I

CSE 332: Data Structures & Parallelism Lecture 10:Hashing. Ruth Anderson Autumn 2018

CSE 373: Data Structures and Algorithms

Transcription:

Table ADT and Sorting Algorithm topics continuing (or reviewing?) CS 24 curriculum

A table ADT (a.k.a. Dictionary, Map) Table public interface: // Put information in the table, and a unique key to identify it: bool put(keytype key, ItemType info); // Get information from the table, according to the key value: ItemType get(keytype key); // Update information that is already in the table: bool update(keytype key, ItemType newinfo); // Remove information (and associated key) from the table: bool remove(keytype key); // Above methods return false if unsuccessful (except get returns null) // Print all information in table, in the order of the keys: void printall();

Table implementation options l Many possibilities depends on application And how much trouble efficiency is worth l Option 1: use a BST To put: inserttree using key for ordering To update: deletetree, then inserttree To printall: use in-order traversal l Option 2: use sorted array with binary searching l Option 3: implement as a hash table

Hashing ideas and concepts l Idea: transform arbitrary key domain (e.g., strings) into dense integer range Then use result as index to table int index = hash(key); // transform key to int l Collisions: hash(k1)==hash(k2), k1!= k2 Usually impossible to avoid ( perfect hashing rare) Therefore, must have a way to handle collisions l e.g., if using open addressing techniques - while (occupied(index)) index = probe(key); l Concept: insertion/searching is quick but only until the table starts to get filled up Then collisions start happening too often!

Implementing a hash table l Constructor allocates memory for array of items, and initializes all items to empty key size is size of array n is the number of items in the table Load factor is n / size l put method uses hash(key) (and probe(key)if open address hashing) to find empty slot for new item May be necessary to resize array l If so, also necessary to rehash existing items l If open address hashing, resize when load factor > 50%

Open address hashing l get & update methods use hash(key) and probe(key) in exact same sequence as put To find existing info where it was put l remove is more complicated Cannot just remove an item future probes for get and update might terminate prematurely at empty slot l Common trick is to have deleted key Problem with that is table can seem full prematurely l Inefficient alternative rehashes all items when any removed l Note: to printall in key order must sort first So O(n log n) at best!

Hash functions l Goal: uniform distribution of keys Means each index of table is equally likely Important for reducing collisions l Common approach is a restricted transformation Step 1 transform key to large integer Step 2 restrict integer to 0 size-1 l Usually done with modulus operator - % l Lots of variations partly depends on key type General observation: hard to find a good hash function Note: should be cheap to compute too e.g., division is slower on most CPUs than addition

Resolving collisions l Simplest open address approach is linear probing If (index = hash(key)) is not empty, try index+1, then index+2,, until empty slot l Note: searching for first open address Leads to primary clusters collisions bunch up l Quadratic probing vary probe, like 1, 3, 6, Leads to secondary clusters but not as quickly l Double hashing probe(key) varies by key Best open addressing approach for avoiding clusters l Or completely different approach chaining

Chaining l Constructor allocates memory for array of Lists, and creates an empty list for each element of the array l put method uses hash(key) and appends to end of list at that index of array Still should resize when load factor approaches 80% l Clustering is not a problem, but long lists slow performance l remove method is easier now just delete from list l But lots more overhead than open addressing Must store node links as well as key and info Use list method calls instead of direct array access

Sorting l Probably the most expensive common operation l Problem: arrange a[0..n-1] by some ordering e.g., in ascending order: a[i-1]<=a[i], 0<i<n l Two general types of strategies Comparison-based sorting includes most strategies l Apply to any comparable data (key, info) pairs l Lots of simple, inefficient algorithms l Some not-so-simple, but more efficient algorithms Address calculation sorting rarely used in practice l Must be tailored to fit the data not all data are suitable

Selection sort largest sorted l Idea: build sorted sequence at end of array l At each step: Find largest value in not-yet-sorted portion Exchange this value with the one at end of unsorted portion (now beginning of sorted portion) l Complexity is O(n 2 ) but simple to program l Also best way to find kth largest, or top k values

Heap sort l Another priority queue sorting algorithm Note about selection sort: unsorted part of array is like a priority queue remove greatest value at each step Also recall that heaps make faster priority queues l Idea: create heap out of unsorted portion, then remove one at a time and put in sorted portion l Complexity is O(n log n) O(n) to create heap + O(n log n) to remove/reheapify l Note proof: O(n log n) is the fastest possible class of any comparison-based sorting algorithm But constants do matter so some are faster than others in practice

Insertion sort l Generally better than other simple algorithms l Inserts one element into sorted part of array Must move other elements to make room for it current l Complexity is O(n 2 ) (code) But runs faster than selection sort and others in its class Really quick on nearly sorted array l Often used to supplement more sophisticated sorts

Divide & conquer strategies l Idea: (1) divide array in two; (2) sort each part; (3) combine two parts to overall solution l e.g., mergesort if (array is big enough to continue splitting) à divide array into left half and right half; mergesort(left half); mergesort(right half); merge(left half and right half together); else à sort small array in a simpler way Cost each time to merge two halves is O(n), and overall complexity is O(n log n) But notice it also uses 2n space Commonly used to sort large files (i.e., when there are too many records to load all of them into memory at once)

Quick sort l Invented in 1960 by C.A.R. Hoare Studied extensively by many people since Probably used more than any other sorting algorithm l Basic (recursive) quicksort algorithm: if (there is something to sort) { partition array; sort left part; sort right part; } All the work is done by partition (a.k.a. split) function And there is no need to merge anything at the end

Partitioning (for quicksort) l Arrange so elements in the two sub-arrays are on correct side of a pivot element Also means pivot element ends up in its final position all <= pivot pivot all >= pivot l Done by performing two series of scans scan from (i = left) until a[i] >= pivot; scan from (j = right) until a[j] <= pivot; swap a[i] and a[j], and continue both scans; stop scanning when i >= j; (code)

Quick sort (cont.) l Complexity is O(n log n) on average Fastest comparison-based sorting algorithm But overkill, and not-so-fast with small arrays Um what about a small partition?! One optimization applies insertion sort for partitions smaller than than 7 elements l And worst case is O(n 2 )! Depends on initial ordering and choice of pivot l Btw: C library qsort, and C++ STL sort

Compare 3 table implementations Table operation Hash table BST Sorted array create (new table) O(n) O(1) O(n) get, update O(1) O(log n) O(log n) put O(1) O(log n) O(n) remove O(1) O(log n) O(n) printall O(n log n) O(n) O(n) l Conclusion: choice depends on table purpose and size of n l Q. Ever want to use a sorted array? A. It depends!