CS 2412 Data Structures. Chapter 10 Sorting and Searching


Some concepts
Sorting is one of the most common data-processing applications.
Sorting algorithms are classed as either internal or external.
Sort order can be either ascending or descending.
Sort stability is an attribute of a sort indicating that data with equal keys maintain their relative input order in the output.
Sort efficiency is usually measured by the number of comparisons and moves required for the sort.
The best possible comparison-based sorting algorithms are O(n log n).
During the sorting process, each traversal of the data is referred to as a sort pass.
Data Structure 2016 R. Wei

Selection sorts
Heap sort: already discussed. First build a heap. Then repeatedly remove the root of the heap, move the last element to the root, and reheap down.
Straight selection sort: in each pass, the smallest element is selected from the unsorted sublist and exchanged with the element at the beginning of the unsorted sublist.


Algorithm selectionsort (list, last)
  set current to 0
  loop (until last element sorted)
    set smallest to current
    set walker to current + 1
    loop (walker <= last)
      if (walker key < smallest key)
        set smallest to walker
      end if
      increment walker
    end loop
    exchange (current, smallest)
    increment current
  end loop
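The selection sort pseudocode above can be sketched in C roughly as follows. This is a minimal sketch for an int array, following the convention of the other C code in these notes that last is the index of the last element; the name selection_sort is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Straight selection sort of list[0..last] in ascending order.
   Assumption: `last` is the index of the last element, not the count. */
void selection_sort(int list[], int last)
{
    assert(list != NULL);
    for (int current = 0; current < last; current++) {
        /* Find the smallest element of the unsorted sublist. */
        int smallest = current;
        for (int walker = current + 1; walker <= last; walker++) {
            if (list[walker] < list[smallest])
                smallest = walker;
        }
        /* Exchange it with the first element of the unsorted sublist. */
        int temp = list[current];
        list[current] = list[smallest];
        list[smallest] = temp;
    }
}
```

Calling selection_sort(a, 5) on a six-element array a sorts it in place. Because the exchange can move an element past others with an equal key, this sort is not stable.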

The efficiency of selection sorts
Straight selection sort: O(n^2). The algorithm has two nested loops, each of which executes about n times.
Heap sort: O(n log n). Building the heap takes about n log n steps, and sorting from the heap takes about another n log n steps. In big-O notation, the complexity is O(n log n).

Insertion sorts
Straight insertion sort: the list is divided into sorted and unsorted sublists. In each pass the first element of the unsorted sublist is inserted into the sorted sublist at its correct position.
Shell sort: the list is divided into K segments and each segment is sorted (the segments are dispersed through the list). After each pass, the number of segments is reduced according to an increment. When the number of segments has been reduced to 1 and that segment is sorted, the list is sorted.


Algorithm insertionsort (list, last)
  set current to 1
  loop (until last element sorted)
    move current element to hold
    set walker to current - 1
    loop (walker >= 0 AND hold key < walker key)
      move walker element right one element
      decrement walker
    end loop
    move hold to walker + 1 element
    increment current
  end loop
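A possible C rendering of the insertion sort pseudocode, as a sketch for an int array where last is the index of the last element (the function name insertion_sort is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Straight insertion sort of list[0..last] in ascending order.
   Assumption: `last` is the index of the last element. */
void insertion_sort(int list[], int last)
{
    assert(list != NULL);
    for (int current = 1; current <= last; current++) {
        int hold = list[current];          /* first unsorted element */
        int walker = current - 1;
        /* Shift larger sorted elements one position to the right. */
        while (walker >= 0 && hold < list[walker]) {
            list[walker + 1] = list[walker];
            walker--;
        }
        list[walker + 1] = hold;           /* insert at its position */
    }
}
```

Because the inner loop stops at the first element that is not greater than hold, elements with equal keys keep their input order, so this sort is stable.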

The main idea of the Shell sort is to divide the list into segments and use insertion sort to sort each segment. The elements of a segment are positioned one increment apart. In the following example, the list is of size 10.
The 5 segments for increment K = 5 are as follows:
  Segment 1: A[0], A[5]
  Segment 2: A[1], A[6]
  Segment 3: A[2], A[7]
  Segment 4: A[3], A[8]
  Segment 5: A[4], A[9]
Then for increment K = 2:
  Segment 1: A[0], A[2], A[4], A[6], A[8]
  Segment 2: A[1], A[3], A[5], A[7], A[9]


Algorithm shellsort (list, last)
  set incre to (last + 1) / 2
  loop (incre not 0)
    set current to incre
    loop (until last element sorted)
      move current element to hold
      set walker to current - incre
      loop (walker >= 0 AND hold key < walker key)
        move walker element one increment right
        set walker to walker - incre
      end loop
      move hold to walker + incre element
      increment current
    end loop
    set incre to incre / 2
  end loop

void shellsort (int list [], int last)
{
    int hold;
    int incre;
    int walker;

    // last is the index of the last element, so n = last + 1.
    // Starting from n / 2 also handles a two-element list.
    incre = (last + 1) / 2;
    while (incre != 0)
    {
        for (int curr = incre; curr <= last; curr++)
        {
            hold = list [curr];
            walker = curr - incre;
            while (walker >= 0 && hold < list [walker])
            {
                list [walker + incre] = list [walker];
                walker = ( walker - incre );
            } // while
            list [walker + incre] = hold;
        } // for walk
        incre = incre / 2;
    } // while
    return;
} // shellsort

Note
In the above algorithm, the increment starts at n/2 and is halved after each pass. This is simple but not the most efficient choice. Ideally the increments should be chosen so that no two elements appear in the same segment more than once, but this is not easy to achieve in general.

Insertion sort efficiency:
Straight insertion sort: O(n^2). The algorithm has two nested loops. The inner loop executes about n(n + 1)/2 times.
Shell sort: the complexity is difficult to analyze. Empirical studies show that the average sort complexity is about O(n^1.25).

Exchange sorts
Bubble sort: the list is divided into two sublists, sorted and unsorted. In each pass the smallest element is bubbled from the unsorted sublist into the sorted sublist.
Quick sort: in each pass a pivot is selected. The elements less than the pivot and the elements greater than or equal to the pivot are then separated into two sublists, and the pivot is placed at its final location in the list.

Example:
  23 78 45  8 56 32   (original list)
   8 23 78 45 32 56   (after pass 1)
   8 23 32 78 45 56   (after pass 2)
   8 23 32 45 78 56   (after pass 3)
   8 23 32 45 56 78   (after pass 4)

Algorithm bubblesort (list, last)
  set current to 0
  set sorted to false
  loop (current <= last AND sorted false)
    set walker to last
    set sorted to true
    loop (walker > current)
      if (walker data < walker - 1 data)
        set sorted to false
        exchange (list, walker, walker - 1)
      end if
      decrement walker
    end loop
    increment current
  end loop


Note for quick sort
There are different methods for selecting the pivot:
  Select the first element.
  Select the middle element.
  Select the median value of three elements: the left element, the right element, and the element in the middle of the list. This text uses this method.
When a partition becomes small, a straight insertion sort can be used, which may be more efficient.

Example for one pass of a quick sort (figure not reproduced).

Algorithm medianleft (sortdata, left, right)
  set mid to (left + right) / 2
  if (left key > mid key)
    exchange (sortdata, left, mid)
  end if
  if (left key > right key)
    exchange (sortdata, left, right)
  end if
  if (mid key > right key)
    exchange (sortdata, mid, right)
  end if
  exchange (sortdata, left, mid)   // put pivot in left
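The sketch below shows how medianleft might be used inside a quick sort in C. Only median_left follows the pseudocode above. The partition loop in quick_sort is one common scheme chosen for illustration and is not necessarily the one used in the text; the names swap_int, median_left and quick_sort are illustrative, and the switch to insertion sort for small partitions is omitted.

```c
#include <assert.h>
#include <stddef.h>

void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Leave the median of list[left], list[mid], list[right] in list[left]. */
void median_left(int list[], int left, int right)
{
    int mid = (left + right) / 2;
    if (list[left] > list[mid])   swap_int(&list[left], &list[mid]);
    if (list[left] > list[right]) swap_int(&list[left], &list[right]);
    if (list[mid]  > list[right]) swap_int(&list[mid],  &list[right]);
    swap_int(&list[left], &list[mid]);     /* put pivot in left */
}

/* Quick sort of list[left..right] (assumed partition scheme). */
void quick_sort(int list[], int left, int right)
{
    assert(list != NULL);
    if (left >= right)
        return;
    median_left(list, left, right);
    int pivot = list[left];
    int lo = left + 1;
    int hi = right;
    while (lo <= hi) {
        while (lo <= hi && list[lo] < pivot)  lo++;   /* keys < pivot stay left   */
        while (lo <= hi && list[hi] >= pivot) hi--;   /* keys >= pivot stay right */
        if (lo < hi)
            swap_int(&list[lo], &list[hi]);
    }
    swap_int(&list[left], &list[hi]);      /* pivot to its final location */
    quick_sort(list, left, hi - 1);
    quick_sort(list, hi + 1, right);
}
```

A call such as quick_sort(a, 0, last) sorts the whole array. With many equal keys this simple partition becomes unbalanced, which is one reason practical versions use more careful partitioning.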


The list in Figure 12-15 is sorted as follows (figure not reproduced).

The exchange sort efficiency:
Bubble sort: O(n^2). There are two nested loops in the algorithm. The number of comparisons is about n(n + 1)/2.
Quick sort: O(n log n) on average. The algorithm has 5 loops. However, in each pass a partition is generally about half the size of the partition in the previous pass, so roughly speaking there are log_2 n passes in total. In the worst case, when the partitions are very unbalanced, quick sort degrades to O(n^2).

void bubblesort (int list [], int last)
{
    int temp;
    int sorted = 0;

    // sorted is declared once, outside both loops, so that the inner
    // loop updates the same variable that the outer loop tests.
    for (int current = 0; current <= last && !sorted; current++)
    {
        sorted = 1;   // assume sorted until an exchange occurs
        for (int walker = last; walker > current; walker--)
            if (list[ walker ] < list[ walker - 1 ])
            {
                sorted = 0;
                temp = list[walker];
                list[walker] = list[walker - 1];
                list[walker - 1] = temp;
            } // if
    } // for current
    return;
} // bubblesort

External sorts
In external sorting, portions of the data may be stored in secondary memory during the sorting process.
One important technique for external sorting is to merge sorted files into one sorted file.

Merge sorts
A simple merge combines two sorted files into one sorted file.
For example, suppose we have two sorted lists:
  1, 3, 5
  2, 4, 6, 8, 10
After merging these two lists, we should obtain the following list:
  1, 2, 3, 4, 5, 6, 8, 10

The following algorithm merges two sorted files, file1 and file2. The combined data are written into file3.
Algorithm mergefiles
  open files
  read (file1 into record1)
  read (file2 into record2)
  loop (not end file1 or not end file2)
    if (record1.key <= record2.key)
      write (record1 to file3)
      read (file1 into record1)
      if (end of file1)
        set record1.key to infinity
      end if
    else
      write (record2 to file3)
      read (file2 into record2)
      if (end of file2)
        set record2.key to infinity
      end if
    end if
  end loop
  close files
end mergefiles
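The same merge logic can be sketched in C on in-memory arrays. This is an illustration under stated assumptions rather than the file-based algorithm itself: the two sorted arrays stand in for file1 and file2, out stands in for file3, and explicit end-of-input checks replace the infinity sentinel. The name merge_sorted is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Merge sorted arrays a[0..na-1] and b[0..nb-1] into out, which must have
   room for na + nb elements. Returns the number of elements written. */
size_t merge_sorted(const int a[], size_t na,
                    const int b[], size_t nb, int out[])
{
    assert(out != NULL);
    size_t i = 0, j = 0, k = 0;
    /* While both inputs have data, copy the smaller current key. */
    while (i < na && j < nb) {
        if (a[i] <= b[j])
            out[k++] = a[i++];
        else
            out[k++] = b[j++];
    }
    /* One input is exhausted: copy the remainder of the other. */
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}
```

Taking from a when the keys are equal keeps the merge stable, which matches the <= comparison in the pseudocode.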

Merge unsorted files:
Form merge runs in the files. Each run is ordered. The end of each run is identified by a stepdown (a key that is smaller than the key before it).
Merge the corresponding runs of the two files. When one run ends at a stepdown, the remainder of the other run is rolled out (copied to the merged file).


The sorting process:
Sort phase: divide the file into merge files according to the size of memory. For example, suppose we have 2300 records but memory can only handle 500 records. We first read in 500 records and sort them as the first merge run. Then we read and sort records 501-1000 as the first run of merge file 2, and so on.
Merge phase: merge the sorted runs.


There are different merge concepts. We discuss 3 of them as examples.
Natural merge: after each merge, all data are written to one file, so a distribution phase is needed to redistribute the data to two files.
Balanced merge: uses a constant number of input merge files and the same number of output merge files.
Polyphase merge: a constant number of input merge files are merged to one output merge file; input merge files are reused immediately once their input has been completely merged.


Searching
Binary search: for sorted lists.
Sequential search:
  Straight sequential search: at each step, check whether the key equals the target AND whether it is the last key.
  Sentinel sequential search: add the target at the end of the list so that each step only needs to check whether the key equals the target.
  Probability search: when a target is found, move the element containing the target up one location. In this way, the most frequently requested targets are found faster.
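Two of the searches above, sketched in C for int arrays. The function names are illustrative. Note the assumption in sentinel_search that the array has one spare slot at index last + 1 to hold the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel sequential search of list[0..last].
   Assumption: list has room for one extra element at index last + 1.
   Returns the index of target, or -1 if it is not in the list. */
int sentinel_search(int list[], int last, int target)
{
    assert(list != NULL);
    list[last + 1] = target;          /* sentinel guarantees termination */
    int i = 0;
    while (list[i] != target)         /* only one test per step */
        i++;
    return (i <= last) ? i : -1;
}

/* Binary search of the sorted array list[0..last].
   Returns the index of target, or -1 if it is not in the list. */
int binary_search(const int list[], int last, int target)
{
    assert(list != NULL);
    int first = 0;
    while (first <= last) {
        int mid = first + (last - first) / 2;
        if (list[mid] < target)
            first = mid + 1;          /* target is in the upper half */
        else if (list[mid] > target)
            last = mid - 1;           /* target is in the lower half */
        else
            return mid;
    }
    return -1;
}
```

The sentinel version removes the end-of-list test from the loop, while the binary search halves the remaining range at every step and therefore needs about log_2 n comparisons.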

Hashed list searches
Hashing is a method that uses key-to-address mapping to find data quickly.
The basic idea is to use a hash function to map a key (drawn from a large range) to an index (drawn from a small range) of the data.
Some keys may be mapped to the same index (synonyms). We then need some method to resolve the collision.
The main part of hashing is to find good hashing methods.


Hashing methods:
Direct method: the range of keys and the range of indexes are the same.
Subtraction method: subtract a fixed number from the key. This also requires that both ranges be the same.
Modulo-division method: index = key MODULO listsize
Digit-extraction method: select the digits at certain positions as the index.
Midsquare method: the key is squared and the middle digits are used as the index.
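Two of these methods sketched in C. The function names are illustrative. The midsquare version assumes a key of at most four digits and a four-digit index, so the square has at most eight digits and the middle four are taken; these sizes are assumptions chosen for the example.

```c
#include <assert.h>

/* Modulo-division method: index = key MODULO listsize. */
unsigned modulo_hash(unsigned long key, unsigned list_size)
{
    assert(list_size > 0);
    return (unsigned)(key % list_size);
}

/* Midsquare method, assuming key <= 9999 and a 4-digit index:
   square the key and keep the middle four of its eight digits. */
unsigned midsquare_hash(unsigned key)
{
    assert(key <= 9999u);
    unsigned long long square = (unsigned long long)key * key;
    return (unsigned)((square / 100) % 10000);
}
```

For example, with a list size of 307 the key 121267 hashes to 2, and the key 9452 squares to 89340304, whose middle digits give 3403. A prime list size usually spreads keys more evenly in the modulo-division method.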

Folding method:
Fold shift: the key is divided into parts whose size matches the size of the index. The left and right parts are then shifted and added to the middle part.
Fold boundary: the left and right parts are folded on the fixed boundaries between them and the center part, so the digits of the two outside parts are reversed before the parts are added.
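A small C sketch of both folding variants. It assumes a key of at most nine digits split into three 3-digit parts and a 3-digit index, with any carry beyond three digits discarded; these sizes and the function names are assumptions for illustration.

```c
#include <assert.h>

/* Reverse the digits of a 3-digit part, e.g. 123 -> 321. */
unsigned reverse3(unsigned part)
{
    return (part % 10) * 100 + ((part / 10) % 10) * 10 + part / 100;
}

/* Fold shift: add the three parts as they are, keep three digits. */
unsigned fold_shift(unsigned long key)
{
    assert(key <= 999999999UL);
    unsigned left  = (unsigned)(key / 1000000UL);
    unsigned mid   = (unsigned)((key / 1000UL) % 1000UL);
    unsigned right = (unsigned)(key % 1000UL);
    return (left + mid + right) % 1000;
}

/* Fold boundary: reverse the two outside parts before adding. */
unsigned fold_boundary(unsigned long key)
{
    assert(key <= 999999999UL);
    unsigned left  = (unsigned)(key / 1000000UL);
    unsigned mid   = (unsigned)((key / 1000UL) % 1000UL);
    unsigned right = (unsigned)(key % 1000UL);
    return (reverse3(left) + mid + reverse3(right)) % 1000;
}
```

For the key 123456789, fold shift adds 123 + 456 + 789 = 1368 and keeps 368, while fold boundary adds 321 + 456 + 987 = 1764 and keeps 764.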

Rotation method: rotate the last character to the front of the key. This is usually used in combination with other methods.
Pseudorandom method: the key is used as the seed of a pseudorandom number generator, and the resulting random number is then scaled into the possible index range.

Some concepts used in collision resolution methods:
Load factor: the number of elements in the list divided by the number of physical elements allocated for the list, expressed as a percentage (preferably less than 75): α = (k / n) × 100, where k is the number of filled elements and n is the number of elements allocated.
Clustering: as data are added to a list and collisions are resolved, some hashing algorithms tend to cause data to group within the list.


Open addressing to resolve collisions (disadvantage: each collision resolution increases the probability of future collisions).
Linear probe: when data cannot be stored at the home address, we resolve the collision by adding 1 to the current address.
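A minimal C sketch of a linear probe. It assumes non-negative int keys, a modulo-division hash, a small fixed table size, and -1 as the marker for an empty element; all of these, and the function names, are assumptions for illustration. Deletion is not handled.

```c
#include <assert.h>

enum { LIST_SIZE = 7, EMPTY = -1 };   /* hypothetical size and empty marker */

/* Insert key using a linear probe. Returns the address used,
   or -1 if the table is full. */
int linear_probe_insert(int table[], int key)
{
    assert(key >= 0);
    int home = key % LIST_SIZE;
    for (int probe = 0; probe < LIST_SIZE; probe++) {
        int address = (home + probe) % LIST_SIZE;   /* add 1, wrap around */
        if (table[address] == EMPTY) {
            table[address] = key;
            return address;
        }
    }
    return -1;
}

/* Search for key. Returns its address, or -1 if it is not present. */
int linear_probe_search(const int table[], int key)
{
    assert(key >= 0);
    int home = key % LIST_SIZE;
    for (int probe = 0; probe < LIST_SIZE; probe++) {
        int address = (home + probe) % LIST_SIZE;
        if (table[address] == EMPTY)
            return -1;                /* an empty element ends the probe */
        if (table[address] == key)
            return address;
    }
    return -1;
}
```

With a table of size 7, the keys 10, 17 and 24 all have home address 3 and end up at addresses 3, 4 and 5, which illustrates how a linear probe tends to produce clustering.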

Quadratic probe: the increment is the collision probe number squared (1 for the first probe, 4 for the second, 9 for the third, and so on).

Pseudorandom collision resolution (double hashing): use a pseudorandom number to resolve the collision. The collision address is used as the key of the pseudorandom number generator.

Key offset (double hashing): calculate the new address as a function of the old address and the key. For example:
  offset = key / listsize
  address = (offset + old address) modulo listsize
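The key offset formula translates directly into C. The sketch below assumes non-negative int keys and integer division for key / listsize; the function name key_offset_next is illustrative.

```c
#include <assert.h>

/* Key offset collision resolution:
   offset  = key / listsize          (integer division)
   address = (offset + old address) modulo listsize */
int key_offset_next(int key, int old_address, int list_size)
{
    assert(list_size > 0 && key >= 0 && old_address >= 0);
    int offset = key / list_size;
    return (offset + old_address) % list_size;
}
```

For example, with a list size of 307 the key 166702 has home address 1 (166702 modulo 307); after a collision there, the offset is 543 and the next address is (543 + 1) modulo 307 = 237. Note that if the offset is a multiple of the list size the address does not change, so a practical version needs a fallback for that case.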

Linked list collision resolution: use a separate area to store collisions, and chain all synonyms together in a linked list (usually in LIFO sequence). Two storage areas are used: the prime area and the overflow area.
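A simplified C sketch of linked list collision resolution. As an assumption for brevity, the prime area here holds only the head pointer of each chain and every key is stored in a heap-allocated node (the overflow area), whereas the scheme described above keeps the first item in the prime area itself. New synonyms are inserted at the front of the chain (LIFO). The type and function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

enum { LIST_SIZE = 7 };               /* hypothetical prime area size */

typedef struct node {
    int key;
    struct node *next;                /* next synonym in the chain */
} Node;

/* Insert key at the front of its chain (LIFO).
   Returns 1 on success, 0 if memory could not be allocated. */
int chain_insert(Node *table[], int key)
{
    assert(key >= 0);
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    int home = key % LIST_SIZE;
    n->key = key;
    n->next = table[home];
    table[home] = n;
    return 1;
}

/* Return the node holding key, or NULL if it is not present. */
Node *chain_search(Node *table[], int key)
{
    assert(key >= 0);
    for (Node *p = table[key % LIST_SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

Because synonyms live in their own chain, a collision at one home address does not consume elements that belong to other addresses, which avoids the clustering seen with open addressing.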

Bucket hashing: keys are hashed to buckets, which are nodes that accommodate multiple data occurrences (disadvantages: it uses more space, much of which may be empty, and a collision still occurs when a bucket is full).

Combination approaches may be used: bucket hashing first, then a linear probe if the bucket is full.