Computational Complexities of the External Sorting Algorithms with No Additional Disk Space


Md. Rafiqul Islam, S. M. Raquib Uddin and Chinmoy Roy
Computer Science and Engineering Discipline, Khulna University, Khulna-9208, Bangladesh
Email: cseku@khulna.bangla.net (R. Islam)

Abstract
This paper analyzes the I/O (read and write) complexities of external sorting algorithms that use no additional disk space. Each algorithm sorts N records by partitioning them into N/B blocks of block size B. The analysis shows that every algorithm has input complexity $O(N^2/B^2)$, while their output complexities differ. We first review the algorithms, then analyze their I/O complexities, and finally compare them.

1. Introduction
Although the memories of current computers have been growing rapidly, external sorting is still needed for large databases. In [1], the authors confirmed that sorting continues to account for roughly one-fourth of all computer cycles. The problem of how to sort data efficiently has been widely discussed. According to Shaffer [2], the main concern in external sorting is to minimize disk accesses, since reading a disk block takes about a million times longer than accessing an item in RAM. Because I/O is much slower than the CPU, the number of I/Os is the more appropriate measure of performance for external sorting and other external-memory problems. Merge sort is still the most commonly used external sort, as described by Knuth [3], Singh and Naps [4], and others. In two-way merge sort, a file is divided into sub-files. The records of the two sub-files are written to two auxiliary files: pairwise comparison always writes the smaller record first, producing sorted runs of two records each. During the second pass, the runs from the two output files are compared, producing new sorted runs of four records each.
This process continues until the entire file is sorted, and it requires temporary disk files. Dufrene and Lin [5] proposed an algorithm that needs no external file other than the original file (the file to be sorted). M. N. Adnan et al. [6] proposed a hybrid external sorting algorithm with no additional disk space, and M. R. Islam et al. [7] proposed a similar algorithm. For all three algorithms, the authors analyzed the time complexity but not the I/O complexity. In this paper we study the I/O complexities of these algorithms. To this end, the next section reviews the three external sorting algorithms with no additional disk space.

2. Algorithms review
In this section we review three external sorting algorithms with no additional disk space.

2.1. An efficient external sorting algorithm with no additional space
This algorithm, proposed by Dufrene and Lin [5], is essentially a generalization of the internal Bubble Sort in which each individual record of the internal sort is replaced by a block of records. The external file is divided into equal-sized blocks, each approximately one half of the memory array (RAM) of the computer. If M is the size of the memory array, the block size is B = M/2; if N is the size of the external file, the number of blocks is S = N/B.

Figure 1. External file of size N after splitting into blocks Block_1, Block_2, ..., Block_{S-1}, Block_S (from the beginning to the end of the file).

In the first iteration, Block_1 and Block_S are read into the lower and upper halves of the memory array, respectively, and sorted together using Quick Sort. The records in the lower half, which are the lowest sorted records of Block_1 and Block_S, are retained in the memory array, and the records in the upper half are returned to the Block_S area of the external file. Next, Block_{S-1} is read into the upper half and the process continues. The loop terminates when the last block, Block_2, has been processed; Block_1 then contains the lowest ordered records of the entire file. The next iteration starts with Block_2. After each pass, as in Bubble Sort, the portion of the file still to be sorted shrinks by one block. The last two blocks to be processed are Block_{S-1} and Block_S, upon the completion of which the entire file is sorted.

2.2. A faster hybrid external sorting algorithm with no additional space
This algorithm, proposed by M. N. Adnan et al. [6], is also a generalization of the internal Bubble Sort. It works in two phases. In the first phase, it works as the algorithm proposed by Dufrene and Lin [5], reviewed in the previous section.
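Since this first phase reuses Dufrene and Lin's block-wise bubble pass, a small in-memory simulation of that scheme may help. It is a sketch under stated assumptions, not the authors' implementation: the hypothetical function `dufrene_lin_sort` treats a Python list of equal-sized blocks as the external file, uses `sorted()` in place of Quick Sort, and counts block reads and writes so that the totals can be compared with the I/O analysis in Section 3.1.

```python
def dufrene_lin_sort(blocks):
    """Sort a simulated external file, held as a list of equal-sized blocks, in place.

    Sketch only: the list stands in for the disk file and sorted() stands in
    for the in-memory Quick Sort. Returns (block_reads, block_writes).
    """
    s, b = len(blocks), len(blocks[0])
    if s == 1:                                # a single block fits in memory
        blocks[0] = sorted(blocks[0])
        return 1, 1
    reads = writes = 0
    for i in range(s - 1):                    # each pass settles one more block
        lower = blocks[i]                     # read into the lower half of memory
        reads += 1
        for j in range(s - 1, i, -1):         # last block down to block i + 1
            merged = sorted(lower + blocks[j])  # read into upper half, sort 2B records
            reads += 1
            lower, blocks[j] = merged[:b], merged[b:]  # keep lowest B, write highest B back
            writes += 1
        blocks[i] = lower                     # lowest B records of blocks i..s-1
        writes += 1
    return reads, writes


file_blocks = [[9, 3], [7, 1], [8, 2], [5, 4]]
print(dufrene_lin_sort(file_blocks), file_blocks)  # (9, 9) [[1, 2], [3, 4], [5, 7], [8, 9]]
```

For this four-block file the sketch performs 9 block reads and 9 block writes, that is, $2 + 3 + 4$ of each.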
After this phase, the external file is as shown in Figure 2.

Figure 2. External file after the first phase: Block_1 holds the lowest sorted records of the file, and each of Block_2, ..., Block_S holds sorted records.

The algorithm then switches to its second phase. In this phase, Block_{S-1} and Block_S are read into the lower and upper halves of the memory array, respectively. These two blocks are merged, and the merged records are written directly to the position of Block_{S-1} in the external file until that block is full. Thus half of the records in the memory array are sorted by merging and written to the position of Block_{S-1}. The remaining records in the lower half (if any) are copied into the upper half of the memory array, which now contains the highest records of Block_{S-1} and Block_S. Merge Sort is then applied to sort the records in the upper half, using the lower half as the additional space that Merge Sort requires. Figures 3 and 4 illustrate this approach.

Figure 3. Sorting by merging: 50% of the records are merged into the position of Block_{S-1} in the external file.
Figure 4. Sorting in memory by Merge Sort: the remaining records are copied into the upper half, which is then sorted by Merge Sort using the lower half as additional space.

After this, Block_{S-2} is read into the lower half of the memory array. The merging and Merge Sort steps terminate when Block_2 has been read into the lower half and processed in the same way. At the end of this iteration, the upper half of the memory array contains the highest sorted records, which are written to the position of Block_S in the external file. The next iteration starts with Block_{S-2} and Block_{S-1} read into the lower and upper halves, respectively. At the end of this iteration, the upper half contains the highest sorted records among Block_2, Block_3, ..., Block_{S-1}, which are written to the position of Block_{S-1} in the external file.

International Journal of The Computer, the Internet and Management Vol. 13 No. 3 (September-December, 2005) pp 60-68
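Both phases of this algorithm can be sketched in the same simulated setting. Again, this is only an illustration under assumptions, not the authors' code. The names `adnan_sort` and `merge_step` are hypothetical, and the list of blocks stands in for the external file. `sorted()` replaces Quick Sort in phase one and the in-memory Merge Sort of the upper half in phase two (Python's `sorted()` is Timsort, itself a merge-based sort). As in the analysis of Section 3.1, reads are counted per block, phase-one writes per block, and second-phase writes per record.

```python
def merge_step(lower, upper, b):
    """Merge two sorted halves: return the B smallest records and the new upper half."""
    out, i, j = [], 0, 0
    while len(out) < b:                       # write to disk until one block is full
        if j >= len(upper) or (i < len(lower) and lower[i] <= upper[j]):
            out.append(lower[i])
            i += 1
        else:
            out.append(upper[j])
            j += 1
    # Leftover lower-half records are copied into the freed upper-half slots,
    # and the upper half is re-sorted (Merge Sort in the paper; Timsort here).
    return out, sorted(upper[j:] + lower[i:])


def adnan_sort(blocks):
    """Two-phase sketch; returns (block_reads, writes)."""
    s, b = len(blocks), len(blocks[0])
    reads = writes = 0
    # Phase one: a single Dufrene-Lin pass. Block 0 receives the lowest B
    # records and every other block ends up internally sorted.
    lower = blocks[0]
    reads += 1
    for j in range(s - 1, 0, -1):
        merged = sorted(lower + blocks[j])
        reads += 1
        lower, blocks[j] = merged[:b], merged[b:]
        writes += 1                           # one block write
    blocks[0] = sorted(lower)                 # already sorted unless s == 1
    writes += 1
    # Phase two: each pass moves the highest B records of blocks[1..top]
    # into blocks[top]; merged records are written one at a time.
    for top in range(s - 1, 1, -1):
        upper = blocks[top]
        reads += 1
        for k in range(top - 1, 0, -1):
            out, upper = merge_step(blocks[k], upper, b)
            reads += 1
            blocks[k] = out
            writes += b                       # B record writes
        blocks[top] = upper
        writes += b
    return reads, writes


file_blocks = [[9, 3], [7, 1], [8, 2], [5, 4]]
print(adnan_sort(file_blocks), file_blocks)  # (9, 14) [[1, 2], [3, 4], [5, 7], [8, 9]]
```

On this four-block file with B = 2, the sketch performs the same 9 block reads as Dufrene and Lin's scheme but 14 writes, because every merged record in the second phase is counted as a separate write.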

After each pass, as in Bubble Sort, the portion of the external file still to be sorted shrinks by one block. The last blocks to be processed are Block_2 and Block_3, upon the completion of which the entire file is sorted.

2.3. A new external sorting algorithm with no additional disk space
This algorithm, proposed by M. R. Islam et al. [7], is also a generalization of the internal Bubble Sort and works in two phases. In the first phase, it works as the algorithm proposed by M. N. Adnan et al. [6], reviewed in Section 2.2. That is, Block_1 and Block_S are read into the lower and upper halves of the memory array, respectively, and sorted using Quick Sort. This phase terminates when Block_2 is read into the upper half of the memory array and sorted together with the records remaining in the lower half. The algorithm then switches to its second phase. In this phase, Block_{S-1} and Block_S are read into the lower and upper halves of the memory array, respectively, and a special merging process is applied. The merging is special because it is accomplished in two steps. In the first step, the records of the lower and upper halves (both already sorted) are merged, and the merged records are written directly to the position of Block_{S-1} in the external file until that block is full. In the second step, the remaining records in the lower and upper halves are merged again, and the merged records are written directly from the beginning of the upper half of the memory array. The upper half now contains the highest ordered records of Block_{S-1} and Block_S. Figure 5 illustrates this approach.

Figure 5. Sorting by the special merging technique: 50% of the records are merged into the position of Block_{S-1} in the external file, and the remaining records are merged and written from the beginning of the upper half of memory.
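The two-step special merge can be sketched on a single Python list `mem` of 2B records, whose first B slots play the role of the lower half and whose last B slots play the upper half. The function name `special_merge` is hypothetical, and this is an illustration rather than the authors' code. The sketch relies on one observation: once step one has removed B records, the number of lower-half records still waiting equals the number of upper-half slots already freed. Step two can therefore write its output from the start of the upper half without overwriting an unread record.

```python
def special_merge(mem, b):
    """Two-step merge on a memory array of 2*b records (sketch).

    mem[:b] (lower half) and mem[b:] (upper half) each hold a sorted run.
    Step 1 returns the b smallest records (destined for the disk block);
    step 2 merges the leftovers into mem[b:], writing from its beginning.
    """
    i, j = 0, b                               # read cursors: lower half, upper half
    to_disk = []
    while len(to_disk) < b:                   # step 1: fill one disk block
        if j < 2 * b and (i >= b or mem[j] < mem[i]):
            to_disk.append(mem[j])
            j += 1
        else:
            to_disk.append(mem[i])
            i += 1
    w = b                                     # step 2: write cursor at start of upper half
    while i < b or j < 2 * b:
        # Safe in place: while lower records remain, w < j, so no unread
        # upper-half record is overwritten.
        if j < 2 * b and (i >= b or mem[j] < mem[i]):
            mem[w] = mem[j]
            j += 1
        else:
            mem[w] = mem[i]
            i += 1
        w += 1
    return to_disk


memory = [1, 4, 6, 8, 2, 3, 5, 7]
print(special_merge(memory, 4), memory[4:])   # [1, 2, 3, 4] [5, 6, 7, 8]
```

Unlike the previous algorithm, no separate in-memory sort of the upper half is needed here: the second merge step leaves it sorted directly.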

After this, Block_{S-2} is read into the lower half of the memory array. In this way, when the last block, Block_2, has been processed, the upper half of the memory array contains the highest sorted records of the entire file, which are written to the position of Block_S in the external file. The next iteration starts with Block_{S-2} and Block_{S-1} read into the lower and upper halves of the memory array, respectively. At the end of this iteration, the upper half contains the highest records among Block_2, Block_3, ..., Block_{S-1}, which are written to the position of Block_{S-1} in the external file. After each pass, as in Bubble Sort, the portion of the external file still to be sorted shrinks by one block. The last blocks to be processed are Block_2 and Block_3, upon the completion of which the entire file is sorted.

3. Complexities of the algorithms
In this section we study the I/O complexities and time complexities of the three external sorting algorithms with no additional disk space.

3.1. Analysis of I/O complexity
In the algorithm proposed by Dufrene and Lin [5], the external file is divided into equal-sized blocks, each approximately one half of the available memory array (RAM). If M is the size of memory, i.e., the number of records that fit into main memory, the block size is B = M/2; if N is the size of the external file, the number of blocks is S = N/B. In the first iteration, each block is read into main memory and then written back to its corresponding block position, so this iteration takes $2N/B$ I/Os ($N/B$ reads and $N/B$ writes). After this iteration, Block_1 contains the lowest ordered records of the entire file. The next iteration starts with Block_2. In the second iteration, each block except the first is read into main memory and, after internal sorting, written back to the file, so it takes $2(N/B - 1)$ I/Os. After this pass, Block_2 contains the next lowest sorted records of the entire file. The subsequent iteration starts with Block_3 and takes $2(N/B - 2)$ I/Os.
After each pass, as in Bubble Sort, the portion of the file still to be sorted shrinks by one block. The last two blocks to be processed are Block_{S-1} and Block_S, upon the completion of which the entire file is sorted. The total number of iterations is therefore $N/B - 1$. Each iteration writes exactly as many blocks as it reads, so the number of reads is

$$\frac{N}{B} + \left(\frac{N}{B} - 1\right) + \left(\frac{N}{B} - 2\right) + \cdots + 2 = 2 + 3 + \cdots + \frac{N}{B} = \frac{1}{2}\,\frac{N}{B}\left(\frac{N}{B} + 1\right) - 1 \qquad (3.1)$$

and the total number of I/Os is twice this amount:

$$\frac{N}{B}\left(\frac{N}{B} + 1\right) - 2 \le 2\left(\frac{N}{B}\right)^{2} \quad \text{for all } N/B \ge 1.$$

So the I/O complexity is $O(N^2/B^2)$: both the input (read) complexity and the output (write) complexity of Dufrene and Lin's algorithm are $O(N^2/B^2)$.

The algorithm proposed by M. N. Adnan et al. [6] works in two phases. In the first phase, it works as the previous one; with the same parameters, it takes $2N/B$ I/Os in the first phase ($N/B$ reads and $N/B$ writes). After this phase, the first block of the external file contains the lowest sorted records of the entire file. In the second phase, the records are merged (compared) and written directly to the disk block, using no extra space; $N/B - 1$ blocks have to be processed in this phase. The first iteration reads each block except the first and writes it back to the file. Because the merged records are written one at a time, each block costs B writes but only one read, so the first iteration of the second phase performs $N/B - 1$ reads and $N - B$ writes. After this iteration, the upper half of the memory array contains the highest sorted records, which are written to the position of Block_S in the external file. After each pass the file size is reduced by one block, so the second iteration performs $N/B - 2$ reads and $N - 2B$ writes, and so on. The last blocks to be processed are Block_2 and Block_3, upon the completion of which the entire file is sorted. The input and output operations can therefore be counted separately. In the second phase, the number of input operations is $2 + 3 + \cdots + (N/B - 1)$, so the total number of disk inputs is $N/B + 2 + 3 + \cdots + (N/B - 1)$. This is the same as equation (3.1) and simplifies to $O(N^2/B^2)$, so the input (read) complexity of this algorithm is the same as that of the previous one. The total number of disk outputs is

$$\frac{N}{B} + (N - B) + (N - 2B) + (N - 3B) + \cdots + 2B = \frac{N}{B} + B\left[2 + 3 + \cdots + \left(\frac{N}{B} - 1\right)\right] = \frac{N}{B} + B\left[\frac{1}{2}\,\frac{N}{B}\left(\frac{N}{B} - 1\right) - 1\right] = \frac{N^{2}}{2B} - \frac{N}{2} + \frac{N}{B} - B,$$

so the output complexity is $O(N^2/B)$.

The algorithm proposed by M. R. Islam et al. [7] also works in two phases. In the first phase, it works as the previous one; with the same parameters, it takes $2N/B$ I/Os in the first phase ($N/B$ reads and $N/B$ writes).
In the second phase, the algorithm uses the special merging process, which is accomplished in two steps. In the first step, the records of the lower and upper halves of the memory array (both already sorted) are merged, and the merged records are written directly to the position of Block_{S-1} in the external file until that block is full. In the second step, the remaining records in the lower and upper halves are merged again, and the merged records are written directly from the beginning of the upper half of the memory array. The first iteration reads each block except the first and, after merging, writes the records back to the file. Again, each block costs B writes but only one read, so the first iteration of the second phase performs $N/B - 1$ reads and $N - B$ writes; the process continues with the file size reduced by one block after each pass. The total number of disk inputs is therefore the same as equation (3.1), which simplifies to $O(N^2/B^2)$, so the input complexity is again $O(N^2/B^2)$. As for the previous algorithm, the output complexity is $O(N^2/B)$.

3.2. Time complexity
The time complexity of the algorithm presented by Dufrene and Lin [5], as given by M. R. Islam et al. [7], has the form $n \log_e n \sum_{i=1}^{\ldots} i$. The time complexity of the algorithm presented by M. N. Adnan et al. [6] combines $n \log_e n$ terms with a sum of $\log i$ terms, and that of the algorithm proposed by M. R. Islam et al. has the form $n \log_e n + n \sum_{i=1}^{\ldots} i$. In each case, n is the number of records that reside in the memory array. As shown in [6] and [7], respectively, the time complexities of M. N. Adnan et al.'s algorithm and of M. R. Islam et al.'s algorithm are both less than that of Dufrene and Lin's algorithm.

4. Comparison of I/O complexity
The complexity analysis shows that, for a given number of records, the three external sorting algorithms (proposed by Dufrene and Lin, by M. N. Adnan et al., and by M. R. Islam et al.) perform the same number of disk read (input) operations. The number of disk write (output) operations of Dufrene and Lin's algorithm is

$$W_{DL} = \frac{1}{2}\,\frac{N}{B}\left(\frac{N}{B} + 1\right) - 1,$$

and that of M. R. Islam et al.'s algorithm is

$$W_{I} = \frac{N}{B} + B\left[\frac{1}{2}\,\frac{N}{B}\left(\frac{N}{B} - 1\right) - 1\right].$$

We now compare these two output complexities. Let $T = 2 + 3 + \cdots + (N/B - 1) = \frac{1}{2}\frac{N}{B}\left(\frac{N}{B} - 1\right) - 1$. Then $W_{DL} = N/B + T$ and $W_I = N/B + B\,T$, so

$$W_{I} - W_{DL} = (B - 1)\,T > 0$$

whenever the block size B = M/2 exceeds one record and the file has more than two blocks ($S = N/B > 2$, so that $T > 0$). Hence $W_{DL} < W_I$: Dufrene and Lin's algorithm performs fewer write operations than M. R. Islam et al.'s algorithm, and, since the two-phase algorithms have the same output count, also fewer than M. N. Adnan et al.'s algorithm. In other words, the output complexity of Dufrene and Lin's algorithm is less than that of both other algorithms. Table 1 gives the increase in the output complexity of M. R. Islam et al.'s algorithm over Dufrene and Lin's algorithm.

Table 1. Comparison of the output complexity of M. R. Islam et al.'s algorithm and Dufrene and Lin's algorithm.

External file size (MB) | RAM size (MB) | Ratio of output complexity | Increment of output complexity (%)
80   | 64 | 9.04  | 804
160  | 64 | 20.93 | 1993
320  | 64 | 26.26 | 2526
640  | 64 | 29.03 | 2803
1280 | 64 | 30.48 | 2948

Here, the ratio of output complexity is $W_I / W_{DL}$, and the last column gives the increase (in percent) in the output complexity of M. R. Islam et al.'s algorithm relative to Dufrene and Lin's algorithm, i.e., (ratio - 1) × 100. Figure 6 plots the output complexity for the external file sizes of Table 1; it shows that Dufrene and Lin's algorithm performs fewer output operations than M. R. Islam et al.'s algorithm.
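The ratios in Table 1 can be recomputed from the two write counts $W_{DL}$ and $W_I$. The following is a minimal sketch, assuming (as the table does) that all sizes are in megabytes, that B = M/2 = 32 MB, and that N/B may be fractional (80/32 = 2.5). The function names are hypothetical.

```python
def writes_dufrene_lin(n: float, b: float) -> float:
    """W_DL = 2 + 3 + ... + S in closed form, with S = N/B."""
    s = n / b
    return s * (s + 1) / 2 - 1


def writes_islam(n: float, b: float) -> float:
    """W_I = S + B * (2 + 3 + ... + (S - 1)) in closed form, with S = N/B."""
    s = n / b
    return s + b * (s * (s - 1) / 2 - 1)


def table_rows(file_sizes_mb, ram_mb=64):
    """Rows of (file size, RAM size, output ratio, increment in percent)."""
    b = ram_mb / 2                            # block size is half the memory array
    rows = []
    for n in file_sizes_mb:
        ratio = writes_islam(n, b) / writes_dufrene_lin(n, b)
        rows.append((n, ram_mb, round(ratio, 2), round((ratio - 1) * 100)))
    return rows


for row in table_rows([80, 160, 320, 640, 1280]):
    print(row)
```

Run as is, the sketch reproduces the first four rows of Table 1. For the last row it gives 30.49 and 2949 instead of the tabulated 30.48 and 2948, a difference of one unit in the last digit.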

Figure 6. Output complexity (number of output operations) for various sizes of external file (MB): Dufrene and Lin's algorithm versus M. R. Islam et al.'s algorithm.

5. Conclusion
We have analyzed and compared the I/O complexities of external sorting algorithms with no additional disk space. The analysis shows that the input (read) complexities of the algorithms are the same. However, the output (write) complexity of Dufrene and Lin's algorithm is less than that of the other two algorithms, whose output (write) complexities are the same. The time complexity of the algorithm proposed by M. N. Adnan et al. is less than that of Dufrene and Lin's algorithm, as shown in [6], and that of the algorithm proposed by M. R. Islam et al. is also less than that of Dufrene and Lin's algorithm, as shown in [7]. An open research question is how the output (write) complexity of M. R. Islam et al.'s algorithm can be reduced while keeping the same time complexity.

References
[1] E. E. Lindstrom, J. S. Vitter (1985), The design and analysis of Bucket-sort for bubble memory secondary storage, IEEE Trans. Comput. C-34 (3) 218-233.
[2] C. A. Shaffer (1997), A Practical Introduction to Data Structures and Algorithm Analysis, Prentice-Hall.
[3] D. E. Knuth (1985), Sorting and Searching, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, MA.
[4] B. Singh and T. L. Naps (1985), Introduction to Data Structures, West Publishing Co., St. Paul, MN.
[5] W. R. Dufrene, F. C. Lin (1992), An efficient sorting algorithm with no additional space, Comput. J. 35 (3).
[6] N. Adnan, R. Islam, N. Islam, S. Hossen (2002), A faster hybrid external sorting algorithm with no additional disk space, presented at the International Conference on Computer and Information Technology (ICCIT), 7-8 December, Dhaka, Bangladesh.
[7] R. Islam, N. Adnan, N. Islam, S. Hossen (2003), A new external sorting algorithm with no additional disk space, Information Processing Letters 86, 229-233.