Use of Tree-based Algorithms for Internal Sorting


P. Y. Padhye (puru@deakin.edu.au)
School of Computing and Mathematics
Deakin University
Geelong, Victoria 3217

Abstract

Most of the large number of internal sorting algorithms are based on the use of arrays or linked lists. This report highlights the use of tree structures for internal sorting. It is shown that the ordered binary tree can be used very effectively for sorting data, whatever the initial ordering of the input may be. It is also shown that B-Trees, normally used to process large files on disks, can also be employed effectively to sort data in main memory. In both cases, a simple inorder traversal of the tree retrieves the data in sorted order. Algorithms based on binary- and B-Tree structures are compared with Quick Sort and Heap Sort in terms of the times required to sort various input lists. Input lists of various sizes and degrees of 'presortedness' were used in the comparison tests. It is shown that the times required to build balanced ordered binary trees are the same as or less than the times required to sort the data using Quick Sort, and always better than the times required using Heap Sort. The time required to build a B-Tree varies with the order of the B-Tree. B-Trees of order 1, 2, 4, 8 and 16 were compared, and order 4 was found to be the most suitable. The times to build B-Trees of order 4 are shown to be the same as or better than Heap Sort and slightly slower than Quick Sort. Further tests comparing the times required for a complete processing task also found the ordered binary tree performance close to Quick Sort and the B-Tree performance as good as or better than Heap Sort.

1. Introduction

"Sorting is the most-studied and most-solved problem in the theory of algorithms" [Kingston, 1990]. "Sorting is an almost universally performed, fundamental, relevant and essential activity, particularly in data processing" [Wirth, 1976]. The topic has been researched for many years and several books have been written on it [Knuth, 1973], [Mehlhorn, 1984], [Sedgewick, 1980]. Sorting and searching are also important topics in the area of data structures, and research in this area remains lively and relevant. By far the most commonly used data structure for internal sorting is the array. Tree structures such as ordered binary trees and B-Trees are usually considered for applications which involve searching. It is, however, not widely recognised that these tree structures are suitable for sorting as well as searching. Tree structures are useful for sorting because they partition the data, and retrieving the data in key sequence is achieved by a simple inorder traversal. Simply building the tree is thus tantamount to sorting. Further, tree structures provide faster random access by key and maintain volatile data better than arrays or linked lists. One more advantage of using tree structures for sorting is that the sorting is progressive: at any intermediate stage the sorted output of the data seen up to that stage can be produced, so the entire data does not have to be read before sorting starts. Quick Sort, Heap Sort and most other array-based sorting algorithms, with the exception of Insertion Sort, need to read the entire data into an array first.
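The claim that building an ordered binary tree is tantamount to sorting, and that the sorting is progressive, can be illustrated with a minimal Python sketch. It is not the report's Pascal code: the names `Node`, `OrderedBinaryTree` and `tree_sort` are hypothetical, and keys equal to a node's key are assumed to go to its right subtree so that duplicates are retained.

```python
from typing import Iterator, List, Optional


class Node:
    """One node of a pointer-based ordered binary tree (in the spirit of Bintree1)."""

    __slots__ = ("key", "left", "right")

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class OrderedBinaryTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: int) -> None:
        """Non-recursive insertion; keys equal to a node's key go to its right subtree."""
        new = Node(key)
        if self.root is None:
            self.root = new
            return
        cur = self.root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = new
                    return
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = new
                    return
                cur = cur.right

    def inorder(self) -> Iterator[int]:
        """Recursive inorder traversal, yielding keys in non-decreasing order.
        Recursion depth equals the tree height, so a degenerate tree built from
        large sorted input can exceed Python's default recursion limit."""
        def walk(node: Optional[Node]) -> Iterator[int]:
            if node is not None:
                yield from walk(node.left)
                yield node.key
                yield from walk(node.right)

        return walk(self.root)


def tree_sort(data: List[int]) -> List[int]:
    """Sort by building the tree, then reading it back in key sequence."""
    tree = OrderedBinaryTree()
    for x in data:
        tree.insert(x)
    return list(tree.inorder())


if __name__ == "__main__":
    tree = OrderedBinaryTree()
    for x in [5, 3, 8, 1, 4]:
        tree.insert(x)
    print(list(tree.inorder()))  # [1, 3, 4, 5, 8]: sorted output of the data seen so far
    tree.insert(2)               # more input arrives later
    print(list(tree.inorder()))  # [1, 2, 3, 4, 5, 8]
```

The usage at the bottom shows the progressive property: a sorted listing is available after the first five keys, before the sixth has been read.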

Building an ordered binary tree is efficient when the input data is in random order: the time to build the tree is then O(n log n). If the input is already in, or nearly in, ascending or descending order, the ordered binary tree degenerates into a linear linked list and the time to build it is O(n^2). To maintain the desirable O(n log n) performance the tree has to be balanced. This can be done by first reading the input into an array and then building the binary tree by reading randomly from the array; in this case, however, building the tree can only proceed once the entire data is available. Alternatively, a B-Tree may be used. Although the B-Tree structure is often used for large files on disks with two-level storage, it can also be used for small files in one-level main storage. The B-Tree is always balanced, and it has a larger branching factor than a binary tree, resulting in a shorter average search path. It is practically insensitive to the 'presortedness' of the input data. It is also better suited than arrays or binary trees to handling any number of subsequent additions or deletions.

This report assesses the use of ordered binary trees and B-Trees to sort and process inputs varying in size from 64 to 4096 integers. The initial ordering of the input also varied: ascending, largely ascending, random, largely descending and descending orders were tested. B-Trees of order 1, 2, 4, 8 and 16 were compared before choosing order 4 as the most suitable for the tests. The times required to build the ordered binary trees and B-Trees are comparable with the times required to sort the input using the better O(n log n) array sorting methods, such as Quick Sort and Heap Sort. It may be noted that the Heap Sort algorithm is also based on a tree structure, the complete binary tree, implemented as an array. The Quick Sort algorithm also works in a manner similar to an ordered binary tree and may be considered a pseudo-tree-based algorithm.
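The degeneration on ordered input can be checked by counting key comparisons. The Python sketch below (the function name `build_cost` is hypothetical) inserts keys into an unbalanced ordered binary tree held in parallel arrays and returns the total number of comparisons made. For ascending or descending input each new key is compared with every key already in the tree, giving n(n - 1)/2 comparisons in total, whereas a random order needs on the order of n log n.

```python
import random
from typing import List


def build_cost(keys: List[int]) -> int:
    """Insert keys into an unbalanced ordered binary tree and return the total
    number of key comparisons made while placing them."""
    stored: List[int] = []
    left: List[int] = []    # child subscripts; -1 marks an empty link
    right: List[int] = []
    comparisons = 0
    for key in keys:
        new = len(stored)
        stored.append(key)
        left.append(-1)
        right.append(-1)
        if new == 0:
            continue            # the first key becomes the root
        cur = 0
        while True:
            comparisons += 1
            if key < stored[cur]:
                if left[cur] == -1:
                    left[cur] = new
                    break
                cur = left[cur]
            else:
                if right[cur] == -1:
                    right[cur] = new
                    break
                cur = right[cur]
    return comparisons


if __name__ == "__main__":
    n = 1024
    ascending = list(range(n))
    shuffled = ascending[:]
    random.Random(42).shuffle(shuffled)
    print(build_cost(ascending))        # 523776, i.e. n * (n - 1) // 2
    print(build_cost(ascending[::-1]))  # 523776 as well
    print(build_cost(shuffled))         # far smaller: on the order of n log n
```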
The different versions of the binary tree algorithms used in the tests are described in Section 2. Three versions of the B-Tree implementation are compared in Section 3. The effect of the order of a B-Tree on the building time is considered in Section 4. Section 5 reports the measured times to build binary trees and B-Trees and compares these with the times required to sort the data using Quick Sort and Heap Sort. Section 6 describes similar comparisons of 'task times'. The task was to read the input and print out the data in ascending order to the screen, formatted 8 integers per line, including checking for errors in sorting. In all of the following sections it is assumed that sorting is required in ascending (non-decreasing) order, with duplicates allowed.

2. Binary Tree Algorithms

The conventional ordered binary tree building algorithm, called Bintree1 here, uses a non-recursive search, dynamic pointer variables for the links, and the Pascal procedure new() to get the tree nodes. A slight improvement in the building time is achieved by implementing the binary tree as an array of records, taking the new nodes from the static array and using array subscripts as the links. This version, based on the static array, is called Bintree2 here. Both Bintree1 and Bintree2 are unsuitable if the input is not in random order; the tests reported in Section 5 confirm the dramatic increase in building times when the input is in ascending or descending order. To overcome this drawback a version called Bintree3 was developed. In this version the input is first read into an array to be used for the tree; at this stage the root and the left/right links are not assigned. The array is then read in random order, and the root and the left/right links of the nodes are set up during this random reading. Tests described in Section 5 confirm that this approach is very successful, even allowing for the extra time required to read the data into an array and to generate an array of random subscripts for reading the data array randomly.
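A minimal Python sketch of the Bintree2 and Bintree3 ideas follows, under stated assumptions: nodes are records held in parallel arrays with subscripts as links (-1 meaning no link), all input is stored first with the links unassigned, and the nodes are then linked into the tree in a random order produced by `random.shuffle`, which stands in for the report's array of random subscripts. The names `ArrayTree` and `bintree3_sort` are hypothetical.

```python
import random
from typing import List, Optional


class ArrayTree:
    """Ordered binary tree stored in parallel arrays, with array subscripts
    as links (in the spirit of Bintree2). -1 marks an empty link."""

    def __init__(self, keys: List[int]) -> None:
        # Bintree3, phase 1: read all input into the node array; links unassigned.
        self.key = list(keys)
        self.left = [-1] * len(keys)
        self.right = [-1] * len(keys)
        self.root = -1

    def link(self, i: int) -> None:
        """Attach the already-stored node i by a non-recursive search."""
        if self.root == -1:
            self.root = i
            return
        k, cur = self.key[i], self.root
        while True:
            if k < self.key[cur]:
                if self.left[cur] == -1:
                    self.left[cur] = i
                    return
                cur = self.left[cur]
            else:
                if self.right[cur] == -1:
                    self.right[cur] = i
                    return
                cur = self.right[cur]

    def inorder(self) -> List[int]:
        """Non-recursive inorder traversal using an explicit stack."""
        out: List[int] = []
        stack: List[int] = []
        cur = self.root
        while stack or cur != -1:
            while cur != -1:
                stack.append(cur)
                cur = self.left[cur]
            cur = stack.pop()
            out.append(self.key[cur])
            cur = self.right[cur]
        return out


def bintree3_sort(keys: List[int], rng: Optional[random.Random] = None) -> List[int]:
    tree = ArrayTree(keys)
    order = list(range(len(keys)))
    (rng or random).shuffle(order)   # phase 2: visit the stored nodes in random order
    for i in order:
        tree.link(i)
    return tree.inorder()


if __name__ == "__main__":
    print(bintree3_sort(list(range(10, 0, -1))))  # descending input, sorted output
```

Because the linking order is random, ascending or descending input no longer produces a degenerate tree, which is the effect the Bintree3 rows in Tables 3 to 7 show.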

One drawback of the Bintree3 algorithm is that the binary tree is nearly, but not perfectly, balanced. A better alternative is to use the B-Tree structure.

3. B-Tree Algorithms

The B-Tree presents an ideal solution to overcome the drawbacks of the ordered binary tree. It maintains a perfect balance for input in any order, and the balance is retained under any number of subsequent additions or deletions. Its search paths are also shorter, owing to its large branching factor. To test its use for sorting and processing data in main memory, the following versions of the B-Tree implementation were considered:

- B-Tree1: The B-Tree is implemented as an array of nodes, each node holding arrays of data and pointers (node subscripts). The B-Tree is built using a non-recursive search procedure. Backtracking up the tree when a node overflows during an insert operation is aided by maintaining a stack of pointers to the nodes on the search path.
- B-Tree2: This is similar to B-Tree1 except for one feature. In addition to the pointer to each node on the search path, the rank of the child to which the parent pointed during the search is also pushed onto the stack. As a result, the parent node, when visited again after its child has split, does not have to be searched again for the position at which to place the key passed up from the child.
- B-Tree3: This version uses a dynamic implementation. The B-Tree nodes are obtained as required using the Pascal new() procedure, and the records within each node are also maintained as a dynamic ordered linked list. Whereas the static array implementations B-Tree1 and B-Tree2 may waste some space (some nodes may be only half full), this version uses only as much space as required; the list links in each node, however, do take up some space.

The times required to build a B-Tree using versions B-Tree1/2/3, reading input from the file rndnos.4k containing integers in random order, are given in Table 1.

Table 1. Comparison of different versions of the B-Tree (of order 4), input in random order. Sorting time in seconds, by number of integers sorted in ascending order.

Version           64     128     256     512    1024    2048    4096
B-Tree1         0.05    0.11    0.24    0.47    0.90    2.02    4.55
B-Tree2         0.05    0.11    0.24    0.46    0.94    2.09    4.68
B-Tree3         0.06    0.12    0.24    0.48    1.11    2.35    4.94

The results in Table 1 show that there is not much difference between B-Tree1 and B-Tree2, and both are slightly faster than B-Tree3. Hence B-Tree1, which is also the simplest of the three, was chosen for the rest of the tests reported here.
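The insertion scheme of B-Tree1 and B-Tree2 can be sketched in Python as follows. This is an illustrative sketch rather than the report's Pascal code: it assumes the order-d definition used in Section 4 (at most 2d keys per node, at least d in every non-root node), uses Python lists and object references instead of an array of nodes with subscripts, and keeps a stack of (node, rank of child followed) pairs, as in B-Tree2. The class names `BNode` and `BTree` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class BNode:
    __slots__ = ("keys", "children")

    def __init__(self, keys: List[int], children: Optional[List["BNode"]] = None) -> None:
        self.keys = keys                  # kept in non-decreasing order
        self.children = children or []   # empty for a leaf, else len(keys) + 1 children

    @property
    def is_leaf(self) -> bool:
        return not self.children


class BTree:
    """B-Tree of order d: at most 2d keys per node, at least d keys in every
    node except the root."""

    def __init__(self, d: int = 4) -> None:
        if d < 1:
            raise ValueError("order d must be at least 1")
        self.d = d
        self.root = BNode([])

    def insert(self, key: int) -> None:
        # Non-recursive descent to a leaf, stacking (node, rank of child followed).
        path: List[Tuple[BNode, int]] = []
        node = self.root
        while not node.is_leaf:
            i = bisect_right(node.keys, key)   # equal keys go to the right
            path.append((node, i))
            node = node.children[i]
        node.keys.insert(bisect_right(node.keys, key), key)

        # Split overflowing nodes (2d + 1 keys) while backtracking up the stack.
        while len(node.keys) > 2 * self.d:
            d = self.d
            middle = node.keys[d]
            right = BNode(node.keys[d + 1:], node.children[d + 1:])
            node.keys = node.keys[:d]
            node.children = node.children[:d + 1]
            if not path:                       # the root split: the tree grows one level
                self.root = BNode([middle], [node, right])
                return
            parent, i = path.pop()             # stored rank: the parent is not searched again
            parent.keys.insert(i, middle)
            parent.children.insert(i + 1, right)
            node = parent

    def inorder(self) -> List[int]:
        """Recursive inorder traversal; recursion depth is only the tree height."""
        out: List[int] = []

        def walk(node: BNode) -> None:
            for i, k in enumerate(node.keys):
                if not node.is_leaf:
                    walk(node.children[i])
                out.append(k)
            if not node.is_leaf:
                walk(node.children[-1])

        walk(self.root)
        return out


if __name__ == "__main__":
    t = BTree(d=2)
    for x in [50, 20, 80, 10, 30, 60, 90, 40, 70, 25]:
        t.insert(x)
    print(t.inorder())  # [10, 20, 25, 30, 40, 50, 60, 70, 80, 90]
```

Dropping the stored rank and calling `bisect_right` on the parent again after each split would correspond to B-Tree1.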

4. Effect of the order of the B-Tree

The order of a B-Tree is defined differently by different authors. In this report the following definition [Wirth, 1976] is used: a B-Tree of order d has at most 2d keys in any node; its root node may have as few as one key, while all other nodes have at least d keys. The order of a B-Tree has some impact on all B-Tree operations. To study this effect, the times required to build B-Trees of order 1, 2, 4, 8 and 16 were measured using files containing integers as input. The times are compared in Table 2.

Table 2. Effect of the order of a B-Tree (version B-Tree1) on the time for building the B-Tree. Building time in seconds, by number of integers inserted in the B-Tree.

A. Input in ascending order (input file ascdnos.4k)

Order             64     128     256     512    1024    2048    4096
order 1         0.06    0.12    0.28    0.69    1.40    2.70    5.25
order 2         0.05    0.13    0.23    0.63    1.31    2.62    5.11
order 4         0.05    0.12    0.25    0.65    1.33    2.64    5.15
order 8         0.05    0.12    0.30    0.71    1.39    2.66    5.45
order 16        0.06    0.15    0.31    0.77    1.65    3.33    6.73

B. Input in random order (input file rndnos.4k)

Order             64     128     256     512    1024    2048    4096
order 1         0.05    0.13    0.25    0.50    1.14    2.40    5.05
order 2         0.05    0.12    0.24    0.45    0.88    1.97    4.48
order 4         0.05    0.11    0.24    0.47    0.90    2.02    4.55
order 8         0.05    0.14    0.26    0.53    1.16    2.45    5.08
order 16        0.05    0.13    0.30    0.65    1.26    2.53    5.20

C. Input in descending order (input file descdnos.4k)

Order             64     128     256     512    1024    2048    4096
order 1         0.06    0.12    0.29    0.60    1.24    2.52    5.06
order 2         0.05    0.12    0.26    0.48    0.92    1.85    3.78
order 4         0.05    0.12    0.26    0.47    0.91    1.77    3.55
order 8         0.05    0.12    0.27    0.48    0.91    1.79    3.55
order 16        0.05    0.13    0.31    0.64    1.27    2.54    5.09

Table 2 shows that the effect of the order of the B-Tree on building time is not pronounced when the number of elements to be inserted is small (256 or fewer). For moderate to large input sizes (256 to 4096 elements) the building times seem to be influenced by the order of the B-Tree as well as by the order of the input data. When the input is in ascending or random order, the times for orders 2 and 4 are better than those for orders 1, 8 and 16. When the input is in descending order, the times for orders 4 and 8 are better than the rest. When the order of a B-Tree is large, the search path length is reduced, but the time required to insert data in a node is increased. When the order is small, the search path length is increased, but the time to insert data in a node is decreased. Hence a choice of order 4 seems to be a good compromise for a B-Tree in main memory, for input data in any order.

5. Comparison of the performance of tree-based algorithms

The algorithms Bintree1/2/3 (Section 2) and B-Tree1 (Sections 3 and 4) were tested on several input lists with varying degrees of presortedness. The lists contained integers in ascending, descending, random, roughly ascending or roughly descending order. To supply the test lists, the following files containing integers were generated:

- rndnos.4k: 4096 random integers, generated using the Sun Pascal function random() on eros, a Sun-4 minicomputer.
- ascdnos.4k: the 4096 integers from rndnos.4k, sorted in ascending order.
- descdnos.4k: the 4096 integers from rndnos.4k, sorted in descending order.
- incnos.4k: 4096 random integers in roughly ascending order, generated using equation (1), where r is a random integer in the range 0 to 16383 [Hale, 1992]:

    n = 1024k + r,            k = 1, 2, 3, ..., 4096        (1)

- decnos.4k: 4096 random integers in roughly descending order, generated using equation (2), with r as above [Hale, 1992]:

    n = 1024(4097 - k) + r,   k = 1, 2, 3, ..., 4096        (2)

The input lists were also sorted using Heap Sort [Williams, 1964] and Quick Sort [Sedgewick, 1980]. The Quick Sort implemented is a non-recursive, fine-tuned version which uses Insertion Sort when the size of a sub-array is 20 or less, and uses a random element as the pivot during partitioning. In Tables 3 to 7 the B-Tree referred to is of order 4.

The times were measured on an IBM PC compatible, using Turbo Pascal programs. In each case the time reported is the average of ten execution runs. The measurements include the time required for all the operations involved in building the trees (Bintree1/2/3 and B-Tree1) or sorting the data (Heap Sort, Quick Sort). The times, in seconds, are given in Tables 3 to 7. They should be considered adequate for comparison purposes only; the programs ran very much faster when executed on eros, the Sun-4 minicomputer.
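The five test lists and the fine-tuned Quick Sort described above can be approximated in Python as follows. This is a sketch under assumptions: Python's `random.Random` replaces the Sun Pascal `random()`, the value range of the random integers (0 to 32767) is a guess since the report does not state it, equation (2) is written as 1024(n + 1 - k) + r so that it reduces to 1024(4097 - k) + r when n = 4096, and partitioning uses a common two-pointer scheme with the random pivot moved to the end. Sub-arrays of 20 or fewer elements are left for one final Insertion Sort pass, which is one common way of combining the two sorts. The names `make_inputs`, `quick_sort` and `CUTOFF` are hypothetical.

```python
import random
from typing import Dict, List, Optional

CUTOFF = 20  # sub-arrays of this size or smaller are finished by Insertion Sort


def make_inputs(n: int = 4096, seed: int = 0) -> Dict[str, List[int]]:
    """Test lists named after the report's files (values are illustrative)."""
    rng = random.Random(seed)
    rnd = [rng.randint(0, 32767) for _ in range(n)]          # assumed value range
    return {
        "rndnos": rnd,
        "ascdnos": sorted(rnd),
        "descdnos": sorted(rnd, reverse=True),
        # Equation (1): 1024k + r, with r in 0..16383
        "incnos": [1024 * k + rng.randint(0, 16383) for k in range(1, n + 1)],
        # Equation (2): 1024(n + 1 - k) + r
        "decnos": [1024 * (n + 1 - k) + rng.randint(0, 16383) for k in range(1, n + 1)],
    }


def quick_sort(a: List[int], rng: Optional[random.Random] = None) -> None:
    """In-place, non-recursive Quick Sort with a random pivot."""
    rng = rng or random.Random()
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= CUTOFF:
            continue                          # left for the Insertion Sort pass
        p = rng.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]             # random pivot moved to the end
        pivot, i, j = a[hi], lo - 1, hi
        while True:
            i += 1
            while a[i] < pivot:               # stops at the pivot itself at the latest
                i += 1
            j -= 1
            while j > lo and a[j] > pivot:
                j -= 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        a[i], a[hi] = a[hi], a[i]             # pivot now in its final place i
        # Push the larger part first so the smaller part is handled next,
        # which keeps the explicit stack O(log n) deep.
        if i - lo > hi - i:
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))
        else:
            stack.append((i + 1, hi))
            stack.append((lo, i - 1))
    # One Insertion Sort pass finishes the small, still unsorted sub-arrays.
    for k in range(1, len(a)):
        x, m = a[k], k - 1
        while m >= 0 and a[m] > x:
            a[m + 1] = a[m]
            m -= 1
        a[m + 1] = x


if __name__ == "__main__":
    for name, data in make_inputs().items():
        b = data[:]
        quick_sort(b)
        print(name, b == sorted(data))
```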

Table 3. Comparison of tree-based sorting algorithms (input file ascdnos.4k, input in ascending order). Sorting time in seconds, by number of integers sorted in ascending order.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.08    0.22    0.74    2.71    9.96   38.07  148.75
Bintree2        0.06    0.21    0.67    2.43    8.79   34.88  129.77
Bintree3        0.04    0.10    0.17    0.44    0.87    1.65    3.28
B-Tree1         0.05    0.12    0.25    0.64    1.32    2.64    5.14
Heap Sort       0.06    0.13    0.26    0.61    1.23    2.42    4.95
Quick Sort      0.04    0.10    0.18    0.45    0.88    1.67    3.37

The results shown in Table 3 confirm that the times required to build an ordered binary tree (Bintree1) are very large when the input is already ordered. The static array implementation of the binary tree (Bintree2) is only marginally better. The version Bintree3, which randomizes the input before use, improves the performance very significantly, to the level of Quick Sort and better than Heap Sort. The times for building the B-Tree are comparable to Heap Sort.

Table 4. Comparison of tree-based sorting algorithms (input file incnos.4k, input in roughly ascending order). Sorting time in seconds, by number of integers sorted in ascending order.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.05    0.16    0.38    1.09    3.49   12.25   45.06
Bintree2        0.05    0.12    0.35    0.95    3.10   11.01   40.58
Bintree3        0.05    0.11    0.20    0.37    0.73    1.52    3.15
B-Tree1         0.05    0.11    0.24    0.56    1.19    2.49    5.03
Heap Sort       0.06    0.12    0.25    0.52    1.06    2.28    4.77
Quick Sort      0.05    0.11    0.20    0.37    0.76    1.64    3.39
Check Sort      0.04    0.10    0.19    0.37    0.73    1.48    2.95

The results of Table 4 also confirm that input in roughly ascending order is not suitable for building an ordered binary tree. The static array version Bintree2 is somewhat better than the dynamic version Bintree1. The version Bintree3, which randomizes the input before use, improves the performance very significantly, to the level of Quick Sort and much better than B-Tree1 and Heap Sort. The times for building the B-Tree are comparable to Heap Sort.

Table 5. Comparison of tree-based sorting algorithms (input file rndnos.4k, input in random order). Sorting time in seconds, by number of integers sorted in ascending order.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.06    0.11    0.25    0.44    0.89    1.75    3.60
Bintree2        0.04    0.12    0.24    0.43    0.88    1.75    3.55
Bintree3        0.05    0.11    0.20    0.38    0.74    1.53    3.19
B-Tree1         0.05    0.11    0.24    0.47    0.90    2.02    4.55
Heap Sort       0.04    0.12    0.24    0.49    1.05    2.25    4.76
Quick Sort      0.05    0.10    0.21    0.39    0.79    1.71    3.54

Table 5 indicates that when the data is in random order all the methods have comparable times when the input size is not very large (up to 512 elements). For large inputs Bintree3 and Quick Sort continue to be the best, but the ordinary binary tree versions Bintree1 and Bintree2 are very close to them.

Table 6. Comparison of tree-based sorting algorithms (input file decnos.4k, input in roughly descending order). Sorting time in seconds, by number of integers sorted in ascending order.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.05    0.16    0.37    1.10    3.57   12.74   47.27
Bintree2        0.05    0.12    0.33    0.99    3.27   11.57   42.94
Bintree3        0.06    0.11    0.18    0.38    0.72    1.60    3.15
B-Tree1         0.05    0.11    0.23    0.45    0.88    1.85    3.58
Heap Sort       0.06    0.12    0.24    0.48    1.02    2.21    4.57
Quick Sort      0.04    0.10    0.20    0.41    0.81    1.77    3.53

Table 6 shows that when the data is in roughly descending order, Bintree1 and Bintree2 perform poorly. Bintree3 continues to perform very well, marginally better than Quick Sort. B-Tree1 performance is close to Quick Sort and somewhat better than Heap Sort.

Table 7. Comparison of tree-based sorting algorithms (input file descdnos.4k, input in descending order). Sorting time in seconds, by number of integers sorted in ascending order.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.06    0.22    0.70    2.32    8.41   32.17  126.38
Bintree2        0.07    0.20    0.63    2.13    7.64   29.18  114.56
Bintree3        0.04    0.10    0.22    0.40    0.77    1.54    3.14
B-Tree1         0.05    0.12    0.26    0.48    0.91    1.79    3.55
Heap Sort       0.06    0.12    0.25    0.52    1.05    2.18    4.59
Quick Sort      0.05    0.11    0.23    0.42    0.86    1.72    3.53

Table 7 demonstrates that when the input data is already sorted in reverse order the results are similar to those for data in roughly descending order. Bintree1 and Bintree2 perform very poorly. Bintree3 gives the best performance, ahead of even Quick Sort. B-Tree1 is close to Quick Sort and better than Heap Sort in this case.

6. Comparison of times required for a complete 'task'

The tree-based algorithms 'sort' the input by building the tree. Array-based sorting algorithms can apparently output the sorted data more easily, since tree traversal algorithms are not as simple as array traversal. To test this factor, and to give the comparison a more practical basis, the times for a complete task were measured and compared. The task involved reading the input file from disk, sorting the data into an array or building a binary tree or B-Tree, and outputting the data in ascending order to the screen. The output was checked for any sorting error and formatted 8 integers to a line. The tree traversal part for Bintree3 and B-Tree1 was implemented in both recursive and non-recursive forms. The recursive versions are labelled Bintree3r and B-Tree1r, and their non-recursive counterparts Bintree3nr and B-Tree1nr. The results are given in Tables 8 to 12. In all these tests the B-Tree referred to is of order 4.
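The output stage of the 'task', 8 integers per line with a check for sorting errors, can be sketched in Python as follows. Any inorder traversal or sorted array can feed it; the function name `format_and_check`, the field width of 8 characters, and raising `ValueError` on an out-of-order key are assumptions made for illustration.

```python
from typing import Iterable, List


def format_and_check(keys: Iterable[int], per_line: int = 8, width: int = 8) -> List[str]:
    """Format keys `per_line` to a line, checking that they arrive in
    non-decreasing order. Returns the lines so the caller can print them."""
    lines: List[str] = []
    row: List[str] = []
    prev = None
    for k in keys:
        if prev is not None and k < prev:
            raise ValueError(f"sorting error: {k} follows {prev}")
        prev = k
        row.append(f"{k:{width}d}")
        if len(row) == per_line:
            lines.append("".join(row))
            row = []
    if row:                      # last, possibly short, line
        lines.append("".join(row))
    return lines


if __name__ == "__main__":
    # sorted() stands in here for an inorder traversal of a tree.
    for line in format_and_check(sorted([17, 3, 9, 1, 12, 5, 8, 2, 30, 4])):
        print(line)
```

Because the check consumes keys one at a time, it works equally with a generator-based traversal, so the tree never has to be copied into an array before output.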

Table 8. Task times comparison of tree-based algorithms (input file ascdnos.4k, input in ascending order). Task time in seconds, by number of integers printed out in ascending order. The task involved sorting the given input from a disk file and printing it on the screen in ascending order, 8 integers per line, including checking for sorting errors.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.64    1.32    2.90    6.96   18.24   53.99  178.58
Bintree3r       0.62    1.21    2.37    4.78    9.52   18.85   37.64
Bintree3nr      0.60    1.20    2.36    4.78    9.50   18.89   37.68
B-Tree1r        0.62    1.22    2.41    4.95    9.94   19.82   39.55
B-Tree1nr       0.65    1.25    2.44    5.00    9.99   19.82   39.58
Heap Sort       0.62    1.22    2.42    4.93    9.87   19.61   39.32
Quick Sort      0.61    1.21    2.36    4.79    9.50   18.92   37.77

Table 8 results confirm that the conventional ordered binary tree (Bintree1) performs very poorly when the input is already ordered. However, when the input is randomized before building the tree (Bintree3r/Bintree3nr), the performance is very good, matching the performance of Quick Sort. The B-Tree times are slightly slower than Quick Sort but comparable to Heap Sort. The times for the recursive and non-recursive versions of the binary tree are almost the same. Similarly, there is hardly any difference between the times for the recursive and non-recursive versions of the B-Tree.

Table 9. Task times comparison of tree-based algorithms (input file incnos.4k, input in roughly ascending order). Task time in seconds, by number of integers printed out in ascending order. The task involved sorting the given input from a disk file and printing it on the screen in ascending order, 8 integers per line, including checking for sorting errors.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.63    1.27    2.58    5.39   12.07   29.40   79.40
Bintree3r       0.61    1.21    2.36    4.69    9.35   18.74   37.49
Bintree3nr      0.63    1.21    2.37    4.70    9.37   18.74   37.50
B-Tree1r        0.64    1.24    2.43    4.89    9.81   19.69   39.45
B-Tree1nr       0.64    1.25    2.45    4.90    9.83   19.73   39.50
Heap Sort       0.63    1.21    2.45    4.84    9.69   19.47   39.15
Quick Sort      0.61    1.21    2.37    4.72    9.38   18.85   37.78

Table 9 results confirm that the conventional ordered binary tree (Bintree1) performs poorly when the input is already roughly ordered. However, when the input is randomized before building the tree (Bintree3r/Bintree3nr), the performance is very good, even slightly better than Quick Sort. The B-Tree times are slightly slower than Quick Sort but comparable to Heap Sort. The times for the recursive and non-recursive versions of the binary tree are almost the same. Similarly, there is hardly any difference between the times for the recursive and non-recursive versions of the B-Tree.

Table 10. Task times comparison of tree-based algorithms (input file rndnos.4k, input in random order). Task time in seconds, by number of integers printed out in ascending order. The task involved sorting the given input from a disk file and printing it on the screen in ascending order, 8 integers per line, including checking for sorting errors.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.62    1.23    2.42    4.79    9.51   18.97   37.99
Bintree3r       0.62    1.21    2.39    4.71    9.38   18.75   37.55
Bintree3nr      0.63    1.21    2.38    4.70    9.36   18.72   37.56
B-Tree1r        0.64    1.26    2.42    4.79    9.54   19.23   39.00
B-Tree1nr       0.64    1.25    2.47    4.82    9.55   19.24   38.98
Heap Sort       0.63    1.23    2.48    4.84    9.67   19.43   39.15
Quick Sort      0.62    1.21    2.39    4.75    9.43   18.91   37.89

Table 10 results confirm that the conventional ordered binary tree (Bintree1) performs well when the input is in random order. However, when the input is randomized again before building the tree (Bintree3r/Bintree3nr), the performance improves and is even slightly better than Quick Sort. The B-Tree times are slightly slower than Quick Sort but comparable to Heap Sort. For both the binary tree and the B-Tree, the recursive versions are as fast as their non-recursive counterparts.

Table 11. Task times comparison of tree-based algorithms (input file decnos.4k, input in roughly descending order). Task time in seconds, by number of integers printed out in ascending order. The task involved sorting the given input from a disk file and printing it on the screen in ascending order, 8 integers per line, including checking for sorting errors.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.63    1.26    2.59    5.40   12.24   30.08   82.17
Bintree3r       0.62    1.20    2.39    4.68    9.35   18.78   37.51
Bintree3nr      0.61    1.22    2.39    4.70    9.36   18.79   37.52
B-Tree1r        0.63    1.21    2.42    4.78    9.51   19.04   37.96
B-Tree1nr       0.64    1.25    2.44    4.81    9.53   19.05   37.96
Heap Sort       0.62    1.24    2.44    4.82    9.63   19.39   38.93
Quick Sort      0.62    1.20    2.38    4.74    9.44   18.96   37.87
Check Sort      0.62    1.20    2.37    4.65    9.23   18.51   36.87

Table 11 results confirm that the conventional ordered binary tree (Bintree1) performs poorly when the input is already roughly ordered. When the input is randomized before building the tree (Bintree3r/Bintree3nr), the performance improves and is even slightly better than Quick Sort. The B-Tree times are also comparable to Quick Sort and slightly better than Heap Sort. For both the binary tree and the B-Tree, the recursive versions are as fast as their non-recursive counterparts.

Table 12. Task times comparison of tree-based algorithms (input file descdnos.4k, input in descending order). Task time in seconds, by number of integers printed out in ascending order. The task involved sorting the given input from a disk file and printing it on the screen in ascending order, 8 integers per line, including checking for sorting errors.

Algorithm         64     128     256     512    1024    2048    4096
Bintree1        0.65    1.32    2.88    6.71   17.12   49.66  161.78
Bintree3r       0.62    1.22    2.40    4.73    9.38   18.75   37.50
Bintree3nr      0.64    1.21    2.36    4.73    9.39   18.74   37.51
B-Tree1r        0.64    1.22    2.44    4.80    9.53   19.00   37.94
B-Tree1nr       0.65    1.24    2.46    4.83    9.56   19.01   37.94
Heap Sort       0.63    1.22    2.47    4.85    9.67   19.38   38.95
Quick Sort      0.63    1.21    2.42    4.76    9.46   18.92   37.88
Check Sort      0.61    1.19    2.36    4.63    9.17   18.25   36.47

Table 12 results confirm that the conventional ordered binary tree (Bintree1) performs very poorly when the input is already ordered. When the input is randomized before building the tree (Bintree3r/Bintree3nr), the performance improves and is even slightly better than Quick Sort. The B-Tree times are also comparable to Quick Sort and slightly better than Heap Sort. For both the binary tree and the B-Tree, the recursive versions are as fast as their non-recursive counterparts.

7. Conclusions

This report highlights the use of ordered binary tree and B-Tree structures in main memory for sorting data and processing it in key sequence. The array implementation of the ordered binary tree is seen to be a little faster than the alternative based on dynamic variables. A B-Tree of order 4 is seen to be a good overall choice: it performs better than B-Trees of order 1, 8 or 16 and about as well as order 2. Also, the array implementation of a B-Tree works faster than the alternative based on dynamic variables.

It is shown that an ordered binary tree built by randomizing the input is a very effective method for sorting and processing the input. The Bintree3 algorithm, which uses this approach, handles input data in any initial order as fast as the fine-tuned Quick Sort and faster than Heap Sort or the B-Tree1 algorithm. The search time for a given key is somewhat smaller in Bintree3 than in the logically equivalent binary search of a sorted array, as Bintree3 needs only a simple key comparison at each step, while binary search also needs an integer division to find the 'middle' position. It is shown that the array-based implementation, B-Tree1, builds a B-Tree in memory as fast as Heap Sort sorts the data in an array, and only marginally slower than the fine-tuned Quick Sort and Bintree3.

Times required to complete a 'task' were also compared; the task included all the operations required to print the sorted, formatted output to the screen. When outputting the data in sorted order, there is not much difference between the recursive and non-recursive inorder traversals, for both the ordered binary tree and the B-Tree. The simplicity of the recursive version makes tree traversal almost as easy to code as array or linked-list traversal. The Bintree3 algorithm is again as fast as Quick Sort and faster than Heap Sort for the task. The B-Tree1 algorithm is somewhat slower than Bintree3 but comparable with Heap Sort. Since any number of subsequent additions or deletions will not affect its balance, the B-Tree1 algorithm may be preferred where the data is volatile. A B-Tree may also be handy in applications where intermediate results are to be output, or where the entire input is not available in one go or its size is unknown.
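The difference in per-step work between the two searches compared in the conclusions can be seen in a minimal Python sketch. It does not measure time and does not by itself establish which search is faster; the function names are hypothetical, and the tree uses the array-of-records layout of Bintree2, with -1 marking an empty link.

```python
from typing import List


def binary_search(a: List[int], key: int) -> int:
    """Index of key in the sorted list a, or -1. Each probe computes the
    middle position with an integer division."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid
        if a[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def tree_search(keys: List[int], left: List[int], right: List[int], root: int, key: int) -> int:
    """Subscript of a node holding key in an array-linked ordered binary tree, or -1.
    Each step needs only key comparisons and one link fetch."""
    cur = root
    while cur != -1:
        if key == keys[cur]:
            return cur
        cur = left[cur] if key < keys[cur] else right[cur]
    return -1


if __name__ == "__main__":
    # Hypothetical tree: 50 at the root, 30 and 70 its children, 20 and 40 under 30.
    keys = [50, 30, 70, 20, 40]
    left = [1, 3, -1, -1, -1]
    right = [2, 4, -1, -1, -1]
    print(tree_search(keys, left, right, 0, 40))  # 4
    print(binary_search(sorted(keys), 40))        # 2
```

Both loops perform about log2 n steps on a balanced structure; the tree search replaces the division by a link fetch, which is the distinction the report draws.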

Acknowledgements

The author would like to thank Prof. Andrzej Goscinski and Mr. Bob Hale for their encouragement, criticism and useful comments. Thanks also to Judith Jamieson for all the secretarial help.

References

Hale, R.P., Teague, G.J., 1992. "SCP760 Data Structures Study Guide", Session 8, p. 14. Deakin University.
Kingston, J.H., 1990. "Algorithms and Data Structures: Design, Correctness, Analysis". Addison-Wesley, Sydney.
Knuth, D.E., 1973. "The Art of Computer Programming, Vol. 3: Sorting and Searching". Addison-Wesley, Reading, Mass.
Mehlhorn, K., 1984. "Data Structures and Algorithms, Vol. 1: Sorting and Searching". Springer-Verlag, Berlin Heidelberg.
Sedgewick, R., 1980. "Quicksort". Garland Publishing Inc., New York and London.
Williams, J.W.J., 1964. "Algorithm 232: Heapsort". Communications of the ACM, Vol. 7, p. 701.
Wirth, N., 1976. "Algorithms + Data Structures = Programs". Prentice-Hall, Inc., Englewood Cliffs, N.J.