4. SEARCHING AND SORTING

Blanche Griffin
5 years ago

SEARCHING

Searching and sorting are fundamental operations in computer science. Searching refers to the operation of finding the location of a given item in a collection of items. Sorting refers to the operation of arranging data in some given order, such as increasing or decreasing for numerical data, or alphabetical for character data.

Searching: Searching is an operation which finds the location of a given element in a list. The search is said to be successful or unsuccessful depending on whether the element being searched for is found or not. The two standard searching techniques are:

1. Linear search
2. Binary search

LINEAR SEARCH

Linear search is the simplest method of searching. In this method, the element to be found is searched for sequentially in the list (hence it is also called sequential search). This method can be applied to a sorted or an unsorted list, so it is the method used when the records are not stored in order.

Principle: The algorithm starts its search with the first available record and proceeds to the next available record repeatedly until the required data item is found or the end of the list is reached.

Algorithm:

    ALGORITHM LINEARSEARCH(K, N, X)
    // K is the array containing the list of data items
    // N is the number of data items in the list
    // X is the data item to be searched
        Repeat For I = 0 to N - 1 Step 1
            If K(I) = X Then
                WRITE("ELEMENT IS PRESENT AT LOCATION", I)
                QUIT
            End If
        End Repeat
        WRITE("ELEMENT NOT PRESENT IN THE COLLECTION")
    End LINEARSEARCH

The For loop advances I by one on every pass (Step 1), so no separate increment of I is needed inside the loop.

In the above algorithm, K is the list containing N data items, and X is the data item to be searched for in K. If X is found, the position where it was found is displayed. If X is not found, an appropriate message is displayed to inform the user.

The data item X is compared with each element of the list K in turn. If X matches a data item in K during this comparison, the position where the data item was found is displayed, control leaves the loop, and the procedure ends. If X does not match any of the data items in K, the "element not found" message is displayed at the end.

Example:

X, the number to be searched: 40. The list K has ten elements, at positions i = 0 to i = 9.

    i = 0 : X ≠ K[0]
    i = 1 : X ≠ K[1]
    i = 2 : X ≠ K[2]
    i = 3 : X ≠ K[3]
    i = 4 : X ≠ K[4]
    i = 5 : X ≠ K[5]
    i = 6 : X ≠ K[6]
    i = 7 : X = K[7]

I = 7: the number is found at location 7, i.e., as the 8th element.

Program:

    // Linear Search
    #include <stdio.h>
    #include <stdlib.h>   /* for exit() */

    int k[10], n = 10;

    /* Compare x with each element of k; report the first match and stop. */
    void search(int x)
    {
        for (int i = 0; i < n; i++) {
            if (k[i] == x) {
                printf("Element present at location %d", i);
                exit(0);
            }
        }
        printf("\nElement not found in the collection");
    }

    int main(void)
    {
        int i, x;
        printf("\nEnter the elements ");
        for (i = 0; i < 10; i++)
            scanf("%d", &k[i]);
        printf("\nEnter the element that you want to search ");
        scanf("%d", &x);
        search(x);
        return 0;
    }

The search( ) function receives the number to be searched for in the variable x as an argument and compares it with each element of the array k. If x is found in the array, the position i where it was found is printed. If x is not found anywhere in the list, the function displays the "not found" message to the user.

The main( ) function reads the n values from the user and stores them in the array k. The user is then prompted for the number to be searched for, which is passed to the search( ) function as an argument. The search( ) function then prints the appropriate message.

Advantages:
1. Simple and straightforward method.
2. Can be applied to both sorted and unsorted lists.

Disadvantages:
1. Inefficient as the number of data items grows: in the worst case every one of the n items must be compared.

BINARY SEARCH

Binary search is fast and efficient because each comparison halves the part of the list that still has to be searched. This method requires that the list of elements be in sorted order; binary search cannot be applied to an unsorted list.

Principle: The data item to be searched for is compared with the approximate middle entry of the list. If it matches the middle entry, the position is displayed. If the data item is less than the middle entry, it is compared with the middle entry of the first half of the list, and the procedure is repeated on the first half. If the data item is greater than the middle entry, it is compared with the middle entry of the second half of the list, and the procedure is repeated on the second half. This process continues until the desired item is found or the search interval becomes empty.

Algorithm:

    ALGORITHM BINARYSEARCH(K, N, X)
    // K is the array containing the list of data items
    // N is the number of data items in the list
    // X is the data item to be searched
        Lower ← 0, Upper ← N - 1
        While Lower ≤ Upper
            Mid ← (Lower + Upper) / 2
            If (X < K[Mid]) Then
                Upper ← Mid - 1
            Else If (X > K[Mid]) Then
                Lower ← Mid + 1
            Else
                Write("ELEMENT FOUND AT", Mid)
                Quit
            End If
            End If
        End While
        Write("ELEMENT NOT PRESENT IN THE COLLECTION")
    End BINARYSEARCH

In the binary search algorithm given above, K is the list containing N data items, and X is the data item to be searched for in K. If X is found, the position where it was found is printed. If X is not found, the "Element Not Found" message is printed to inform the user.

Initially Lower is set to 0, pointing to the first element in the list, and Upper is set to N - 1, pointing to the last element, because the array indices range from 0 to N - 1. The middle position is calculated as the average of Lower and Upper (using integer division), and X is compared with K[Mid]. If X is equal to K[Mid], the value of Mid is printed, control leaves the loop, and the procedure ends. If X is less than K[Mid], Upper is assigned Mid - 1 so that only the first half of the list is searched. If X is greater than K[Mid], Lower is assigned Mid + 1 so that only the second half of the list is searched. This process continues until the element is found or the search interval becomes empty.

Example:

X, the number to be searched: 40. The sorted list K has ten elements, at positions i = 0 to i = 9. L = Lower, U = Upper (initially N - 1), M = Mid.

    Step 1: L = 0, U = 9, M = (0 + 9) / 2 = 4.  X < K[4], so U = 4 - 1 = 3
    Step 2: L = 0, U = 3, M = (0 + 3) / 2 = 1.  X > K[1], so L = 1 + 1 = 2
    Step 3: L = 2, U = 3, M = (2 + 3) / 2 = 2.  X > K[2], so L = 2 + 1 = 3
    Step 4: L = 3, U = 3, M = (3 + 3) / 2 = 3.  X = K[3]

P = 3: the number is found at position 3.

Program:

    #include <stdio.h>

    int k[10], n = 10;

    /* Search the sorted array k for x by repeatedly halving the interval. */
    void binarysearch(int x)
    {
        int mid, lower = 0, upper = n - 1;
        while (lower <= upper) {
            mid = (lower + upper) / 2;
            if (k[mid] > x)
                upper = mid - 1;
            else if (k[mid] < x)
                lower = mid + 1;
            else {
                printf("Element found at location %d", mid);
                return;
            }
        }
        printf("Element not found in the collection");
    }

    int main(void)
    {
        int i, x;
        /* The elements must be entered in ascending order. */
        printf("\nEnter the elements ");
        for (i = 0; i < n; i++)
            scanf("%d", &k[i]);
        printf("\nEnter the element to search ");
        scanf("%d", &x);
        binarysearch(x);
        return 0;
    }

The binarysearch( ) function receives the element to be searched for in the variable x. Initially lower is assigned 0 and upper is assigned n - 1. The middle position is calculated, and if k[mid] is equal to x, the position mid is displayed. If x is less than k[mid], upper is assigned mid - 1 so that only the first half of the list is searched; otherwise lower is assigned mid + 1 so that only the second half is searched. This process continues as long as lower is less than or equal to upper. If the element has not been found when the loop ends, the "not found" message is displayed to the user.

Advantages:
1. Much faster than linear search on large lists: at most about log2(n) + 1 comparisons are needed for n elements, against up to n for linear search.
2. Each iteration reduces the number of elements still to be searched from n to about n/2.

Disadvantages:
1. Binary search can be applied only to a sorted list.

HASH SEARCH

Hash search is one of the fastest searching mechanisms for locating a given element in a collection. This searching method uses a concept called hashing to obtain the address of the location in the table where the element is stored or retrieved.

Hashing is the process of applying a hashing function to the given value to obtain the address of the location for its storage or retrieval. Many hashing techniques are available, and a user-defined function may also be used. These functions perform some arithmetic manipulation on the given value and return the result, which is treated as the address for the given element. Some of the hashing functions are:

1. Division method
2. Mid-square method
3. Folding method

The table where the values are stored is called the hash table. The hash table is a collection of several buckets, say HT(0), HT(1), ..., HT(N-1). The capacity of each bucket is decided by the designer of the table. A bucket consists of a number of slots, each slot being large enough to hold one element. If the bucket size is 1, each bucket can hold only one element.

Insert X into Hash table:

The value X to be inserted into the hash table is given as input to the hashing function, which returns the address for its storage. If the calculated address already contains some data when the record is to be stored, a collision is said to have occurred. To overcome this problem, one of the collision resolution techniques is used. One of the most widely used techniques is linear probing: a sequential search is made for the next empty location, and the value is stored there. If all the locations are occupied, an overflow is said to have occurred.

Algorithm:

    ALGORITHM INSERTHT(HT, N, X)
    // HT is the hash table
    // N is the number of buckets
    // X is the data item to be inserted
    Begin
        I ← 1
        Y ← X MOD N
        While (I <= N)
        Begin
            If (HT(Y) is empty) Then
                HT(Y) ← X
                Quit
            Else
                Y ← (Y + 1) MOD N
                I ← I + 1
        End
        Write("Table is Full")
    End

In the above algorithm, HT is the hash table, which has been initialized with a special value that marks every location as empty. The algorithm receives three values as arguments. The value X is given to the hashing function to obtain the address Y for its storage, and location Y of the hash table HT is examined. If that location is empty, X is stored there. If the location already holds an element, the search moves on to the next location until an empty one is found.

Search X in the Hash table:

The value X is given to the hashing function to obtain the address at which the search starts. If the corresponding location of the hash table contains the value X, the "successful search" message is displayed. If this first probe fails, the remaining buckets are searched in sequence. The "unsuccessful search" message is given to the user if the element cannot be found in the remaining N-1 buckets either.

Algorithm:

    ALGORITHM SEARCHHT(HT, N, X)
    // HT is the hash table
    // N is the number of buckets
    // X is the data item to be searched
    Begin
        I ← 1
        Y ← X MOD N
        While (I <= N)
        Begin
            If (HT(Y) = X) Then
                Write("X IS PRESENT")
                Quit
            Else
                Y ← (Y + 1) MOD N
                I ← I + 1
        End
        Write("X IS ABSENT")
    End

INDEXED SEQUENTIAL SEARCH

Indexed Sequential Files:

Sequential file structures are inefficient for searching, yet almost every database activity involves searching for a particular record and then processing it. With sequential files, the delay in retrieving a record is high, particularly for records stored near the end of the file. To overcome this difficulty, a technique called indexing is adopted: the records are stored as blocks in memory, and the highest key value of each block is stored in an index file. Whenever a record is to be searched for, its key value is compared with the index file, and control is taken directly to the particular memory block where the record is present. This eliminates the need to search through all the records stored in the preceding memory blocks. The sequential search is done only within the block where the record is present, and the block itself is accessed directly using the index value. This greatly reduces the number of comparisons needed to find a particular record.

An indexed sequential file consists of three areas: the prime area, the index area, and the overflow area.

The prime area is where the records of the file are stored when the sequential file is initially created. Consider an external storage device such as a hard disk, which consists of many cylinders. Each cylinder consists of many tracks and sectors. The prime area starts from track 1, leaving track 0 free. All the records in the file are stored in the prime area starting from track 1.

At the end of the prime area in every cylinder, some extra space is left to accommodate records that may be added in the future. This area is called the overflow area. It is used when, in a cylinder, records overflow after the prime area is completely filled. In some cases there is also a master overflow area, which holds the overflow of records when all the overflow areas are filled up.

The third important area in an indexed sequential file is the index area, which occupies track 0 in every cylinder. The highest key value in every track of the prime area is stored in the index area along with the track number. Thus the index area maintains the index of the records stored in all the tracks of a cylinder; this type of index is therefore called the track index.

Another index is maintained over the track indexes of all cylinders. It stores the highest key value of each track index along with the cylinder number, and it is called the cylinder index. Some file systems also maintain a master index, in which the highest key value of each row of the cylinder index is stored along with the row number. This makes searching even more efficient.

INDEXED SEQUENTIAL ACCESS METHOD

The file structure described above uses a method called the indexed sequential access method (ISAM) to search for a given record. Given a key, the program first compares the key value with the entries of the master index. If the key value is greater than an entry in the master index, the comparison moves on to the next entry. When the key value is found to be less than or equal to the value in an entry of the master index (each entry holds the highest key it covers), the corresponding row number is retrieved and control is taken directly to that row in the cylinder index.

The key value is then compared with each value in that row of the cylinder index, stopping at the first value that is not smaller than the key. The cylinder number corresponding to that value is retrieved, and control is taken directly to that cylinder. The key value is next compared with each value in the track index of the cylinder; when a value that is not smaller than the key is found, the corresponding track number is retrieved and control is taken directly to that track.

This is the lowest level in the hierarchy, and a sequential search for the key value is now made within this track alone. If a record matching the key value is found, the record is retrieved and loaded into memory; otherwise a "record not found" error is displayed.

In the above example, the key value 923 is given and is searched for using the ISAM technique. First the key value is compared with the values in the master index. Since the first value in the master index (4000) is greater than 923, control is transferred to row 1 of the cylinder index. Now 923 is compared with the values in row 1 of the cylinder index. The first value in row 1 of the cylinder index (1000) is greater than 923. Hence control is transferred to the corresponding cylinder, whose track index leads to track 10, where 923 is searched for sequentially and retrieved.

In this case the number of comparisons required to find the key value 923 is 23 in the track + 10 in the track index + 1 in row 1 of the cylinder index + 1 in the master index = 35 comparisons. If the same data were stored as an ordinary sequential file, finding record 923 would take 923 comparisons. This shows that ISAM drastically reduces the search time compared with an ordinary sequential file.
