A Depth First Search approach to finding the Longest Common Subsequence


Fragkiadaki Eleni, Samaras Nikolaos, Department of Applied Informatics, University of Macedonia, Greece
Harhalakis Stefanos, Department of Informatics, TEI of Thessaloniki, Greece

Abstract. While examining the problem of the Longest Common Subsequence (LCS) of two strings, we faced the vast amount of memory required by the existing algorithms. When using large strings, the memory requirements usually exceed the RAM of the computer and consume all of the available swap space on disk. These algorithms are based on the classic LCS algorithm. A short description of this algorithm is included in the paper, as well as a new algorithm that reduces the amount of memory required when examining two strings, the second of which is of small length; there are only small limitations on the length of the first string. The algorithm presented in this paper introduces a new way of storing the data of the problem and of handling the information required to solve it. The experimental results show that, in the cases described above, this algorithm has smaller execution time and significantly smaller memory requirements than the classic LCS algorithm.

Keywords. Algorithm, Longest Common Subsequence, Data structures, Data storage.

1. INTRODUCTION

Finding the Longest Common Subsequence (LCS) of two strings is a well-known problem. The algorithm of Hunt-Szymanski was one of the first algorithms presented to solve it [2], [7]. The algorithms that solve this problem find application in many different fields. First of all, they are used in bioinformatics for finding the LCS of two strings of DNA, searching for similarities among different people and living organisms. These algorithms are also used when comparing two fragments of text or code [1]. Many algorithms have been introduced that solve the LCS problem, most of them using dynamic programming.
These algorithms present one major problem: the size of the memory required for storing the information that finally leads them to the Longest Common Subsequence. In order to reduce the amount of memory required, we used a different approach for solving the LCS problem and for handling the data of the algorithm. This paper is organized as follows. It includes a description of the classic LCS algorithm and a more detailed description of its major disadvantage. The description of the DFS-LCS algorithm is then presented: we describe the steps the algorithm follows in order to compute the Longest Common Subsequence, the structures it uses, and an illustrative example. The computer, the programming language and compiler, the operating system and the experimental results are presented in detail using tables and charts. The last section of this paper states our final conclusions about the usage of this algorithm; it also states our conclusions concerning the comparison of the DFS-LCS algorithm with the classic LCS algorithm.

Σχεδίαση Λειτουργιών, Ανάκτηση Πληροφοριών και Διαχείριση Γνώσης [Design of Operations, Information Retrieval and Knowledge Management]

2. ALGORITHM DESCRIPTION

There are many algorithms that deal with the problem of finding the Longest Common Subsequence of two strings. These algorithms consist of two main parts. The first part creates the initial structure that will be used throughout the execution of the algorithm; by the end of the construction of this structure the length of the Longest Common Subsequence is known, but the subsequence itself is not. This first part is usually common among the LCS algorithms. The algorithms differ in the second part, the one that actually returns the longest common subsequence of the two strings. Because the second part differs, these algorithms vary in time and space complexity [3], [6]. Apart from the algorithms that compute the LCS, there are algorithms that compute the number of LCSs, or even all of them [4], [5].

2.1 DESCRIPTION OF THE CLASSIC LCS ALGORITHM

The classic algorithm used for finding the longest common subsequence follows the methodology described here. Given two strings X, Y and their lengths n, m, the algorithm returns the length as well as the Longest Common Subsequence of those two strings. By Longest Common Subsequence we mean the longest sequence of characters that appears in both strings in the same order, but not necessarily in adjacent positions. The main purpose of the algorithm is to construct an n x m table (table L), where n is the number of rows and m is the number of columns. During the execution of the algorithm, we examine whether or not the current character of X equals the current character of Y. If this condition is satisfied, we set the L[i,j] element of the table equal to the L[i-1,j-1] element plus one.
If the condition is not satisfied, we set the L[i,j] element equal to the maximum of the L[i-1,j] and L[i,j-1] elements of the table. Once the construction of the table following the procedure described above is complete, the number that indicates the length of the longest common subsequence of the two strings is located in the bottom right corner of the table (element L[n,m]). This table yields not only the LCS of the two strings but also the LCS of all prefixes of the strings (a prefix of a string is a substring of the initial string which starts with the first character and has length equal to or less than the length of the initial string). In other words, the element L[i,j] is a number that indicates the length of the Longest Common Subsequence when using the first i elements of string X and the first j elements of string Y. The major disadvantage of this algorithm is the vast amount of memory that it requires. As mentioned before, the algorithm constructs a table of n x m elements. For example, with a long string X and a string Y of 500 characters, the table the algorithm constructs can require approximately 5 GB of memory. Such an amount of memory is not common for personal computers, and the algorithm ends up swapping to the hard drive. The speed of the hard disk cannot be compared with the speed of the much faster RAM, so transferring blocks between the hard disk and memory is a very slow procedure.

2.2 PSEUDOCODE OF THE CLASSIC LCS ALGORITHM

Input: strings X, Y
Output: the length L[i,j] of an LCS of X[0..i] and Y[0..j]

for i ← 0 to n-1 do
    L[i,-1] ← 0
for j ← 0 to m-1 do
    L[-1,j] ← 0
for i ← 0 to n-1 do
    for j ← 0 to m-1 do
        if X[i] = Y[j] then
            L[i,j] ← L[i-1,j-1] + 1
        else
            L[i,j] ← max{L[i-1,j], L[i,j-1]}
return table L

2.3 DESCRIPTION OF THE DFS LCS ALGORITHM

The aim of the DFS-LCS algorithm is to reduce the amount of memory required to solve the LCS problem. To do so, the algorithm uses a recursive function. This function examines the different combinations of characters in Y, to determine which of these can be found in X and which is the longest among them. The algorithm decides whether using the current character of Y produces a subsequence that exists in both X and Y and is the longest one. In order to improve the performance of the algorithm and to reduce the number of combinations examined, the algorithm uses a procedure that ignores some of the characters in Y. For instance, suppose the current character of Y is the letter A (which exists in X and is available for use) and we are examining the case in which this letter is not included in the LCS; if the next character to be examined is also A, there is no reason to examine it, because it presents no difference from the previous A that we ignored. This procedure of ignoring A stops as soon as a different character gets included in the LCS. Using this procedure significantly reduces the number of combinations examined by the algorithm. Although each combination is checked only once, there are common parts between the combinations that the algorithm examines. To overcome this repetition in the calculations we use a cache memory. This memory stores information about each character we use from X and Y. For instance, if we use a character from Y in position j, and this character is located in position i in X, it is possible that this character will appear in many different combinations of Y.
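As a concrete reference point, the classic algorithm of Section 2.2 can be rendered in Python. This is an illustrative sketch (table construction plus a standard backtracking pass to recover one LCS), not the authors' implementation:

```python
def classic_lcs(X, Y):
    """Classic dynamic-programming LCS: build the n x m table, then backtrack."""
    n, m = len(X), len(Y)
    # L[i][j] = length of an LCS of the first i characters of X and first j of Y
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if X[i - 1] == Y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # walk back from the bottom-right corner to recover one LCS
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out)), L[n][m]
```

For the example strings used later in this paper, classic_lcs("GGCTACACC", "CAAG") yields ("CAA", 3); the (n+1) x (m+1) table it allocates is exactly the memory bottleneck discussed above.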
Each such character is stored in the cache memory, along with its position in X, its position in Y, the length of the LCS found from this character to the end, and the LCS itself. By doing so, every time we want to examine a new character from Y, we first check the cache memory to see whether there is a cached instance of this character; if there is, we reuse that instance. If there is no entry for this character in the cache memory, we proceed with the calculations, and as soon as we complete them we add this character to the cache for future use. In this way, each different combination (position of character in X, position of character in Y) is calculated only once. The algorithm consists of two parts, the initialization and the main body. The initialization consists of two steps. During the first step the algorithm determines the different characters from which the two strings have been constructed; by the end of this step we know the alphabet and the number of different characters in it. During the next step of the initialization we construct the Index Table of X. The Index Table of X is a table whose every element is a list: the first list registers the positions in X where the first character of the alphabet can be found, and so on. In order to create this index we read each character of X and add its position to the proper list.
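The Index Table construction, together with the position lookup used later by indextable_find_first() and indextable_get(), can be sketched as follows. The function names mirror the paper's, but we pass the index explicitly and use binary search over the position lists, which is our choice; the paper does not mandate a particular lookup method:

```python
from bisect import bisect_left

def build_index_table(X):
    """One ordered position list per character of X's alphabet."""
    index = {}
    for pos, ch in enumerate(X):
        index.setdefault(ch, []).append(pos)
    return index

def indextable_find_first(index, ch, x_pos):
    """Slot of the first occurrence of ch at or after x_pos, or -1 if none."""
    positions = index.get(ch, [])
    slot = bisect_left(positions, x_pos)
    return slot if slot < len(positions) else -1

def indextable_get(index, ch, slot):
    """Actual position in X stored at the given slot of ch's list."""
    return index[ch][slot]
```

For X = "GGCTACACC" this produces {'G': [0, 1], 'C': [2, 5, 7, 8], 'T': [3], 'A': [4, 6]}, the Index Table used in the example of Section 3.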

The main body is the algorithm itself, a recursive function. Each call to the recursive function find_lcs() has as inputs the two strings X, Y, their lengths n, m, the starting positions in X (x_pos) and Y (y_pos) and, finally, the ignoremask. The ignoremask is a 1 x s table, where s is the size of the alphabet; each element represents one character of the alphabet. If the value stored in the first position of the table is 1, the first character of the alphabet should be ignored during the execution of the current call of the recursive function; if the value is 0, it should not be ignored. The recursive function produces as output the length of the LCS (lcssz) and the LCS itself (lcs). We begin with the first letter of Y and make our way to the end. First the algorithm examines whether we have reached the end of one of the two strings. If we have, the algorithm returns that the length of the LCS beginning from x_pos in X and y_pos in Y is 0, and the function call ends. If we have not, we set the variable sz2 (the length of the LCS from x_pos in X and y_pos in Y to the end when using the character in position y_pos of Y) to -1, and likewise the variable sz1 (the same length when not using that character) to -1. Then the algorithm checks whether or not the current character of Y should be ignored. If the value in the corresponding position of the ignoremask table is 0, the current character of Y is not ignored and the calculations continue. The current character of Y is stored in the y_element variable. The next step is to determine whether the current character of Y exists in X and is available for use. To do so the algorithm calls the indextable_find_first() function.
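The ignoremask can be pictured as a 1 x s array of flags over the alphabet. A minimal sketch (the helper names here are illustrative, not from the paper's implementation):

```python
alphabet = ['A', 'C', 'G', 'T']                      # derived during initialization
char_slot = {c: i for i, c in enumerate(alphabet)}   # character -> position in alphabet

ignoremask = [0] * len(alphabet)   # one flag per alphabet character; 1 = ignore
ignoremask[char_slot['A']] = 1     # ignore further copies of 'A' in the current call

def should_ignore(ch):
    return ignoremask[char_slot[ch]] == 1
```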
The indextable_find_first() function requires two parameters: the current character of Y, and the position in X (x_pos) from which the search for this character must start. It returns either -1, if the character was not found in X, or the position within the corresponding list of the Index Table of X at which the character was found. The result is stored in the pos_found variable. If the function returns -1, the variable sz2 is set to 0: no character was found, so the LCS from this character to the end has length 0. If the value of pos_found is not -1, we set the variable is_cached to 0 and the variable tocheck to 1. The variable is_cached records whether or not an instance of the specific character exists in the cache memory. The variable tocheck has value 1 when the specific character still needs to be examined, and 0 when it has already been found in the cache memory. The function indextable_get() returns the actual position in X of the current character of Y. It requires two parameters: the current character of Y (y_element) and the position in the Index Table of X (pos_found) at which we found it. The result returned from this function is stored in the variable k. At this point of the execution, the algorithm has determined that the current character of Y in position y_pos, stored in the variable y_element, exists in X (in position k). We do not yet know whether the specific character exists in the cache memory. To determine that we use the function cache_find(), which requires two parameters: the position of the character in X and the position of the character in Y. This function returns the length of the LCS stored in the cache memory and the actual LCS; these results are stored in the variables sz2 and lcs2. If the value stored in the sz2 variable is larger than or equal to 0, we succeeded in finding the character in the cache memory.
In that case the variable tocheck is set to 0 and the variable is_cached is set to 1. If we did not find the character in the cache memory, we need to continue with our calculations to determine the LCS from this character to the end. A new temporary ignore mask, im, is set here. The algorithm calls the indextable_get_cur_begin() function, which requires one parameter, the current element of Y. This function returns the position in the corresponding list of the Index Table from which our calculations should continue, and the value is stored in the pos variable. After that the algorithm calls the indextable_set_cur_begin() function, which requires two parameters: the current character of Y (y_element) and the position in the corresponding list of the Index Table of X where we found the current character of Y, incremented by one. Then the algorithm recursively calls the function find_lcs() to determine the LCS from that point forward. The starting position in X has been incremented by one (k+1), as has the starting position in Y (y_pos+1). The ignore mask for the new call of the function is the one determined previously (table im). The results from this call are stored in the variables lcsx (the LCS) and sz2 (the length of the LCS). The LCS when including the character in position y_pos of Y is that character (found in position k of X) followed by the best LCS found from that position forward. After the completion of this step, the current position from which we should begin our search in the Index Table of X is restored to its previous value (by calling indextable_set_cur_begin() with the parameter pos). The length of the LCS when using the current character of Y is incremented by one. At this point, if the value of the variable is_cached is 0, the element that we examined is added to the cache memory using the function cache_update(), which takes four parameters: the position of the character in X (k), the position of the character in Y (y_pos), the length of the LCS found from this character to the end (sz2), and the LCS itself (lcs2). Up to this point the algorithm has examined the case in which the current character of Y is used in the LCS: the character was found in X, was available for use, and was not to be ignored, and we determined the LCS both when the specific character was found in the cache memory and when it was not. Now we examine the case in which the current character of Y should be ignored. In this case the length of the LCS is set to -1, to indicate that the character was not used in the LCS.
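The cache operations referred to above can be modeled as a dictionary keyed by the pair (position in X, position in Y), with a negative length signalling a miss, matching the sz2 >= 0 test in the pseudocode of Section 2.4. A minimal sketch:

```python
cache = {}   # (pos in X, pos in Y) -> (LCS length from here to the end, the LCS itself)

def cache_update(k, y_pos, sz, lcs):
    cache[(k, y_pos)] = (sz, lcs)

def cache_find(k, y_pos):
    # a negative length means: not cached yet
    return cache.get((k, y_pos), (-1, ""))
```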
Next the algorithm calculates the LCS without the use of the current character of Y (it actually ignores the current character of Y, and the starting position in Y is incremented by 1). The variable oldignore has value 1 if the current character of Y has been ignored before, or 0 if it has not been ignored up to this point. If the current character of Y has not been ignored up to now, we set the element in the corresponding position of the ignoremask table to 1. Then the algorithm calls the find_lcs() function to determine the LCS without the use of the current character of Y. The starting position in X is x_pos, the starting position in Y is y_pos+1, and we use the ignoremask table presented earlier. The results of this call are stored in the variables sz1 (the length of the LCS returned) and lcs1 (the LCS). In order to keep the data stored in the ignore mask table consistent with its previous state, we set the element representing the ignore state of the current letter of Y back to its previous value. Finally, the algorithm determines the best result: it compares the LCS lengths of the two cases (using the current character of Y in the LCS and not using it), and stores the best length in the sz variable and the corresponding LCS in the lcs variable.

2.4 PSEUDOCODE

find_lcs()
input: strings X, Y, n, m, x_pos, y_pos, ignoremask
output: lcs, lcssz

lcs ← []
lcssz ← -1
if (x_pos = n or y_pos = m)
    lcssz ← 0
    return
sz1 ← -1
sz2 ← -1
if (ignoremask(Y(y_pos)) = 0)
    y_element ← Y(y_pos)
    pos_found ← indextable_find_first(y_element, x_pos)
    if (pos_found = -1)
        sz2 ← 0
    else
        is_cached ← 0
        tocheck ← 1
        k ← indextable_get(y_element, pos_found)
        (sz2, lcs2) ← cache_find(k, y_pos)
        if (sz2 >= 0)
            tocheck ← 0
            is_cached ← 1
        if (tocheck = 1)
            im ← []
            pos ← indextable_get_cur_begin(y_element)
            indextable_set_cur_begin(y_element, pos_found + 1)
            (lcsx, sz2) ← find_lcs(k+1, y_pos+1, im)
            lcs2 ← [k lcsx]
            indextable_set_cur_begin(y_element, pos)
            sz2 ← sz2 + 1
            if (is_cached = 0)
                cache_update(k, y_pos, sz2, lcs2)
else
    sz2 ← -1
k ← Y(y_pos)
oldignore ← ignoremask(k)
if (oldignore = 0)
    ignoremask(k) ← 1
(lcs1, sz1) ← find_lcs(x_pos, y_pos+1, ignoremask)
if (oldignore = 0)
    ignoremask(k) ← 0
sz ← 0
if (sz1 > 0 or sz2 > 0)
    if (sz1 >= sz2)
        lcs ← lcs1
        sz ← sz1
    else
        lcs ← lcs2
        sz ← sz2
lcssz ← sz
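The pseudocode above can be condensed into a short runnable sketch. The following Python rendering is ours, not the authors' C/C++ implementation: the ignore mask becomes an immutable set (so restoring oldignore is implicit), the cache stores the LCS string itself, and the cur_begin bookkeeping is replaced by a binary search over the index lists:

```python
from bisect import bisect_left

def dfs_lcs(X, Y):
    """DFS-LCS sketch: recursive search over combinations of Y's characters,
    with an (x_pos, y_pos) cache and an ignore set for skipped characters."""
    index = {}                       # Index Table of X: character -> ordered positions
    for i, ch in enumerate(X):
        index.setdefault(ch, []).append(i)
    cache = {}                       # (pos in X, pos in Y) -> best LCS from there on

    def find_lcs(x_pos, y_pos, ignore):
        if x_pos >= len(X) or y_pos >= len(Y):
            return ""
        with_char = None
        c = Y[y_pos]
        if c not in ignore:                          # ignoremask check
            positions = index.get(c, [])
            slot = bisect_left(positions, x_pos)     # indextable_find_first
            if slot < len(positions):
                k = positions[slot]                  # indextable_get
                if (k, y_pos) in cache:              # cache_find
                    with_char = cache[(k, y_pos)]
                else:
                    # include Y[y_pos]: fresh (empty) ignore mask, as in the paper
                    with_char = c + find_lcs(k + 1, y_pos + 1, frozenset())
                    cache[(k, y_pos)] = with_char    # cache_update
        # skip Y[y_pos]; identical characters stay ignored until one is included
        without_char = find_lcs(x_pos, y_pos + 1, ignore | {c})
        if with_char is not None and len(with_char) >= len(without_char):
            return with_char
        return without_char

    return find_lcs(0, 0, frozenset())
```

On the strings used in the next section, dfs_lcs("GGCTACACC", "CAAG") returns "CAA", matching the walkthrough step by step.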

3. AN ILLUSTRATIVE EXAMPLE

The alphabet used to create the following example consists of 4 letters: A, C, G, T. String X consists of 9 characters and string Y of 4 characters. Specifically, X is GGCTACACC and Y is CAAG, so in our example the size of the alphabet is 4. At this stage in the execution of the algorithm the Index Table of X is created. It is presented in the table below, where each line refers to one letter of the alphabet, beginning with the first. The cache memory is also initialized, as well as the ignoremask table. The ignoremask table has size equal to the size of the alphabet used (in our example it has 4 elements).

Table 1: The index of string X
A: 4, 6
C: 2, 5, 7, 8
G: 0, 1
T: 3

Once the initialization stage has completed, the execution of the main body of the algorithm begins with the initial call of the recursive function. The parameters passed in this call are the starting positions in X and Y, which are 0 and 0, and the ignoremask table (with all its elements set to 0). Each recursive call has two branches, one that searches for the LCS including the current character of Y and one that searches for the LCS without it. Each branch is a call to the recursive function: the first includes the current character, and the second advances one character in Y and searches for the LCS from that point forward.

Call 1, positions to start from in X, Y: (0, 0):
Step 1.1: With the first letter of Y, which is C. The algorithm examines whether or not C exists in X and is available for use. The letter exists in X, and the first available C in X is located in position 2.
Step 1.2: The algorithm examines whether or not the letter in position 2 of X and position 0 of Y exists in the cache memory. Searching the cache for this element (2, 0) does not produce a result.
Step 1.3: Call 2, positions to start from in X, Y: (2+1, 0+1):
Step 2.1: (same as step 1.1) Letter found in position 4.
Step 2.2: (same as step 1.2) Element in positions (4, 1) was not found in cache memory.
Step 2.3: Call 3, positions to start from in X, Y: (4+1, 1+1):
Step 3.1: (same as step 1.1) Letter found in position 6.
Step 3.2: (same as step 1.2) Element in positions (6, 2) was not found in cache memory.
Step 3.3: Call 4, positions to start from in X, Y: (6+1, 2+1):
Step 4.1: With the letter in position 3 of string Y, which is G. The algorithm examines whether or not G exists in X and is available for use. There is no letter G in X in or after position 7. The computations stop.

Step 4.2: Without the letter in position 3 of string Y, which is G. We advance one character in Y.
Step 4.3: Call 5, positions to start from in X, Y: (7, 3+1):
Step 5.1: We have reached the end of Y. No more computations can be done from this point on, so we go back to the previous call.
Step 4.4: The search of this call of the function stops; no character could be used in the LCS. Call 4 returns 0 to call 3.
Step 3.4: (same as step 4.2)
Step 3.5: Call 6, positions to start from in X, Y: (5, 2+1):
Step 6.1: (same as step 4.1)
Step 6.2: (same as step 4.2)
Step 6.3: Call 7, positions to start from in X, Y: (5, 3+1):
Step 7.1: (same as step 5.1)
Step 6.4: We have finished with the calculations for this call; this call returns LCS 0 to the previous one.
Step 3.6: We have finished searching from element (6, 2) onward, and we add to the cache that the element in position 6 of X and 2 of Y yields an LCS of 1, and that LCS is A.
Step 2.4: (same as step 3.6) We add to the cache that the element in position 4 of X and 1 of Y yields an LCS of 2, and that LCS is AA.
Step 2.5: (same as step 4.2)
Step 2.6: Call 8, positions to start from in X, Y: (3, 1+1):
Step 8.1: In this step we would normally examine the letter in position 2 of Y, which is A. Since we are in the case where we examine the LCS without the previous A, there is no reason to include this A in our computations; therefore this A is ignored.
Step 8.2: (same as step 4.2)
Step 8.3: Call 9, positions to start from in X, Y: (3, 2+1):
Step 9.1: (same as step 4.1)
Step 9.2: (same as step 4.2)
Step 9.3: Call 10, positions to start from in X, Y: (3, 3+1):
Step 10.1: (same as step 5.1)
Step 9.4: The computations for this call end here, and this call returns 0 to the previous one.
Step 8.4: The computations for this call end here, and this call returns 0 to the previous one.
Step 2.7: The computations for this call end here, and this call returns LCS 2 to the previous one.
Step 1.4: (same as step 3.6) We add to the cache that the element in position 2 of X and 0 of Y yields an LCS of 3, and that LCS is CAA.
Step 1.5: (same as step 4.2)
Step 1.6: Call 11, positions to start from in X, Y: (0, 0+1):
Step 11.1: (same as step 1.1) Letter found in position 4.

Step 11.2: The algorithm examines whether or not the letter in position 4 of X and position 1 of Y exists in the cache memory. Searching the cache for this element (4, 1) returns the result that including character A in the LCS gives an LCS of size 2, which is AA. This step ends here and returns LCS 2.
Step 11.3: (same as step 4.2)
Step 11.4: Call 12, positions to start from in X, Y: (0, 1+1):
Step 12.1: In this step we would normally examine the letter in position 2 of Y, which is A. Since we are in the case where we examine the LCS without the previous A, there is no reason to include this A in our computations; therefore this A is ignored.
Step 12.2: (same as step 4.2)
Step 12.3: Call 13, positions to start from in X, Y: (0, 2+1):
Step 13.1: (same as step 1.1) Letter found in position 0.
Step 13.2: Call 14, positions to start from in X, Y: (0+1, 3+1):
Step 14.1: (same as step 5.1)
Step 13.3: (same as step 4.2)
Step 13.4: Call 15, positions to start from in X, Y: (0, 3+1):
Step 15.1: (same as step 5.1)
Step 13.5: (same as step 3.6) We add to the cache that the element in position 0 of X and 3 of Y yields an LCS of 1, and that LCS is G. This call returns LCS 1 to the previous one.
Step 12.4: We have finished the computations for this call. This call returns LCS 1 to the previous one.
Step 11.5: We have finished the computations for this call. This call returns LCS 2, which is AA, to the previous one.
Step 1.7: We have ended the computations for this call. This call returns LCS 3, which is CAA.

4. COMPUTATIONAL EXPERIMENTS

The computer we used to run the experiments was an Intel Celeron with a 2.4 GHz processor, 768 MB of RAM and 2.048 GB of swap memory. The operating system was Debian GNU/Linux. The DFS-LCS algorithm was implemented in C/C++ and the classic LCS algorithm in C; both were built with the Debian system compiler.
The examples were created randomly, using the random function of the operating system, which uses a non-linear additive feedback random number generator employing a default table of 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. We ran a total of 33 experiments, examined from the angles of time and memory usage. The results presented in Table 2 concern the time required by each algorithm to solve the LCS problem. The first column indicates the chart of this section in whose construction each row's results were used. The second column presents the length of string X and the third the length of string Y. The fourth and fifth columns present the number of different characters used to construct strings X and Y; the actual alphabet size referred to in the algorithm is max{size of alphabet of X, size of alphabet of Y}. The next two columns refer to the DFS-LCS algorithm and the last two to the classic LCS algorithm. The User time is the actual time the algorithm required to complete its calculations, and the System time is the time consumed on behalf of the algorithm by the operating system; the CPU time for each algorithm is the sum of the User and System times. The word "killed" means that the execution of the algorithm was terminated by the operating system because the memory demands of the algorithm exceeded the available memory of the computer. In many of the examples the classic LCS algorithm was killed, while the DFS-LCS algorithm completed.

Table 2: Experimental results of time usage. [The numeric entries of this table were garbled in the source and are not reproduced.]

The results presented in Table 3 concern the memory required by each algorithm to solve the LCS problem. The first 5 columns represent the same data as in the previous table. The next two columns refer to the DFS-LCS algorithm and the last one to the classic LCS algorithm. The columns named Memory give the maximum memory consumed by each algorithm, measured in megabytes. The second column under the DFS-LCS algorithm is the cache memory mentioned earlier; its size is included in the Memory column. In the larger examples the classic LCS algorithm was killed, while the DFS-LCS algorithm completed.

Table 3: Experimental results of memory usage. [The numeric entries of this table were garbled in the source and are not reproduced.]
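The random test strings described above were produced with the operating system's C random function; an equivalent Python sketch (illustrative only, not the paper's generator):

```python
import random

def random_string(length, alphabet):
    """Sample `length` characters uniformly from `alphabet`."""
    return "".join(random.choice(alphabet) for _ in range(length))

X = random_string(100000, "ACGT")   # a large X
Y = random_string(100, "ACGT")      # a small Y, as in several of the experiments
```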

Chart 1: Time usage with a large X and a small Y (X and Y were created using the same alphabet, which increases from example to example). [Chart data not reproduced.]

Chart 1 illustrates that, for both algorithms, the increase in the alphabet size does not severely affect the execution time. The DFS-LCS algorithm has small time requirements because it does not have large demands in memory. The classic LCS algorithm's time is significantly larger due to the vast amount of memory used, which causes swapping to the hard disk of the computer and requires the operating system to transfer blocks between the disk and the RAM. In the second example, for instance, the CPU time for the classic LCS algorithm is 7.17 seconds, which is 22.2 times the CPU time the DFS-LCS algorithm required to solve the same problem. Our next experiment focuses on the time required by the DFS-LCS algorithm to find the longest common subsequence with a large, fixed X. The length of Y increases by 100 characters in each example, and both strings were constructed using the same alphabet of 4 different characters. In the corresponding chart we can see the increase of the execution time as the length of Y increases.

Chart 2: Time usage with a large X and a small Y (X and Y have the same alphabet, and the size of Y increases by 100 characters in each example). [Chart data not reproduced.]

As mentioned earlier, the algorithm depends on the size of string Y. When the same alphabet is used to construct strings X and Y, using a string Y with length greater than 1500 characters causes the system to run out of both memory and swap memory. This situation does not occur when the alphabet of Y is larger than the alphabet of X: in this case the algorithm takes advantage of the difference and computes the LCS of X and Y, and the length of Y can increase significantly. We should also mention that the classic LCS algorithm could not compute the LCS for the examples used in this experiment, because it caused the system to run out of memory. Our next experiment consists of a string X of small length (3000 characters) and a string Y whose length increases in each example; both strings were created from 4 different characters. Charts 3 and 4 show the time requirements for both algorithms. In chart 3, when the length of Y reaches 1600 the algorithm runs out of memory and cannot compute the LCS.

Charts 3 & 4: Time usage with a small X and a small Y (X and Y have the same alphabet, and the size of Y increases by 100 characters in each example). [Chart data not reproduced.]

The following experiment illustrates the amount of memory that each algorithm requires to solve the LCS problem when using a large X and a small Y (100 characters). In the first example we can see that the classic LCS algorithm requires 50.8 times the memory the DFS-LCS algorithm does for solving the same problem.

Charts 5 & 6: Memory usage with a large X and a small Y (X and Y were created using the same alphabet, which increases from example to example). [Chart data not reproduced.]

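The memory gap reported in the charts can be sanity-checked with a back-of-envelope calculation: the classic algorithm must allocate the full DP table, so its footprint is roughly (|X|+1)·(|Y|+1) cells. The lengths below are hypothetical illustration values (the exact string lengths used in the experiments are not reproduced here), assuming 4-byte integer cells:

```python
def dp_table_bytes(m, n, cell_bytes=4):
    """Size in bytes of the full (m+1) x (n+1) classic LCS table."""
    return (m + 1) * (n + 1) * cell_bytes

# Hypothetical example: a long X against a 100-character Y.
size = dp_table_bytes(500_000, 100)
print(size / 2**20)  # on the order of a couple of hundred MiB
```

Even a modest Y of a few hundred characters multiplies this footprint proportionally, which is consistent with the classic algorithm exhausting memory in the experiments above.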
The difference in the alphabet size of X and Y affects the memory required for the execution of the DFS LCS algorithm. As mentioned before, a character found in both X and Y is registered in the cache memory. This memory increases in size as more common characters are found between the two strings. In the case examined, where characters included in Y are not included in X, there is a smaller number of registrations in the cache memory and therefore the DFS LCS algorithm requires even less memory, as shown in charts 7 and 8. In the fourth example the classic LCS algorithm requires 98.2 times the memory the DFS LCS algorithm does.

[Charts 7 & 8: Memory usage with a large X and a small Y (X was created from an alphabet with 4 different characters and Y was created using an alphabet which increases in size from example to example); y-axis in megabytes; series: memory and cache for the DFS LCS algorithm, memory for the Classic LCS algorithm.]

In our last experiment we used a large X and a Y whose length increases by 100 characters in each example. The Classic LCS algorithm could not compute the solution to these problems because it caused the operating system to run out of memory.

[Chart 9: Memory usage with a large X and a Y whose length increases by 100 in every example (X and Y were created using the same alphabet); y-axis in megabytes; series: memory and cache for the DFS LCS algorithm.]

5. CONCLUSIONS

The algorithm presented in this paper illustrates a new approach to the LCS problem. We have demonstrated that in all cases in which the length of Y is small, the algorithm produces the output (the LCS) in execution time that either does not differ significantly from the execution time of the classic LCS algorithm or is better. Also, when using this algorithm there

are no specific limitations on the length of X. On the contrary, the classic LCS algorithm tested on the same examples required a significantly larger amount of memory and produced delays due to swapping to the hard drive. This algorithm depends not only on the size of Y but also on the size of the alphabet used to produce the two strings. In cases where the alphabet used to construct Y is significantly larger than the one used to produce X, the length of Y can increase notably. The classic LCS algorithm cannot exploit these cases and therefore produces worse results.

References

[1] Michael T. Goodrich, Roberto Tamassia (2001), Algorithm Design, pp., Wiley Publications.
[2] James W. Hunt, Thomas G. Szymanski (1977), A Fast Algorithm for Computing Longest Common Subsequences, Communications of the ACM, 20, pp.
[3] H. Goeman, M. Clausen (1999), A New Practical Linear Space Algorithm for the Longest Common Subsequence Problem, The Prague Stringology Club Workshop.
[4] Ronald I. Greenberg (2002), Fast and Simple Computation of All Longest Common Subsequences, arXiv:cs.DS/ v1.
[5] Ronald I. Greenberg (2003), Computing the Number of Longest Common Subsequences, arXiv:cs.DS/ v1.
[6] L. Bergroth, H. Hakonen, T. Raita (2000), A Survey of Longest Common Subsequence Algorithms, Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (IEEE).
[7] Maxime Crochemore, Costas S. Iliopoulos, Yoan J. Pinzon (2003), Speeding-up Hirschberg and Hunt-Szymanski LCS Algorithms, Fundamenta Informaticae, 56, pp.
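A closing note on the dependence on Y: when only the length of the LCS is needed, the classic table can be collapsed to two rows of |Y|+1 cells. This is the standard linear-space refinement of the dynamic-programming recurrence (a generic technique, not the paper's DFS LCS algorithm), and it makes the same point the conclusions do — that memory can be made to scale with the shorter string:

```python
def lcs_length_two_rows(x, y):
    """LCS length using O(|y|) extra memory by keeping only two DP rows."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0] * (len(y) + 1)
        for j, yj in enumerate(y, start=1):
            # Same recurrence as the full table, computed one row at a time.
            curr[j] = prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(y)]
```

Recovering the subsequence itself in linear space requires more machinery (e.g. Hirschberg's divide-and-conquer scheme, as discussed in reference [7]).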


More information

Evaluating find a path reachability queries

Evaluating find a path reachability queries Evaluating find a path reachability queries Panagiotis ouros and Theodore Dalamagas and Spiros Skiadopoulos and Timos Sellis Abstract. Graphs are used for modelling complex problems in many areas, such

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part I: Operating system overview: Memory Management 1 Hardware background The role of primary memory Program

More information

XIV International PhD Workshop OWD 2012, October Optimal structure of face detection algorithm using GPU architecture

XIV International PhD Workshop OWD 2012, October Optimal structure of face detection algorithm using GPU architecture XIV International PhD Workshop OWD 2012, 20 23 October 2012 Optimal structure of face detection algorithm using GPU architecture Dmitry Pertsau, Belarusian State University of Informatics and Radioelectronics

More information

The Language for Specifying Lexical Analyzer

The Language for Specifying Lexical Analyzer The Language for Specifying Lexical Analyzer We shall now study how to build a lexical analyzer from a specification of tokens in the form of a list of regular expressions The discussion centers around

More information

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS TRIE BASED METHODS FOR STRING SIMILARTIY JOINS Venkat Charan Varma Buddharaju #10498995 Department of Computer and Information Science University of MIssissippi ENGR-654 INFORMATION SYSTEM PRINCIPLES RESEARCH

More information

Enemy Territory Traffic Analysis

Enemy Territory Traffic Analysis Enemy Territory Traffic Analysis Julie-Anne Bussiere *, Sebastian Zander Centre for Advanced Internet Architectures. Technical Report 00203A Swinburne University of Technology Melbourne, Australia julie-anne.bussiere@laposte.net,

More information

Outline. Computer Science 331. Course Information. Assessment. Contact Information Assessment. Introduction to CPSC 331

Outline. Computer Science 331. Course Information. Assessment. Contact Information Assessment. Introduction to CPSC 331 Outline Computer Science 331 Introduction to CPSC 331 Mike Jacobson Department of Computer Science University of Calgary Lecture #1 1 Contact Information 2 3 Expected Background 4 How to Succeed 5 References

More information

Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration. Sanjay Rao, Principal Software Engineer

Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration. Sanjay Rao, Principal Software Engineer Optimizing Fusion iomemory on Red Hat Enterprise Linux 6 for Database Performance Acceleration Sanjay Rao, Principal Software Engineer Version 1.0 August 2011 1801 Varsity Drive Raleigh NC 27606-2072 USA

More information

A GPU Algorithm for Comparing Nucleotide Histograms

A GPU Algorithm for Comparing Nucleotide Histograms A GPU Algorithm for Comparing Nucleotide Histograms Adrienne Breland Harpreet Singh Omid Tutakhil Mike Needham Dickson Luong Grant Hennig Roger Hoang Torborn Loken Sergiu M. Dascalu Frederick C. Harris,

More information

Informatics 1. Lecture 1: Hardware

Informatics 1. Lecture 1: Hardware Informatics 1. Lecture 1: Hardware Kristóf Kovács, Ferenc Wettl Budapest University of Technology and Economics 2017-09-04 Requirements to pass 3 written exams week 5, 9, 14 each examination is worth 20%

More information

So far... Finished looking at lower bounds and linear sorts.

So far... Finished looking at lower bounds and linear sorts. So far... Finished looking at lower bounds and linear sorts. Next: Memoization -- Optimization problems - Dynamic programming A scheduling problem Matrix multiplication optimization Longest Common Subsequence

More information

CSCI 5454 Ramdomized Min Cut

CSCI 5454 Ramdomized Min Cut CSCI 5454 Ramdomized Min Cut Sean Wiese, Ramya Nair April 8, 013 1 Randomized Minimum Cut A classic problem in computer science is finding the minimum cut of an undirected graph. If we are presented with

More information