To cite this article: Michailidis, P. D. and Margaritis, K. G. (2001) 'On-line string matching algorithms: survey and experimental results', International Journal of Computer Mathematics, 76: 4.
Intern. J. Computer Math., Vol. 76
OPA (Overseas Publishers Association) N.V. Published by license under the Gordon and Breach Science Publishers imprint. Printed in Singapore.

ON-LINE STRING MATCHING ALGORITHMS: SURVEY AND EXPERIMENTAL RESULTS

P. D. MICHAILIDIS and K. G. MARGARITIS*
Parallel and Distributed Processing Laboratory, Department of Applied Informatics, University of Macedonia, 156 Egnatia Str., P.O. Box 1591, 54006, Thessaloniki, Greece
*Corresponding authors. {panosm, kmarg}@uom.gr

(Received 9 March 2000)

In this paper we present a short survey and experimental results for well-known sequential string matching algorithms. We consider algorithms based on different approaches, including classical, suffix automata, bit-parallelism and hashing. We put special emphasis on recently presented algorithms such as Shift-Or and BNDM. We compare these algorithms in terms of the number of character comparisons and the running time for four different types of text: binary alphabet, alphabet of size 8, English alphabet and DNA alphabet.

Keywords: String matching; Pattern matching; String searching; Text searching; Text editing
C. R. Categories: F.2.2, …

1. INTRODUCTION

Pattern matching is a basic problem in computer science, and it occurs naturally as part of data processing, information retrieval, speech recognition, vision for two-dimensional image recognition and computational biology. The type of pattern matching discussed in this paper is exact string matching. String matching is a special case of pattern matching where the pattern is described by a finite sequence of symbols over an alphabet $\Sigma$. It consists of finding one or, more generally, all the occurrences of a short pattern $P = P[0]P[1]\cdots P[m-1]$ of length $m$ in a large text $T = T[0]T[1]\cdots T[n-1]$ of length $n$, where $m, n > 0$ and $m \le n$. Both $P$ and $T$ are built over the same alphabet $\Sigma$.

The solution to this problem differs depending on whether the algorithm has to be on-line (the text is not known in advance) or off-line (the text can be preprocessed). In this paper we focus on on-line algorithms. Numerous solutions to the string matching problem have been designed [2, 10, 29 and 24]. In general, an on-line string matching algorithm consists of two phases: the preprocessing phase of $P$ and the search phase of $P$ in $T$. During the preprocessing phase a data structure $X$ is constructed; its size is usually proportional to the length of the pattern, and its details vary between algorithms. The search phase uses $X$ to determine quickly whether the pattern occurs in the text. Search phases follow four different approaches (classical, suffix automata, bit-parallelism and hashing), so the string matching algorithms can be divided into four categories:

Classical algorithms: Brute-Force [24], Knuth-Morris-Pratt [18], Simon [14], Colussi [8] and Boyer-Moore [3], together with variations of Boyer-Moore such as Galil [12], Apostolico-Giancarlo [1], Turbo-BM [7], Reverse Colussi [9], Boyer-Moore-Horspool [16], Sunday's algorithms (Quick Search, Optimal Mismatch, Maximal Shift) [30], Boyer-Moore-Horspool-Raita [26] and Boyer-Moore-Smith [28].
Suffix automata algorithms: Reverse Factor [21 and 7] and Turbo Reverse Factor [7].
Bit-parallelism algorithms: Shift-Or [6], Shift-And [31] and BNDM [25].
Hashing algorithms: Harrison [15] and Karp-Rabin [24].
Several experiments on string matching algorithms have already been reported [16, 27, 11, 4, 30, 17, 28, 6, 26, 22, 23 and 25]. In this paper we report experiments on eleven well-known algorithms drawn from all four categories: Brute-Force, Knuth-Morris-Pratt, Boyer-Moore, Turbo-BM, Boyer-Moore-Horspool, Quick Search, Boyer-Moore-Smith, Reverse Factor, Shift-Or, BNDM and Karp-Rabin.
This paper is organized as follows. In the next section we present the algorithms tested. In the third section we describe the experimental methodology, including the test environment, the types of test data and the measures used to compare the algorithms. In section four we present the results of our experiments in the form of performance tables and graphs. In the last section we discuss the conclusions of this paper and outline some goals for further research.

2. STRING MATCHING ALGORITHMS

In this section we present the basic sequential algorithms tested for solving the string matching problem. For further details and the coding of the algorithms, the reader is referred to [24] and the original references.

2.1. Classical Approach

The classical string matching algorithms are based on character comparisons. The Brute-Force (BF) algorithm [24], which is the simplest, compares characters of the text with characters of the pattern from left to right. After a mismatch or a complete match of the entire pattern, it always shifts the pattern exactly one position to the right. It requires no preprocessing phase and no extra space. The BF algorithm has $O(mn)$ worst-case time complexity, and the average number of character comparisons is $n(1 + 1/(|\Sigma| - 1))$.

The Knuth-Morris-Pratt (KMP) algorithm [18], the first linear-time string matching algorithm discovered, also compares characters from left to right. On a mismatch it uses knowledge of the characters already examined to compute the next position of the pattern, so the pointer into the text is never decremented. The preprocessing phase of KMP requires $O(m)$ time and space, and the searching phase needs $O(n)$ time in both the worst and the average case.
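To make the two left-to-right strategies concrete, here is a minimal C sketch of BF and KMP. It is not the code used in the experiments; the function names bf_search and kmp_search, the explicit length arguments and the optional out array for match positions are assumptions made for illustration. KMP's preprocessing builds a failure table holding, for each prefix of the pattern, the length of its longest proper border; on a mismatch the search falls back through this table instead of moving backwards in the text.

```c
#include <stddef.h>
#include <stdlib.h>

/* Brute-Force: try every window y[j..j+m-1], compare left to right,
   and always shift by one position. Returns the number of occurrences;
   positions are stored in out[] when out is not NULL. */
size_t bf_search(const char *x, size_t m, const char *y, size_t n, size_t *out) {
    size_t count = 0;
    if (m == 0 || m > n) return 0;
    for (size_t j = 0; j + m <= n; ++j) {
        size_t i = 0;
        while (i < m && x[i] == y[j + i]) ++i;
        if (i == m) {
            if (out) out[count] = j;
            ++count;
        }
    }
    return count;
}

/* KMP preprocessing: fail[i] = length of the longest proper border
   (prefix that is also a suffix) of x[0..i]. O(m) time and space. */
static void kmp_failure(const char *x, size_t m, size_t *fail) {
    size_t k = 0;
    fail[0] = 0;
    for (size_t i = 1; i < m; ++i) {
        while (k > 0 && x[i] != x[k]) k = fail[k - 1];
        if (x[i] == x[k]) ++k;
        fail[i] = k;
    }
}

/* KMP search: the text index j only moves forward.
   Returns 0 if the failure table cannot be allocated. */
size_t kmp_search(const char *x, size_t m, const char *y, size_t n, size_t *out) {
    size_t count = 0, k = 0;            /* k = pattern characters matched so far */
    if (m == 0 || m > n) return 0;
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) return 0;
    kmp_failure(x, m, fail);
    for (size_t j = 0; j < n; ++j) {
        while (k > 0 && y[j] != x[k]) k = fail[k - 1];
        if (y[j] == x[k]) ++k;
        if (k == m) {                   /* occurrence ending at position j */
            if (out) out[count] = j + 1 - m;
            ++count;
            k = fail[m - 1];            /* continue, allowing overlapping matches */
        }
    }
    free(fail);
    return count;
}
```

For example, both functions report the three overlapping occurrences of "aba" in "abababa" at positions 0, 2 and 4. The difference is that kmp_search never moves backwards in the text, which is where its $O(n)$ search bound comes from.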
The next algorithm is Boyer-Moore (BM) [3], which is known to be very fast in practice. It compares characters of the text with characters of the pattern from right to left. After a mismatch or a complete match of the entire pattern, it uses two shift heuristics to shift the pattern to the right: the occurrence heuristic and the match heuristic. The length of the shift is the maximum of the shifts given by the two heuristics; for their details the reader is referred to the original paper [3]. The heuristics are preprocessed in $O(m + |\Sigma|)$ time and space. The searching phase of BM needs $O(n + rm)$ time in the worst case, where $r$ is the number of occurrences of the pattern in the text. The expected performance of BM is sublinear, requiring about $n/m$ character comparisons on average.

The Turbo-BM (TBM) algorithm [7] is a variant of BM. It remembers the substring of the text that matched a suffix of the pattern during the last attempt (only if a good-suffix shift was performed). This has two advantages: a) it is possible to jump over this substring, and b) it can enable a turbo shift, whose details are given in the original paper [7]. It can be shown that the number of character comparisons performed by TBM is bounded by $2n$.

The Boyer-Moore-Horspool (BMH) algorithm [16] does not use the match heuristic. After a mismatch or a match of the pattern, the length of the shift is computed using only the occurrence heuristic for the text character aligned with the rightmost pattern character (not the text character where the mismatch occurred). The preprocessing phase of BMH requires $O(m + |\Sigma|)$ time and reduces the space requirement from $O(m + |\Sigma|)$ to $O(|\Sigma|)$. The searching phase requires $O(mn)$ time in the worst case, but it can be proved that the average number of character comparisons is about $n/|\Sigma|$.
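A minimal C sketch of BM with both heuristics follows, using a common table-driven formulation: an occurrence table bmBc and a match table bmGs, the latter built from an array of suffix lengths. The byte alphabet (ASIZE = 256), the fixed maximum pattern length XSIZE and all function names are assumptions for this sketch, not the implementation measured in the paper.

```c
#include <stddef.h>

#define ASIZE 256   /* assumed alphabet: all byte values */
#define XSIZE 128   /* assumed maximum pattern length for this sketch */

/* Occurrence heuristic: distance from the last occurrence of c in
   x[0..m-2] to the end of the pattern, or m if c does not occur there. */
static void bm_occurrence(const unsigned char *x, int m, int bmBc[ASIZE]) {
    for (int c = 0; c < ASIZE; ++c) bmBc[c] = m;
    for (int i = 0; i < m - 1; ++i) bmBc[x[i]] = m - 1 - i;
}

/* suff[i] = length of the longest suffix of x that ends at position i. */
static void bm_suffixes(const unsigned char *x, int m, int *suff) {
    int f = 0, g = m - 1;
    suff[m - 1] = m;
    for (int i = m - 2; i >= 0; --i) {
        if (i > g && suff[i + m - 1 - f] < i - g) {
            suff[i] = suff[i + m - 1 - f];
        } else {
            if (i < g) g = i;
            f = i;
            while (g >= 0 && x[g] == x[g + m - 1 - f]) --g;
            suff[i] = f - g;
        }
    }
}

/* Match (good-suffix) heuristic, one shift per mismatch position. */
static void bm_match(const unsigned char *x, int m, int bmGs[]) {
    int suff[XSIZE];
    bm_suffixes(x, m, suff);
    for (int i = 0; i < m; ++i) bmGs[i] = m;
    int j = 0;
    for (int i = m - 1; i >= 0; --i)
        if (suff[i] == i + 1)                 /* x[0..i] is also a suffix of x */
            for (; j < m - 1 - i; ++j)
                if (bmGs[j] == m) bmGs[j] = m - 1 - i;
    for (int i = 0; i <= m - 2; ++i)
        bmGs[m - 1 - suff[i]] = m - 1 - i;
}

/* Counts occurrences of x[0..m-1] in y[0..n-1]; positions go to out[] if given. */
int bm_search(const char *xs, int m, const char *ys, int n, int *out) {
    const unsigned char *x = (const unsigned char *)xs;
    const unsigned char *y = (const unsigned char *)ys;
    int bmBc[ASIZE], bmGs[XSIZE], count = 0;
    if (m <= 0 || m > XSIZE || m > n) return 0;
    bm_occurrence(x, m, bmBc);
    bm_match(x, m, bmGs);
    int j = 0;
    while (j <= n - m) {
        int i = m - 1;
        while (i >= 0 && x[i] == y[i + j]) --i;   /* right-to-left scan */
        if (i < 0) {
            if (out) out[count] = j;
            ++count;
            j += bmGs[0];
        } else {
            int occ = bmBc[y[i + j]] - m + 1 + i;
            j += bmGs[i] > occ ? bmGs[i] : occ;    /* maximum of the two heuristics */
        }
    }
    return count;
}
```

On the classic example of searching for GCAGAGAG in GCATCGCAGAGAGTATACAGTACG, bm_search reports the single occurrence at position 5.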
The Quick Search (QS) algorithm of Sunday [30] compares characters from left to right, starting with the leftmost pattern character. After a mismatch it computes the shift with the occurrence heuristic applied to the text character immediately following the last pattern character at the time of the mismatch, that is, the character just to the right of the current window. The preprocessing and searching times of QS are the same as those of BMH.

The Boyer-Moore-Smith (BMS) algorithm [28] is based on Smith's observation that computing the shift with the text character just after the rightmost text character of the window sometimes gives a shorter shift than using the rightmost text character itself. He therefore advised taking the maximum of the two values. The preprocessing phase of BMS takes $O(m + |\Sigma|)$ time and $O(|\Sigma|)$ space, and the algorithm has $O(mn)$ worst-case time complexity.
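BMH, QS and BMS differ only in which text character drives the occurrence-heuristic shift, so a single C sketch can cover all three. It assumes a byte alphabet and verifies each window with memcmp, so it illustrates the shift rules rather than the exact comparison order (and hence the comparison counts) of the original algorithms; the enum shift_rule and the function name shift_search are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ASIZE 256   /* assumption: byte alphabet */

enum shift_rule { HORSPOOL, QUICK_SEARCH, SMITH };

/* Counts occurrences of x in y using the chosen occurrence-heuristic shift. */
size_t shift_search(const char *xs, size_t m, const char *ys, size_t n,
                    enum shift_rule rule) {
    const unsigned char *x = (const unsigned char *)xs;
    const unsigned char *y = (const unsigned char *)ys;
    size_t hbc[ASIZE], qbc[ASIZE], count = 0;
    if (m == 0 || m > n) return 0;
    for (size_t c = 0; c < ASIZE; ++c) { hbc[c] = m; qbc[c] = m + 1; }
    for (size_t i = 0; i + 1 < m; ++i) hbc[x[i]] = m - 1 - i; /* BMH: last char excluded */
    for (size_t i = 0; i < m; ++i) qbc[x[i]] = m - i;         /* QS: whole pattern */
    size_t j = 0;
    while (j + m <= n) {
        if (memcmp(x, y + j, m) == 0) ++count;
        size_t h = hbc[y[j + m - 1]];                    /* char under last pattern position */
        size_t q = (j + m < n) ? qbc[y[j + m]] : m + 1;  /* char just after the window */
        switch (rule) {
        case HORSPOOL:     j += h; break;
        case QUICK_SEARCH: j += q; break;
        case SMITH:        j += (h > q) ? h : q; break;  /* Smith: take the maximum */
        }
    }
    return count;
}
```

Since each rule only skips windows that cannot contain a match, all three variants return the same count; for instance, each finds 3 occurrences of "aaa" in "aaaaa". When the window reaches the end of the text there is no following character, so the sketch uses the maximal QS shift of m + 1 to leave the loop.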
2.2. Suffix Automata Approach

This category uses the suffix automaton data structure (frequently called DAWG, for Directed Acyclic Word Graph) that recognizes all the suffixes of the pattern [10 and 25]. The Reverse Factor (RF) algorithm [21 and 7] scans the characters of each text window from right to left using the smallest suffix automaton of the reversed pattern. The preprocessing phase of RF requires time and space linear in the length of the pattern. The searching phase has quadratic worst-case time complexity but is optimal on average, performing $O(n \log m / m)$ character comparisons on average.

2.3. Bit Parallelism Approach

Bit parallelism [6 and 5] uses the intrinsic parallelism of bit operations inside computer words to perform many operations in parallel (we denote the number of bits in a computer word by $w$). This technique has become a general way to simulate simple nondeterministic finite automata (NFA) instead of converting them to deterministic ones. The main advantages of this approach are simplicity, flexibility and no buffering.

The basic idea of the first such algorithm, Shift-Or (SO) [6], is to represent the state of the search as a number, so that each search step costs a small number of arithmetic and logical operations, provided that the numbers are large enough to represent all possible states of the search. Assuming that the pattern length is no longer than the computer word of the machine, the preprocessing phase takes $O((m + |\Sigma|)\lceil m/w \rceil)$ time using $O(m|\Sigma|)$ extra space. The searching phase takes $O(n\lceil m/w \rceil)$ time in the worst and average case, where $\lceil m/w \rceil$ is the time to compute a shift or other simple operation on numbers of $m$ bits using a word size of $w$ bits. A newer algorithm, called Backward Nondeterministic DAWG Matching (BNDM) [25], has appeared recently.
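The following minimal C sketch shows Shift-Or as described above and BNDM as described in the next paragraph. Assumptions: a byte alphabet, a single 64-bit word (so 1 <= m <= 64, whereas the paper's machine had w = 32), and hypothetical function names that only count occurrences. In Shift-Or a 0 bit marks an active NFA state, so one shift and one OR per text character advance all states at once. BNDM reads each window from right to left, keeps the set of still-possible pattern factors in D, and records in last the most recent point where a pattern prefix was recognized, which determines the next shift.

```c
#include <stddef.h>
#include <stdint.h>

#define ASIZE 256   /* assumption: byte alphabet */

/* Shift-Or: bit i of D is 0 iff x[0..i] matches the text ending at position j. */
size_t shift_or(const char *xs, size_t m, const char *ys, size_t n) {
    const unsigned char *x = (const unsigned char *)xs;
    const unsigned char *y = (const unsigned char *)ys;
    if (m == 0 || m > 64 || m > n) return 0;
    uint64_t B[ASIZE], D = ~UINT64_C(0), hit = UINT64_C(1) << (m - 1);
    size_t count = 0;
    for (int c = 0; c < ASIZE; ++c) B[c] = ~UINT64_C(0);
    for (size_t i = 0; i < m; ++i) B[x[i]] &= ~(UINT64_C(1) << i); /* 0 where x[i] == c */
    for (size_t j = 0; j < n; ++j) {
        D = (D << 1) | B[y[j]];
        if ((D & hit) == 0) ++count;          /* full pattern ends at j */
    }
    return count;
}

/* BNDM: bit-parallel suffix automaton of the reversed pattern. */
size_t bndm(const char *xs, size_t m, const char *ys, size_t n) {
    const unsigned char *x = (const unsigned char *)xs;
    const unsigned char *y = (const unsigned char *)ys;
    if (m == 0 || m > 64 || m > n) return 0;
    uint64_t B[ASIZE] = {0}, high = UINT64_C(1) << (m - 1);
    size_t count = 0, pos = 0;
    for (size_t i = 0; i < m; ++i) B[x[m - 1 - i]] |= UINT64_C(1) << i;
    while (pos + m <= n) {
        size_t j = m, last = m;
        uint64_t D = ~UINT64_C(0);
        while (D != 0) {
            D &= B[y[pos + j - 1]];            /* read window right to left */
            --j;
            if (D & high) {                    /* suffix read so far is a pattern prefix */
                if (j > 0) last = j;
                else { ++count; break; }       /* whole window matched */
            }
            D <<= 1;
        }
        pos += last;
    }
    return count;
}
```

Both functions find the single occurrence of GCAGAGAG in GCATCGCAGAGAGTATACAGTACG. Note how the search loop of shift_or touches every text character, while bndm can skip up to m characters per window, which matches the linear versus $O(n \log m / m)$ average behaviour stated in the text.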
The BNDM algorithm uses a nondeterministic suffix automaton that is simulated using bit-parallelism. Its preprocessing time is $O(m + |\Sigma|)$ for $m \le w$, using $O(m|\Sigma|)$ extra space. The searching time is $O(mn)$ in the worst case and $O(n \log m / m)$ on average.

2.4. Hashing Approach

A different approach to string matching is taken by the Karp-Rabin (KR) algorithm [24], which uses hashing techniques. Hashing provides a simple method to avoid a quadratic number of character comparisons in most practical situations. The main idea of KR is to compute the signature (hash value) of each possible $m$-character substring of the text and check whether it is equal to the signature of the pattern. The preprocessing phase of KR requires $O(m)$ time, while the searching phase has $O(mn)$ worst-case time complexity. Its expected number of character comparisons is $O(m + n)$.

3. EXPERIMENTAL METHODOLOGY

In this section we present the testing methodology used in our experiments to compare the relative performance of the string matching algorithms. The parameters that describe the performance of the algorithms are: a) the text size, b) the pattern length and c) the alphabet size. It is known that none of the algorithms is optimal in all three respects. Therefore, the main goal of our experimental study is to compare the practical performance of the algorithms against the pattern length (short and long patterns) for alphabets of different sizes (types of text), i.e., binary alphabet, alphabet of size 8, English alphabet and DNA alphabet, which have different characteristics.

3.1. Test Environment

The experiments were run on a Sun UltraSparc-1, a 32-bit machine with a 143 MHz clock, 64 MB of RAM and a 2.1 GB local hard disk. The operating system is Solaris 2.5. During all experiments this machine was not performing other heavy tasks (or processes), and the data structures used in the tests were all in physical memory. Finally, the algorithms presented in Section 2 were implemented in the ANSI C programming language [19] in a homogeneous way, so as to keep their comparison meaningful, and compiled with the cc compiler.
We made extensive use of the code presented in [4, 13 and 24] for known algorithms.

3.2. Types of Test Data

Because the performance of string matching algorithms depends on the statistical properties of the pattern and of the text from which the test patterns are obtained, experiments were performed on four different types of text: binary alphabet, alphabet of size 8, English alphabet and DNA alphabet.

Binary Alphabet
The alphabet is $\Sigma = \{0, 1\}$. The text consists of 150,000 characters and was built randomly. For each pattern length between 2 and 100 we searched for 50 randomly built patterns.

Alphabet of Size 8
The alphabet is $\Sigma = \{a, b, c, d, e, f, g, h\}$. The text consists of 150,000 characters and was built randomly. As for the binary alphabet, for each pattern length between 2 and 100 we searched for 50 randomly built patterns.

English Alphabet
We used an English-language document taken from a web page. The alphabet consists of 70 different characters. The text consists of 148,188 characters, and for each length from 2 to 100 characters we searched for 50 patterns chosen at random from words inside the text.

DNA Alphabet
The DNA alphabet consists of the four nucleotides a, c, g and t (standing for adenine, cytosine, guanine and thymine, respectively) used to encode DNA, so $\Sigma = \{a, c, g, t\}$. The text consists of 997,642 characters, and for each length from 10 to 100 characters we searched for 50 patterns. The text and the patterns are a portion of the GenBank DNA database, as distributed by Hume and Sunday [17].

3.3. Measures of Comparison

To compare the string matching algorithms we used two measures: the number of character comparisons and the practical running time. Character comparisons are counted as by Smith [28], that is, as the ratio of the number of characters actually compared to the number of passed characters in the text. Since in our experiments all algorithms are designed to find all occurrences of a pattern in the text, the number of passed characters is always $n - m + 1$. The running time is the total time of calling an algorithm to search for a pattern in the text, including the preprocessing time for building the auxiliary arrays. It is obtained by calling the C function clock() and is measured in seconds.

We measured the number of character comparisons and the running time of all the algorithms of Section 2 in order to examine the effect of the pattern length, using the following test series. For the first three types of text, the pattern length was varied over m = 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 60, 80 and 100. For the DNA alphabet we used longer patterns, because this alphabet has biological applications involving long patterns; here the pattern length was varied over m = 10, 20, 30, 40, 50 and 100. To decrease random variation, the results of the algorithms are averages over 50 runs with different patterns of each length.

We note that for the bit-parallelism algorithms (SO and BNDM) we use only the running time measure, because they perform only implicit character comparisons. In addition, these algorithms are limited to pattern lengths no longer than the word size in bits. For this reason, in our experimental study the SO and BNDM algorithms are limited to m …

4. EXPERIMENTAL RESULTS

In the previous sections we briefly presented the most well-known string matching algorithms and the experimental methodology of our tests. In this section we present the experimental results for the string matching algorithms in terms of the number of character comparisons and the running time.
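As an illustration of the two measures described in the methodology section, here is a minimal C sketch: an instrumented BF search that counts character comparisons, the normalization by the $n - m + 1$ passed characters, and an averaging loop timed with clock(). The function names and the choice of BF as the instrumented algorithm are assumptions; the paper does not give its instrumentation. Note that clock() measures processor time, so very short searches mostly reflect its resolution.

```c
#include <stddef.h>
#include <time.h>

/* Instrumented Brute-Force (hypothetical helper): counts every x[i] vs y[j+i]
   comparison, including the one that detects a mismatch. */
unsigned long bf_count_comparisons(const char *x, size_t m, const char *y, size_t n) {
    unsigned long cmp = 0;
    if (m == 0 || m > n) return 0;
    for (size_t j = 0; j + m <= n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            ++cmp;
            if (x[i] != y[j + i]) break;
        }
    }
    return cmp;
}

/* Smith's measure: compared characters per passed text character.
   Requires 1 <= m <= n so that n - m + 1 >= 1. */
double comparisons_per_char(unsigned long comparisons, size_t n, size_t m) {
    return (double)comparisons / (double)(n - m + 1);
}

/* Average processor time in seconds over k patterns of length m, measured
   with clock() around each call (preprocessing would be inside the timed region). */
double average_search_seconds(const char *patterns[], size_t k, size_t m,
                              const char *y, size_t n) {
    volatile unsigned long sink = 0;   /* keeps the calls from being optimized away */
    double total = 0.0;
    for (size_t p = 0; p < k; ++p) {
        clock_t start = clock();
        sink += bf_count_comparisons(patterns[p], m, y, n);
        total += (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    (void)sink;
    return k > 0 ? total / (double)k : 0.0;
}
```

For "aa" in "aaaa", each of the 3 windows needs 2 comparisons, so the measure is 6/3 = 2 comparisons per passed character, the same value the average-case formula for BF gives for a binary alphabet.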
Finally, the performance of each algorithm was plotted against the pattern length for each type of text.

4.1. Results for the Number of Character Comparisons

Figures 1 to 4 and Tables I to IV show the number of character comparisons against the pattern length for a binary alphabet, an alphabet of size 8, an English alphabet and a DNA alphabet, respectively. In all cases the KMP and KR algorithms perform exactly 1 character comparison per passed text character. The BF algorithm performs approximately the same number of character comparisons as KMP and KR for the alphabet of size 8 and for the English alphabet, but requires more for small alphabets (i.e., the binary or the DNA alphabet). Based on the empirical results, for patterns longer than 10 characters BF performs approximately 2 character comparisons per text character for the binary alphabet, twice the number required by KMP and KR. For the DNA alphabet BF requires on average 1,34 character comparisons. This occurs because a small alphabet leads to many partial matches of the pattern in the text, so the number of character comparisons tends to be greater than 1; with a larger alphabet this phenomenon is alleviated, as Figures 2 and 3 show.

The number of character comparisons of the BM-like algorithms (BM, BMH, QS, BMS and TBM) and of the suffix automata algorithm RF is generally less than 1, with the exception of the binary alphabet, where the BMH and QS algorithms perform on average 1,25 and 1,1 character comparisons. The number of character comparisons of the BM-like and RF algorithms is significantly higher for the binary alphabet than for any other type of text. For all these algorithms the number of character comparisons decreases significantly as the pattern length increases, so the empirical results support the theoretical evidence that the BM-like and RF algorithms are sublinear in the number of character comparisons. The decrease slows down as the pattern length grows, because for long patterns the probability is higher that the character just fetched occurs somewhere in the pattern, so the distance the pattern can be moved forward after a mismatch is shortened. Moreover, the character comparison counts of all BM-like algorithms are very close to one another and tend to stabilize at a certain level, except for the binary alphabet. Finally, for long patterns the difference between the number of character comparisons performed by the BM-like algorithms and by the suffix automata algorithm RF increases in all cases.

FIGURE 1 Binary alphabet (character comparisons against pattern length).
FIGURE 2 Alphabet of size 8 (character comparisons against pattern length).
FIGURE 3 English alphabet (character comparisons against pattern length).
FIGURE 4 DNA alphabet (character comparisons against pattern length).
In all cases, the BM-like algorithms and the suffix automata algorithm RF give the best results. More specifically, the BM-like algorithms (such as TBM and BMS) and the RF algorithm are much more efficient in terms of the number of character comparisons than the remaining algorithms, for short and long patterns respectively.

TABLE I Number of character comparisons for a binary alphabet (columns: m, BF, KMP, BM, BMH, QS, BMS, TBM, RF, KR)
TABLE II Number of character comparisons for an alphabet of size 8 (columns: m, BF, KMP, BM, BMH, QS, BMS, TBM, RF, KR)
TABLE III Number of character comparisons for an English alphabet (columns: m, BF, KMP, BM, BMH, QS, BMS, TBM, RF, KR)
TABLE IV Number of character comparisons for a DNA alphabet (columns: m, BF, KMP, BM, BMH, QS, BMS, TBM, RF, KR)

4.2. Results for the Running Time

Figures 5 to 8 and Tables V to VIII show the practical running time against the pattern length for a binary alphabet, an alphabet of size 8, an English alphabet and a DNA alphabet, respectively. We observe that in all cases the KR algorithm requires much more time than any other algorithm. This agrees with the expectation that computing the hash values is expensive in terms of machine cycles and so increases the running time of the algorithm. Therefore, this algorithm is not recommended for text applications.

FIGURE 5 Binary alphabet (running time against pattern length).
FIGURE 6 Alphabet of size 8 (running time against pattern length).
FIGURE 7 English alphabet (running time against pattern length).
FIGURE 8 DNA alphabet (running time against pattern length).

Further, the empirical results show that in all cases the KMP algorithm is slightly slower than the BF algorithm for almost all pattern lengths, except for the binary alphabet. This behaviour supports the theoretical evidence that KMP is not better than BF on average. It can also be seen that in all cases the BF and KMP algorithms are significantly slower than the BM-like and bit-parallelism algorithms. The running times of the BM-like and bit-parallelism (such as BNDM) algorithms decrease significantly as the pattern length increases. Moreover, the BM-like algorithms produce running times that are very close to each other in all cases, except for the binary alphabet. In addition, for long patterns the difference between the running times of the BM-like algorithms and of the suffix automata algorithm RF increases in all cases except the English alphabet; this difference is in favour of RF.
TABLE V Running times (in seconds) for a binary alphabet (columns: m, BF, KMP, BM, BMH, QS, BMS, TBM, RF, SO, BNDM, KR)
TABLE VI Running times (in seconds) for an alphabet of size 8
TABLE VII Running times (in seconds) for an English alphabet (columns: m, BF, KMP, BM, BMH, QS, BMS, TBM, RF, SO, BNDM, KR)
TABLE VIII Running times (in seconds) for a DNA alphabet
m        BF      KMP     BM      BMH     QS      BMS     TBM     RF      SO      BNDM    KR
10       0,4958  0,6146  0,1704  0,1596  0,157   0,1638  0,317   0,1248  0,3018  0,1334  1,…
20       …,4958  0,6108  0,1546  0,1684  0,161   0,1702  0,2902  0,0752  0,3024  0,0746  1,…
30       …,4916  0,605   0,144   0,1534  0,1472  0,153   0,261   0,0554  0,3024  0,0528  1,…
40       …,494   0,6084  0,1204  0,1498  0,1474  0,1502  0,2256  0,…     -       -       …
50       …,4932  0,61    0,1286  0,1568  0,1556  0,157   0,2412  0,…     -       -       …
100      …,4931  0,62    0,1252  0,156   0,161   0,1581  0,2221  0,…     -       -       …,5578
Average  0,…     …       …       …       …       …       …       …,0647  0,3022  0,…     …,561767
The SO bit-parallelism algorithm outperforms the KR, KMP and BF algorithms for all pattern lengths. SO is faster than the TBM and BNDM algorithms only for short patterns, and this holds in all cases except the binary alphabet. However, the SO algorithm outperforms the BM-like and suffix automata algorithms for short patterns, especially for the binary alphabet. Finally, in the majority of cases the suffix automata algorithm RF has a faster running time than the BM-like and bit-parallelism algorithms for long patterns, whereas the BM-like algorithms have better running times for short patterns, except for the binary alphabet.

5. CONCLUSIONS

We have presented the results of an extensive set of experiments on the most well-known string matching algorithms based on the classical, suffix automata, bit-parallelism and hashing approaches. The conclusions of this paper fall into two main categories: general conclusions regarding the algorithms and their testing procedures, and conclusions relating to the performance of specific algorithms.

As a general conclusion, testing the algorithms on four different types of text (binary alphabet, alphabet of size 8, English alphabet and DNA alphabet) shows that varying parameters such as the pattern length and the alphabet size can produce different performances.

The specific performance conclusions are as follows. The absolute shapes of the lines in the performance graphs are not conclusive; information can only be derived from the relative positions of the curves for each algorithm at each pattern length. This is because the patterns were chosen at random, and the running time obviously depends on how far into the text the pattern occurs.
The running times of all eleven algorithms can be compared at each pattern length, because the same type of text and the same set of patterns were used with each algorithm. From the empirical evidence it can be concluded that the KR algorithm is linear in the number of character comparisons, but it has a higher running time and should not be used for pattern matching in strings. However, the main advantage of this algorithm lies in its extension to higher-dimensional string matching; it may be used for pattern recognition and image processing, and thus in the expanding field of computer graphics.

If you plan on direct searching in simple text, the BF algorithm, which is linear on average, is a proper choice, because it produces relatively good running times despite its striking simplicity. In addition, BF has no special memory requirements and needs no preprocessing or complex coding, and thus can be surprisingly fast. However, it should not be used for the binary alphabet in applications such as image processing or software systems. Despite its theoretical elegance, the KMP algorithm provides no significant speedup over BF in practice unless the pattern has highly repetitive subpatterns. However, KMP guarantees a linear bound and is well suited to extensions for more difficult problems. It may be a good choice when the alphabet size is close to the text size or when dealing with the binary alphabet.

As far as the variations of the BM approach are concerned, we can make the following remarks. The empirical results show that QS is much faster in practice than the other BM-like, suffix automata and bit-parallelism algorithms for large alphabets and short patterns, so it is particularly suited to searching English text. The BM algorithm, in contrast, is faster than its variations (BMH, QS, BMS and TBM) for small alphabets and long patterns. However, in theory BMS and QS are better than the other BM-like and suffix automata algorithms for short patterns and large alphabets. The TBM and BMS algorithms are also good for small alphabets with short or medium patterns. We must also note that the main disadvantage of the BM-like algorithms is the preprocessing time and space they require, which depend on the alphabet size and/or the pattern size. For this reason, if the pattern is very short (1 to 4 characters) it is better to use the BF algorithm. Furthermore, the BM-like algorithms cannot be used if the string matching problem is different from finding the first occurrence of a pattern.
For example, if the problem is to find the first of several possible patterns or to recognize a position in the text defined by a regular expression. This is also because the preprocessing time would be significant. It should be noted that for long patterns the running time of the suffix automata algorithm (RF) increases because of the preprocessing phase, the time for which is equal to the time for the searching phase. Thus, the RF algorithm is efficient in theory and practice for small alphabets and long patterns. Therefore, this algorithm is a good choice to be used for DNA applications. In practice, the bit-parallelism algorithms (SO and BNDM) are always fastest for small alphabets and short patterns. Also, the SO algorithm produces linear running time similar to the BF and KMP algorithms. In particular, the BNDM algorithm is the fastest and outperforms BM-like
algorithms for moderate patterns.

The main advantage of the bit-parallel algorithms, however, is that they are simple to implement and support classes of characters (e.g. [a-z]), don't care symbols (a don't care symbol matches any symbol), the complement of a character or a class, and other extensions developed by [31] such as wild cards (a wild card is a symbol that matches all characters), sets of patterns and long patterns, all with exactly the same searching time (only the preprocessing differs). On the other hand, these algorithms have the disadvantage that the pattern length is limited to 32 or 64 characters (32 or 64 bits being the word size of many of today's machines). Handling longer patterns is fairly easy (it requires multiprecision bit operations), but it can slow the algorithms down significantly. For many applications, however, a maximum pattern length of 32 or 64 characters is not much of a problem.

Finally, we note that the theoretical time complexities of the algorithms [24] hold only in the average case. For instance, the experiments have shown that, on average, algorithms such as BF, BMH, QS, BMS and BNDM behave well. By contrast, the experiments have shown that only the BM, RF and SO algorithms are fast both theoretically and practically in both the worst and the average case.

References

[1] Apostolico, A. and Giancarlo, R. (1986). The Boyer-Moore-Galil string searching strategies revisited, SIAM Journal on Computing, 15(1).
[2] Aho, A. V. Algorithms for finding patterns in strings, Chapter 5 of van Leeuwen, J. (Ed.) Handbook of Theoretical Computer Science, Elsevier Science Publishers, Amsterdam.
[3] Boyer, R. S. and Moore, J. S. (1977). A fast string searching algorithm, Communications of the ACM, 20(10).
[4] Baeza-Yates, R. (1989). Algorithms for string searching: A survey, ACM SIGIR Forum, 23(3-4).
[5] Baeza-Yates, R. (1992). Text Retrieval: Theory and Practice, In: Proc. of the 12th IFIP World Computer Congress (Madrid, Spain), North-Holland.
[6] Baeza-Yates, R. and Gonnet, G. H. (1992). A new approach to text searching, Communications of the ACM, 35(10).
[7] Crochemore, M., Czumaj, A., Gasieniec, L., Jarominek, S., Lecroq, T., Plandowski, W. and Rytter, W. (1994). Speeding Up Two String Matching Algorithms, Algorithmica, 12(4-5).
[8] Colussi, L. (1991). Correctness and efficiency of the pattern matching algorithms, Information and Computation, 95(2).
[9] Colussi, L. (1994). Fastest pattern matching in strings, Journal of Algorithms, 16(2).
[10] Crochemore, M. and Rytter, W. (1994). Text Algorithms, Oxford University Press.
[11] Davies, G. and Bowsher, S. (1986). Algorithms for pattern matching, Software-Practice and Experience, 16(6).
[12] Galil, Z. (1979). On improving the worst case running time of the Boyer-Moore string searching algorithm, Communications of the ACM, 22(9).
[13] Gonnet, G. H. and Baeza-Yates, R. (1991). Handbook of Algorithms and Data Structures in Pascal and C, 2nd edition, Addison-Wesley, Wokingham.
[14] Hancart, C. (1993). On Simon's string searching algorithm, Information Processing Letters, 47(2).
[15] Harrison, M. C. (1971). Implementation of the substring test by hashing, Communications of the ACM, 14(12).
[16] Horspool, R. N. (1980). Practical fast searching in strings, Software-Practice and Experience, 10(6).
[17] Hume, A. and Sunday, D. (1991). Fast string searching, Software-Practice and Experience, 21(11).
[18] Knuth, D. E., Morris, J. H. and Pratt, V. R. (1977). Fast pattern matching in strings, SIAM Journal on Computing, 6(2).
[19] Kernighan, B. W. and Ritchie, D. M. (1988). The C Programming Language, 2nd edition, Prentice Hall, Englewood Cliffs, NJ.
[20] Liu, Z., Du, X. and Ishii, N. (1998). An improved adaptive string searching algorithm, Software-Practice and Experience, 28(2).
[21] Lecroq, T. (1992). A variation on the Boyer-Moore algorithm, Theoretical Computer Science, 92(1).
[22] Lecroq, T. (1995). Experimental results on string matching algorithms, Software-Practice and Experience, 25(7).
[23] Manolopoulos, Y. and Faloutsos, C. (1996). Experimenting with pattern matching algorithms, Information Sciences, 90(1-4).
[24] Michailidis, P. and Margaritis, K. (1999). String Matching Algorithms, Technical Report, Department of Applied Informatics, University of Macedonia (in Greek).
[25] Navarro, G. and Raffinot, M. (1998). A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching, In: Proc. of the 9th Annual Symposium on Combinatorial Pattern Matching, No. 1448, Springer-Verlag, Berlin.
[26] Raita, T. (1992). Tuning the Boyer-Moore-Horspool string searching algorithm, Software-Practice and Experience, 22(10).
[27] Smit, G. and De, V. (1982). A Comparison of Three String Matching Algorithms, Software-Practice and Experience, 12(1).
[28] Smith, P. (1991). Experiments with a very fast substring search algorithm, Software-Practice and Experience, 21(10).
[29] Stephen, A. G. (1994). String Searching Algorithms, World Scientific Press.
[30] Sunday, D. (1990). A very fast substring search algorithm, Communications of the ACM, 33(8).
[31] Wu, S. and Manber, U. (1992). Fast text searching allowing errors, Communications of the ACM, 35(10).
More information