Adjacency Matrix Based Full-Text Indexing Models

Size: px
Start display at page:

Download "Adjacency Matrix Based Full-Text Indexing Models"

Transcription

1 /2002/13(10) Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering, Fuan University, Shanghai , China); 2 (Department of Computer an Information Technology, Fuan University, Shanghai , China); 3 (School of Computer Science, Wuhan University, Wuhan , China) sgzhou@fuan.eu.cn Receive January 18, 2002; accepte July 1, 2002 Abstract: With the rapi growth of online text information an user accesses, query-processing efficiency has become the major bottleneck of information retrieval (IR) systems. This paper proposes two new full-text inexing moels to improve query-processing efficiency of IR systems. By using irecte graph to represent text string, the ajacency matrix of text string is introuce. Two approaches are propose to implement the ajacency matrix of text string, which leas to two new full-text inexing moels, i.e., ajacency matrix base inverte file an ajacency matrix base PAT array. Query algorithms for the new moels are evelope an performance comparisons between the new moels an the traitional moels are carrie out. Experiments over real-worl text collections are conucte to valiate the effectiveness an efficiency of the new moels. The new moels can improve query-processing efficiency consierably at the cost of much less amount of extra storage overhea compare to the size of original text atabase, so are suitable for applications of large-scale text atabases. Key wors: information retrieval; full-text inexing; inverte file; PAT array; ajacency matrix; moel The rapi evelopment an wie application of Worl Wie Web (WWW) leas to explosively increasing of online text information. More an more people surf the Web for information. Such a situation poses new challenges to researchers an scientists in the information retrieval (IR) fiel. That is, how to effectively an efficiently organize an manage massive text information on Web an how to search what the users nee on Web accurately an completely [1]. Currently, the popular solution is Web search engine, which is essentially an IR system where text-inexing moel constitutes its core technique [2]. As the amount of text information exponentially increases an the number of user queries rapily grows, query-processing efficiency becomes the major bottleneck of Web search engines. Take AltaVista [3], one of the largest Web search engine systems uner running, for example. In 1998, the overall AltaVista system was running on 20 multi-processor machines, all of them have more than 130Gb of RAM an over 500Gb of isk space. Only query processing consumes more than 70% of these resources [1]. To a great extent, efficiency of information retrieval systems epens on the unerlying text inexing moels. That is the reason why efficient text-inexing techniques have been the hot research topic in IR fiel [4]. Up to ate, Supporte by the National Natural Science Founation of China uner Grant No ( ); the Natural Science Founation of Hubei Province of China uner Grant No.2001ABB050 ( ) ZHOU Shui-geng was born in He is an associate professor at the Department of Computer Science an Engineering, Fuan University. His research interests are in atabase, ata warehousing, ata mining an information retrieval. HU Yun-fa was born in He is a professor an octoral supervisor of the Department of Computer an Information Technology, Fuan University. His current research areas are atabase, knowlege base an igital library system. GUAN Ji-hong was born in She is an associate professor at the School of Computer Science, Wuhan University. Her current research areas are spatial atabase, spatial ata mining an geographic information systems.

2 1934 Journal of Software 2002,13(10) several text-inexing moels have been evelope, in which wiely accepte are inverte files [5,6], signature files [1,7], an PAT (Patricia tree) array [1,8,9]. Owing to their avantage of fast response an easy implementation, inverte file an its variations have been use in most text atabase systems an search engines [2]. Generally, text-inexing moels can be classifie into two categories, i.e., character-base inexing an wor (or keywor)-base inexing. The first step towar wor-base inexing is to extract meaningful wors from text string, which is not an easy task that may require eep unerstaning of the context, especially for Chinese text. Conversely, character-base inexing methos inex all characters appearing in text atabase; so character-base inexing is also referre to as full-text inexing. Full-text inexing IR systems main avantage is that they avoi the complicate an expensive process of semantic inexing. From the en-user point of view, full text searching of on-line ocuments is appealing because a vali query is just any wor or sentence of the ocument. This paper intens to evelop new full-text inexing moels of higher efficiency. By seeing a text atabase as a irecte graph an combining the concept of ajacency matrix of irecte graph with the structures of traitional inverte file an PAT array, we propose two new full-text inexing moels, which we call ajacency matrix base inverte file an ajacency matrix base PAT array respectively. The new moels can improve consierably query-processing efficiency of IR systems at the cost of much less amount of extra space overhea compare to the size of original text atabase, thus are suitable for applications of large-scale text atabases an can be use as an effective way to upate the IR systems uner running. 1 Preliminaries Let Σ be an alphabet: a finite, orere set of characters. For an arbitrary character l in Σ, there is an associate natural integer i, which inicates the position of l in Σ, an l is also enote by l i. In the case of Chinese text atabases, Σ correspons to the Stanar Chinese Characters Base, such as GB , where the region-position coe of each character assigne in GB may be use as the character s position ientifier in Σ (The alternative may be the Unicoe system). For English text collections, Σ is a set of letters, igits, punctuation marks an other symbols that may occur in English text, an the ASCII coe of each character in Σ is taken naturally as its position ientifier in Σ. A text string or simply string over Σ is a finite sequence of characters from Σ. Specifically, a string has no characters at all is calle the empty string an is enote by ε. To avoi confusion, we generally use u, v, w, x, y, z to enote strings. The length of a string is its length as a sequence. We enote the length of a string w by w. Alternatively a string w can be consiere as a function w:{1,, w } Σ; the value of w(j), where 1 j w, is the symbol in the jth position of w. To istinguish ientical symbols at ifferent positions in a string, we refer to them as ifferent occurrences of the symbol. That is, the symbol l Σ occurs in the jth position of the string w if w(j)=l. Two strings over the same alphabet can be combine to form a thir by the operation of concatenation. The concatenation of strings x an y, written xy, is the string x followe by the string y; formally, w=xy if an only if w = x + y, w(j)=x(j) for j=1,, x, an w( x +j)=y(j) for j=1,, y. A string v is a substring of a string w if an only if there are strings x an y such that w=xvy. Both x an y coul be ε, so every string is a substring of itself. If w=xv for some x, then v is a suffix of w; if w=vy for some y, then v is a prefix of w. If x is a substring of string w, then we enote P(w, x) the set of positions of x occurring in w, or the occurrences of x in w. For a string w of limite length, let us pa artificially at its right en with an infinite number of null (or any special characters that is not inclue in Σ). Then, a semi-infinite string (abbreviate to sistring) of string w is the sequence of characters starting at a certain position within w an continuing to the right. Obviously, two sistrings starting at ifferent positions are always ifferent. For the simplification of escription, meanwhile to guarantee that no one sistring be a prefix of another, it is enough to en the string w with a unique en-of-string symbol that oes not appear in Σ. Thus, sistrings can be unambiguously ientifie by their starting position. That is to say, P(w, x) is a

3 : 1935 singleton if x is a sistring of string w. The result of a lexicographic comparison between two sistrings is base on the text of the sistrings, instea of their positions. Definition 1. A text atabase TB is a collection of text ocuments, each of which is a string over Σ. Neglecting the bounary between any two ajacent ocuments, a text atabase can be seen as a long string, whose length is the sum of the lengths of all ocuments in the text atabase. We enote TB the length of text atabase TB. From the point of sistring s view, text atabase TB correspons to a sequence of sistrings. In what follows, we treat a text atabase as a string when iscussing inverte file, an see it in the point of sistring s view while ealing with PAT array. Definition 2. Suppose V is the set of all unique characters appearing in text atabase TB, V={t i i=1 V } Σ. For character t i in V, it occurs at ifferent positions in the string of text atabase TB. Let p i be the set of positions that t i occurs in TB, enote p i ={p i1, p i2,, p i pi } where p i is the occurrence number of t i in TB, p ij is the jth position where t i occurs in TB (counting from the starting point of TB string). The full-text inexing inverte file can be written formally as follows. { t i, p i (i=1 V )}. (1) Practically, inexe terms an the corresponing occurrences are store separately, i.e., splitting (1) into two parts: { t i, pt i (i=1 V )}, (2) {p i (i=1 V )}. (3) Above, pt i is a pointer to p i. In (2), all inexe terms are sorte lexicographically an store with corresponing pointers sequentially in the inexe file, while occurrences in (3) are store sequentially in the posting file. Definition 3. Suppose V is the set of all unique characters appearing in text atabase TB, V={t i i=1 V } Σ. Let TB be enote by a string w=c 1 c 2 c n $(n= TB ) where c i is the ith character in w an c i V, $ is the assume unique en-of-string symbol that oes not appear in. String w correspons to a sequence of sistrings: Sis= sis 1, sis 2,, sis n (4) where sis i = c i c i+1 c n $ inicates the ith sistring of w. Sorting the sistrings in (4) lexicographically, we get a new sequence of sistrings: PSis= psis 1, psis 2,, psis n. (5) Obviously, for each sistring psis i in PSis, there is a sistring sis j in Sis that is equal to psis i while i an j are not necessarily the same. (5) is the full-text inexing PAT array of text atabase TB. In practice, the positions of sistrings, instea of the sistrings themselves, are use in (5). Definition 4. For a text atabase TB with a set V of all unique characters appearing in TB, there exists a irecte graph TBG= V g, E g where V g is the set of vertices, each of which correspons to a unique character appearing in TB, i.e., V g =V; E g is the set of irecte eges, each of which correspons to a pair of ajacent characters appearing in TB an its irection points from the first character to the secon one. We call TBG the irecte graph of text atabase TB. Definition 5. As efine in graph theory, a weighte irecte graph is a irecte graph in which each ege has an associate value. In the context of this paper, the value associate with each irecte ege is the position of the character corresponing to the irecte ege s source vertex. Consiering that some ajacent-character pairs occur at ifferent positions in the text atabase, that is, in the irecte graph of text atabase there exists the case of multiple irecte eges having similar starting an en vertices. We compact all irecte eges with similar starting an en vertices to one irecte ege an unite all values associate with these eges to a set of values. We call the result graph weighte irecte graph of text atabase. Formally, text atabase TB correspons to a weighte irecte graph WTBG= V w, E w, L w where V w =V is the set of vertices, E w is the set of irecte eges, an L w is the set of values associate with all irecte eges in E w. Denote L w (l i, l j ) the set of values associate with irecte ege l i l j, then we have

4 1936 Journal of Software 2002,13(10) Example 1. L w (l i, l j ) = P(TB, l i l j ), (6) L w ={L w (l i, l j ) l i l j : l i l j E w }. (7) Given a Chinese text string w:. There are totally eleven unique characters appearing in w, in which nine are Chinese characters, the rest two are punctuation marks. Theses unique characters constitute the vertices set V w ={,,,,,,,,,, }. Fourteen unique ajacent-character pairs constitute the irecte eges set E w ={,,,,,,,,,,,,, }. Values associate with these irecte eges are as follows: L w (, )={1, 7}, L w (, )={13, 19}, L w (, )={25, 31}, L w (, )={6}, L w (, )={12, 18}, L w (, )={24, 30}, L w (, )={2, 8, 14, 20, 26, 32}, L w (, )={3, 15, 27}, L w (, )={9, 21, 33}, L w (, )={5, 17, 29}, L w (, )={11, 23}, L w (, )={35}, L w (, )={4, 16, 28}, L w (, )={10, 22, 34}. Figure 1 illustrates the weighte irecte graph of string w. {6} {24,30} {1, 7} {13, 19} {35} {10,22,34} {11,23} {25, 31} {2,8,14,20,26,32} {5,17,29} {9,21,33} {3,15,27} {4,16,28} {12,18} Fig.1 Weighte irecte graph of string w 2 Ajacency Matrix Base Inverte File an PAT Array Definition 6. For a text atabase, there is a corresponing weighte irecte graph that associates with an ajacency matrix. Thus, a text atabase has an ajacency matrix where each matrix element correspons to a set of values associate with the corresponing irecte ege. Formally, let V Σ be the set of all unique characters appearing in text atabase TB, an enote D the ajacency matrix of TB, we have D=[ ij ], (8.1) ij = (l i,l j )=L w (l i, l j ) = P(TB, l i l j ). (8.2) We term matrix D the ajacency matrix base full text inexing moel of text atabase TB. Example 2. Given the string w in example 1, accoring to efinition 6 an the weighte irecte graph illustrate in Fig.1, it is easy to buil the ajacency matrix inexing moel of w. Let V w ={,,,,,,,,,, }. The text inexing moel D is a matrix of which the elements are as follows: 14 = L w (, )={1, 7}, 24 = L w (, )={13, 19}, 34 = L w (, )={25, 31}, 45 = L w (, )={2, 8, 14, 20, 26, 32}, 56 = L w (, )={9, 21, 33}, 58 = L w (, )={3, 15, 27}, 67 = L w (, )={10, 22, 34}, 7 10 = L w (, )={11, 23}, 7 11 = L w (, )={35}, 89 = L w (, )={4, 16, 28}, 9 10 = L w (, )={5, 17, 29}, 10 1 = L w (, )={6}, 10 2 = L w (, )={12, 18}, 10 3 = L w (, )={24, 30}. The other matrix elements are empty set Φ. Figure 2 illustrates the ajacency

5 : 1937 matrix of string w. D = 101 Fig Ajacency matrix of string w Practically, there are two ways to implement the ajacency matrix base full-text inexing moel: 1) Seeing matrix element ij in D as a set of positions of ajacent-character pair l i l j occurring in TB, an sorting all elements of ij in positional orer, then ij is equivalent to the inverte list of inexe term l i l j. 2) Treating matrix element ij as a set of positions of the sistrings whose prefix is the ajacent-character pair l i l j, an sorting all elements of ij in lexical orer of the corresponing sistrings, then ij is similar to the PAT array of sistrings with a prefix of l i l j. To istinguish this two ifferent implementations, we use D 1 an D 2 to enote the implemente ajacency matrices of approach 1) an approach 2) respectively, an term D 1 the ajacency matrix base full-text inexing inverte file, D 2 the ajacency matrix base full-text inexing PAT array. Formally, D 1 is a kin of reorganization of the traitional inverte file, i.e., transforming the character-base inverte file to a kin of ajacent-character pair base inverte file an organizing all inverte lists in the form of ajacency matrix. Conversely, the traitional inverte file is a kin of compression of D 1, that is, to compact the ajacent-character pair base inverte file to the character-base inverte file. Analogously, D is a kin of ecomposition of the traitional PAT array, i.e., splitting a long PAT array into a collection of short PAT arrays such that each short PAT array correspons a set of sistrings having a similar prefix in the form of ajacent-character pair; on the contrary, the traitional PAT array is a kin of aggregation of D 2. Base on the efinitions above, it is straightforwar to give the algorithms for transformations between D 1 an the traitional inverte file, D 2 an the traitional PAT array as follows. Algorithm 1. Buil D 1 from traitional inverte file. Input: inexing lists of any two inexing terms in text atabase: {l i : p i1,, p in }, {l j :p j1,, p jm }. Output: D 1 =[ ij ]. Process: i = L ( l, l ij w j ). = { k k ( pik p jk : pik = k an p jk = k + 1 while1 k1 n an 1 k Algorithm 2. Buil traitional inverte file from D 1. Input: D 1 =[ ij ]. Output: {l i : p i }. Process: V j = 1 p i = ij. Here, inicates the set union operator. Algorithm 3. Buil D 2 from traitional PAT array. Input: PAT array PSis={psis 1, psis 2,, psis n }. Output: D 2 i j =[ ij ], = L ( l, l ). ij w 1 2 m)}.

6 1938 Journal of Software 2002,13(10) Process: 1) Fin the first sistring psis k matching with l i l j* in PSis by binary search. 2) Starting from psis k, search continuously in PSis the other sistrings matching with l i l j * in the right an left irections till no such sistring can be foun. 3) Assume psis k1 an psis k2 are the two sistrings that match with l i l j* an locate at the furthest right an left positions in PSis respectively, then i = L ( l, l ij w j ) = { 2 psisk1, psisk1 + 1,..., psisk,..., psisk 2 1, psisk }(1 k1 k k2 n). Algorithm 4. Buil traitional PAT array from D 2. Input: D 2 =[ ij ]. Output: PAT array PSis={psis 1, psis 2,, psis n }. Process: PAT = V V i = 1 j = 1 ij. Similarly, is the set union operator. 3 Query Processing 3.1 Query processing base on D 1 Note that in this paper a query is an arbitrary character string an the result is a set of ocuments containing the query string. Query processing base on D 1 is essentially a kin of set operation. Following is a theorem about D 1 base query processing. Theorem 1. Let l 1 l 2... l n be a query string, q(l 1 l 2... l n ) the query result, then 0 2 2k 4 2k 2 1) When n=2k, q( l1 l2... ln) = k 3 2k 2 2k 1 2k ; 0 2 2k 4 2k 2 2k 1 2) When n=2k+1, q( l1 l2... ln) = k 3 2k 2 2k 1 2k 2k 2k + 1. Here, m ij = ( l, l ) m ={(x m) x : x (l i, l j )}. i j Proof. Accoring to Definition 6, it is easy to prove Theorem 1. We omit the etails here. In orer to improve query-processing efficiency, we must reuce the number of isk accesses or set intersections. Some techniques are available as follows. 1) Reucing the number of set intersections. That is equivalent to reucing the number of matrix elements involve in set intersection. We can use the optimizing technique propose in Ref.[4] to cut own matrix elements involve in query processing, which utilizes a irecte graph metho to optimize the search path at the cost of a little more space overhea. 2) Reucing the number of elements in the sets involve in intersection operation. By using of the properties of query string, it is possible to cut own the number of elements in the sets involve in intersection operation. Here consier a special case: substrings occurring repeately in the query string. Suppose l i l i+1 (1 i n 1) occurs twice in the query string l 1 l 2... l n at an interval of k, then the vali elements for intersection operation in (l i, l i+1 ) is (l i, l i+1 ) (l i, l i+1 ) + k. Actually, there exist many of such cases, which we cannot enumerate completely here. 3) Improving the efficiency of set intersection. If the text atabase is very large, the sizes of sets involve in query processing will be large too. Note that intersecting large sets is also time consuming. Given two sets A

7 : 1939 an B that inclue m an n elements respectively (Let m n), a more efficient way to carry out intersection between A an B is 1) sorting the elements of set A an set B in avance; 2) comparing the elements in the larger set with the elements in the smaller one. In such a way, m an n are the lower-boun an upper-boun of element comparisons neee for AB respectively. 3.2 Query processing base on D 2 Algorithm 5. Query processing base on D 2. Input: Full-text inexing moel D 2 an query string l 1 l 2... l n. Output: q(l 1 l 2... l n ). Process: 1) Base on D 2 an l 1 l 2, obtain PAT array (l 1, l 2 )={psis 1, psis 2,, psis m }. 2) Search the first sistring psis k matching with l 1 l 2...l n * in (l 1, l 2 ) by binary search. 3) Starting from psis k, search continuously in (l 1, l 2 ) the other sistrings matching with l 1 l 2...l n * in the right an left irections till no such sistring can be foun. 4) Assume psis k1 an psis k2 are sistrings that match with l 1 l 2...l n * an locate at the furthest right an left positions in (l 1, l 2 ) respectively, then q( l1l2... ln) = { psisk 1, psisk 1+ 1,..., psisk,..., psisk 2 1, psisk 2}( 1 k1 k k2 m). 4 Comparisons with Traitional Inexing Moels Suppose the size of text atabase TB is N (N= TB ), the number of unique characters in the text atabase is m (m= V ), an the length of query is l q. Furthermore, for the simplicity of analysis, we assume all characters occurring in the text atabase at equal probability. Thus the average size of elements in ajacency matrix D 2 is E N/(m*m). 4.1 Traitional inverte file vs. D 1 1) Time cost for inexing builing: O(N). 2) Space overhea a) D 1 : O(N+m*m). b) Inverte file: O (N+m). 3) Search time cost a) D 1 : At most (l q +1)/2 isk accesses an (l q 1)/2 set intersections. b) Inverte file: l q isk accesses an (l q 1) set intersections. Furthermore, consiering that the average size of matrix elements in D 1 is about 1/m of the size of inverte file, which will result in higher query-processing efficiency. Note that 1) isk access is the major factor that influences the value of query efficiency; 2) for large-scale text atabase, we have N>>m because m is limite an N can grow as the text atabase expans. That is to say, the extra space overhea of D 1 is much less than the size of original text atabase. In the case of English text collections, m is not greater than 256. While in Chinese text environment, m is generally between 6000 an Thus, ajacency matrix base inverte file moel can achieve more 50% benefit of query efficiency at the cost of much less extra space overhea compare with the size of original text atabase. 4.2 Traitional PAT array vs. D 2 1) Time cost for inexing builing: O(N log(n)). 2) Space overhea a) D 2 : O(N+m*m).

8 1940 Journal of Software 2002,13(10) b) PAT array: O(N). 3) Search time cost a) D 2 : At most 4log 2 (E) isk accesses an 2log 2 (E) 1 comparisons. b) PAT array: At most 4log 2 (N) isk accesses an 2log 2 (N) 1 comparisons. Let us consier the worst case. For large-scale text atabases, we have N >> m, which means that the extra storage overhea of D 2 is negligible compare with the size of original text atabase. At the same time, D 2 benefits 8log 2 (m) isk accesses an 4log 2 (m) comparisons less than traitional PAT array. Specifically, suppose the size of text atabase N=640Mb, at most 117 isk accesses are requeste for traitional PAT array. In the case of English text collections, let m=256, then 64 isk accesses are cut own by D 2, i.e., the improvement ratio of query efficiency is 54%. For the Chinese text environment, let m=6763 (Taking GB for example), then 101 isk accesses are cut own by D 2, that is, a query efficiency improvement ratio of 86%. Certainly, the results are approximate estimation, which may eviate somewhat from the realistic values. However, because each matrix element in D 2 is only part of the entire PAT array of the text atabase, D 2 can still outperform the traitional PAT array. Finally, we have a comparison between D 1 an D 2. Note that there is no much ifference in space overhea between D 1 an D 2. However, it is noteworthy to analyze the ifference in their search time cost. Still, we consier isk access the major bottleneck of query efficiency. Obviously, query efficiency of D 1 is relate to the length of query string, while query efficiency of D 2 epens on the size of matrix element involve in query processing. When isk accesses of the two moels are equal, we have That is, From (10), we obtain ( + 1) / 2 4log2( E ). (9) l q ( + 1) / 2 4log 2 ( N /( m * m)). (10) l q l q 8log 2 ( N /( m * m)) 1. (11) Equation (11) sets an approximate criterion about when to hanle query using D 1 an D 2. We carry out further estimation in the following two specific cases: 1) In the case of English text atabase, let N=640M an m=256, we have l q 105. That is to say, when the length of query is shorter than 105 characters, it is more efficient to hanle queries using D 1 than using D 2. On the contrary, if the length of query is longer than 105 characters, D 2 is favorable. 2) Uner the Chinese text environment, let N=640M an m=6763, we can obtain l q 30, i.e., when the length of query is shorter than 30 characters, it is more efficient to hanle query using D 1 than using D 2. Otherwise, it is favorable to use D 2 to hanle query. For a certain text atabase, if the query is short, it is more efficient to process query by using D 1 than using D 2. Otherwise, it is favorable to use D 2. In realty, it is likely to accommoate the two query moes simultaneously in the same text atabase. In implementation, we can establish the inexing matrix accoring to D 2, then choose query moe D 1 or D 2 in terms of query length. 5 Experimental Results Experiments are carrie out over five real worl Chinese text collections (liste in Table 1) to valiate the feasibility an efficiency of the new moels. These text collections contain mainly Chinese classics, contemporary Chinese novels an ocuments from several Chinese BBS sites. The experiments are conucte on a PC with a PIII

9 : MHz CPU an 512Mb RAM. The goal of experiments is to measure the improvement ratio of query processing efficiency of the new inexing moels. We efine the query processing efficiency improvement ratio as follows. r ( t t ) / t 100%. (12) = ol new ol Here, t ol represents the time consume for hanling one or more queries with the traitional inexing moels, i.e., inverte file an PAT array, while t new inicates the time use for hanling the same queries with the new inexing moels, i.e., ajacency matrix base inverte file an ajacency matrix base PAT array. To make the experimental results more reasonable an reliable, we chose manually 1000 ifferent queries for testing. The query processing efficiency is evaluate on the average time cost in hanling the 1000 ifferent queries. Each query is a Chinese text string with from 2 to 25 Chinese characters, an its querie result is requeste not to be empty. Table 2 lists the query number istribution over query length of the 1000 ifferent queries. The number of short queries is larger than that of the long queries, which conforms to the realistic situation of user query elivery. Table 1 Test text collections Text atabase TC-1 TC-2 TC-3 TC-4 TC-5 Size (Mb) Table 2 Query number istribution over query length of the 1000 processe queries l q (characters) N The experimental results are presente in Fig.3, which show that the new moels can improve consierably the query processing efficiencies of the traitional inverte file an PAT array. Fig.3 Experimental results of query processing efficiency improvement ratio 6 Conclusions In this paper, we propose, investigate an implemente two new full-text inexing moels that can improve the query efficiency of IR systems consierably at the cost of much less amount of extra storage overhea compare to the size of original text atabase. Experiments over real worl Chinese text collections were carrie, which valiate the effectiveness an efficiency of the new moels. Further research will focus on exploring efficient builing an upating algorithms as well as query optimization approaches for the new inexing moels over large-scale text atabases. References: [1] Baesa-Yates, R., Ribeiro-Neto, B. Moern Information Retrieval. Reaing, MA: Aison Wesley, [2] Sullivan, D. Search Engine Watch. [3] AltaVista, [4] Zhou, Shui-geng. Key techniques of Chinese text atabases [Ph.D. Thesis]. Shanghai: Fuan University, 2000 (in Chinese).

10 1942 Journal of Software 2002,13(10) [5] Tomasic, A., Garcia-Molina, H., Shoens, K. Incremental upates of inverte lists for text ocument retrieval. In: Snograss, R.T., Winslett, M., es. Proceeings of the SIGMOD 94. New York: ACM Press, [6] Ribeiro-Neto, B.A., Silva e Moura, E., Neubert, M.S., Ziviani, N. Efficient istribute algorithms to buil inverte files. In: Hearst, M., Tong, R., es. Proceeings of the SIGIR 99. New York: ACM Press, [7] Faloutsos, C. Signature-Base text retrieval methos: a survey. Data Engineering Bulletin, 1990,13(1): [8] Manber, U., Myers, E. Suffix arrays: a new metho for on-line string searches. SIAM Journal of Computing, 1993,22(5): [9] Chavez, E, Navarro, G., et al. Searching in metric spaces. ACM Computing Surveys, 2001,33(3): [4]. [ ]. :, , 2 3, 1 (,200433); 2 (,200433); 2 (, ) :,.,., ;,, PAT. ;,.,. : ; ; ;PAT ; : TP311 : A

Online Appendix to: Generalizing Database Forensics

Online Appendix to: Generalizing Database Forensics Online Appenix to: Generalizing Database Forensics KYRIACOS E. PAVLOU an RICHARD T. SNODGRASS, University of Arizona This appenix presents a step-by-step iscussion of the forensic analysis protocol that

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

APPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly

APPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly International Journal "Information Technologies an Knowlege" Vol. / 2007 309 [Project MINERVAEUROPE] Project MINERVAEUROPE: Ministerial Network for Valorising Activities in igitalisation -

More information

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters Available online at www.scienceirect.com Proceia Engineering 4 (011 ) 34 38 011 International Conference on Avances in Engineering Cluster Center Initialization Metho for K-means Algorithm Over Data Sets

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control Almost Disjunct Coes in Large Scale Multihop Wireless Network Meia Access Control D. Charles Engelhart Anan Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

Finite Automata Implementations Considering CPU Cache J. Holub

Finite Automata Implementations Considering CPU Cache J. Holub Finite Automata Implementations Consiering CPU Cache J. Holub The finite automata are mathematical moels for finite state systems. More general finite automaton is the noneterministic finite automaton

More information

Skyline Community Search in Multi-valued Networks

Skyline Community Search in Multi-valued Networks Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h

More information

Chapter 9 Memory Management

Chapter 9 Memory Management Contents 1. Introuction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threas 6. CPU Scheuling 7. Process Synchronization 8. Dealocks 9. Memory Management 10.Virtual Memory

More information

Lecture 1 September 4, 2013

Lecture 1 September 4, 2013 CS 84r: Incentives an Information in Networks Fall 013 Prof. Yaron Singer Lecture 1 September 4, 013 Scribe: Bo Waggoner 1 Overview In this course we will try to evelop a mathematical unerstaning for the

More information

Coupling the User Interfaces of a Multiuser Program

Coupling the User Interfaces of a Multiuser Program Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have evelope a new moel for coupling the user-interfaces

More information

6.823 Computer System Architecture. Problem Set #3 Spring 2002

6.823 Computer System Architecture. Problem Set #3 Spring 2002 6.823 Computer System Architecture Problem Set #3 Spring 2002 Stuents are strongly encourage to collaborate in groups of up to three people. A group shoul han in only one copy of the solution to the problem

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my Calculus I course that I teach here at Lamar University. Despite the fact that these are my class notes, they shoul be accessible to anyone wanting to learn Calculus

More information

Using Vector and Raster-Based Techniques in Categorical Map Generalization

Using Vector and Raster-Based Techniques in Categorical Map Generalization Thir ICA Workshop on Progress in Automate Map Generalization, Ottawa, 12-14 August 1999 1 Using Vector an Raster-Base Techniques in Categorical Map Generalization Beat Peter an Robert Weibel Department

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex boies is har Navin Goyal Microsoft Research Inia navingo@microsoftcom Luis Raemacher Georgia Tech lraemac@ccgatecheu Abstract We show that learning a convex boy in R, given ranom samples

More information

A Convex Clustering-based Regularizer for Image Segmentation

A Convex Clustering-based Regularizer for Image Segmentation Vision, Moeling, an Visualization (2015) D. Bommes, T. Ritschel an T. Schultz (Es.) A Convex Clustering-base Regularizer for Image Segmentation Benjamin Hell (TU Braunschweig), Marcus Magnor (TU Braunschweig)

More information

Overlap Interval Partition Join

Overlap Interval Partition Join Overlap Interval Partition Join Anton Dignös Department of Computer Science University of Zürich, Switzerlan aignoes@ifi.uzh.ch Michael H. Böhlen Department of Computer Science University of Zürich, Switzerlan

More information

A Plane Tracker for AEC-automation Applications

A Plane Tracker for AEC-automation Applications A Plane Tracker for AEC-automation Applications Chen Feng *, an Vineet R. Kamat Department of Civil an Environmental Engineering, University of Michigan, Ann Arbor, USA * Corresponing author (cforrest@umich.eu)

More information

Study of Network Optimization Method Based on ACL

Study of Network Optimization Method Based on ACL Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite

More information

Image compression predicated on recurrent iterated function systems

Image compression predicated on recurrent iterated function systems 2n International Conference on Mathematics & Statistics 16-19 June, 2008, Athens, Greece Image compression preicate on recurrent iterate function systems Chol-Hui Yun *, Metzler W. a an Barski M. a * Faculty

More information

1 Surprises in high dimensions

1 Surprises in high dimensions 1 Surprises in high imensions Our intuition about space is base on two an three imensions an can often be misleaing in high imensions. It is instructive to analyze the shape an properties of some basic

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information

Image Segmentation using K-means clustering and Thresholding

Image Segmentation using K-means clustering and Thresholding Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,

More information

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES OLIVIER BERNARDI AND ÉRIC FUSY Abstract. We present bijections for planar maps with bounaries. In particular, we obtain bijections for triangulations an quarangulations

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway

State Indexed Policy Search by Dynamic Programming. Abstract. 1. Introduction. 2. System parameterization. Charles DuHadway State Inexe Policy Search by Dynamic Programming Charles DuHaway Yi Gu 5435537 503372 December 4, 2007 Abstract We consier the reinforcement learning problem of simultaneous trajectory-following an obstacle

More information

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises verview Frequent Pattern Mining comprises Frequent Pattern Mining hristian Borgelt School of omputer Science University of Konstanz Universitätsstraße, Konstanz, Germany christian.borgelt@uni-konstanz.e

More information

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure

More information

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER FFICINT ON-LIN TSTING MTHOD FOR A FLOATING-POINT ADDR A. Droz, M. Lobachev Department of Computer Systems, Oessa State Polytechnic University, Oessa, Ukraine Droz@ukr.net, Lobachev@ukr.net Abstract In

More information

Verifying performance-based design objectives using assemblybased vulnerability

Verifying performance-based design objectives using assemblybased vulnerability Verying performance-base esign objectives using assemblybase vulnerability K.A. Porter Calornia Institute of Technology, Pasaena, Calornia, USA A.S. Kiremijian Stanfor University, Stanfor, Calornia, USA

More information

Indexing Variable Length Substrings for Exact and Approximate Matching

Indexing Variable Length Substrings for Exact and Approximate Matching Indexing Variable Length Substrings for Exact and Approximate Matching Gonzalo Navarro 1, and Leena Salmela 2 1 Department of Computer Science, University of Chile gnavarro@dcc.uchile.cl 2 Department of

More information

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions.

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions. Overview Operating Systems I Management Provie Services processes files Manage Devices processor memory isk Simple Management One process in memory, using it all each program nees I/O rivers until 96 I/O

More information

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics CS 106 Winter 2016 Craig S. Kaplan Moule 01 Processing Recap Topics The basic parts of speech in a Processing program Scope Review of syntax for classes an objects Reaings Your CS 105 notes Learning Processing,

More information

A FUZZY FRAMEWORK FOR SEGMENTATION, FEATURE MATCHING AND RETRIEVAL OF BRAIN MR IMAGES

A FUZZY FRAMEWORK FOR SEGMENTATION, FEATURE MATCHING AND RETRIEVAL OF BRAIN MR IMAGES A FUZZY FRAMEWORK FOR SEGMENTATION, FEATURE MATCHING AND RETRIEVAL OF BRAIN MR IMAGES Archana.S 1 an Srihar.S 2 1 Department of Information Science an Technology, College of Engineering, Guiny archana.santhira@gmail.com

More information

Politecnico di Torino. Porto Institutional Repository

Politecnico di Torino. Porto Institutional Repository Politecnico i Torino Porto Institutional Repository [Proceeing] Automatic March tests generation for multi-port SRAMs Original Citation: Benso A., Bosio A., i Carlo S., i Natale G., Prinetto P. (26). Automatic

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks mproving Spatial Reuse of EEE 82.11 Base A Hoc Networks Fengji Ye, Su Yi an Biplab Sikar ECSE Department, Rensselaer Polytechnic nstitute Troy, NY 1218 Abstract n this paper, we evaluate an suggest methos

More information

Compiler Optimisation

Compiler Optimisation Compiler Optimisation Michael O Boyle mob@inf.e.ac.uk Room 1.06 January, 2014 1 Two recommene books for the course Recommene texts Engineering a Compiler Engineering a Compiler by K. D. Cooper an L. Torczon.

More information

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity Worl Applie Sciences Journal 16 (10): 1387-1392, 2012 ISSN 1818-4952 IDOSI Publications, 2012 A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Base on Gravity Aliasghar Rahmani Hosseinabai,

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

Frequency Domain Parameter Estimation of a Synchronous Generator Using Bi-objective Genetic Algorithms

Frequency Domain Parameter Estimation of a Synchronous Generator Using Bi-objective Genetic Algorithms Proceeings of the 7th WSEAS International Conference on Simulation, Moelling an Optimization, Beijing, China, September 15-17, 2007 429 Frequenc Domain Parameter Estimation of a Snchronous Generator Using

More information

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member

More information

New Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance

New Version of Davies-Bouldin Index for Clustering Validation Based on Cylindrical Distance New Version of Davies-Boulin Inex for lustering Valiation Base on ylinrical Distance Juan arlos Roas Thomas Faculta e Informática Universia omplutense e Mari Mari, España correoroas@gmail.com Abstract

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing Inexing the Eges A simple an yet efficient approach to high-imensional inexing Beng Chin Ooi Kian-Lee Tan Cui Yu Stephane Bressan Department of Computer Science National University of Singapore 3 Science

More information

A shortest path algorithm in multimodal networks: a case study with time varying costs

A shortest path algorithm in multimodal networks: a case study with time varying costs A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

Indexing and Searching

Indexing and Searching Indexing and Searching Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Modern Information Retrieval, chapter 8 2. Information Retrieval:

More information

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2

Intensive Hypercube Communication: Prearranged Communication in Link-Bound Machines 1 2 This paper appears in J. of Parallel an Distribute Computing 10 (1990), pp. 167 181. Intensive Hypercube Communication: Prearrange Communication in Link-Boun Machines 1 2 Quentin F. Stout an Bruce Wagar

More information

Indexing and Searching

Indexing and Searching Indexing and Searching Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Modern Information Retrieval, chapter 9 2. Information Retrieval:

More information

THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE

THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE БСУ Международна конференция - 2 THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE Evgeniya Nikolova, Veselina Jecheva Burgas Free University Abstract:

More information

Characterizing Decoding Robustness under Parametric Channel Uncertainty

Characterizing Decoding Robustness under Parametric Channel Uncertainty Characterizing Decoing Robustness uner Parametric Channel Uncertainty Jay D. Wierer, Wahee U. Bajwa, Nigel Boston, an Robert D. Nowak Abstract This paper characterizes the robustness of ecoing uner parametric

More information

Advanced method of NC programming for 5-axis machining

Advanced method of NC programming for 5-axis machining Available online at www.scienceirect.com Proceia CIRP (0 ) 0 07 5 th CIRP Conference on High Performance Cutting 0 Avance metho of NC programming for 5-axis machining Sergej N. Grigoriev a, A.A. Kutin

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

Learning to Rank: A New Technology for Text Processing

Learning to Rank: A New Technology for Text Processing TFANT 07 Tokyo Univ. March 2, 2007 Learning to Rank: A New Technology for Text Processing Hang Li Microsoft Research Asia Talk Outline What is Learning to Rank? Ranking SVM Definition Search Ranking SVM

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.

More information

A Novel PAT-Tree Approach to Chinese Document Clustering

A Novel PAT-Tree Approach to Chinese Document Clustering A Novel PAT-Tree Approach to Chinese Document Clustering Kenny Kwok, Michael R. Lyu, Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Recitation Caches and Blocking. 4 March 2019

Recitation Caches and Blocking. 4 March 2019 15-213 Recitation Caches an Blocking 4 March 2019 Agena Reminers Revisiting Cache Lab Caching Review Blocking to reuce cache misses Cache alignment Reminers Due Dates Cache Lab (Thursay 3/7) Miterm Exam

More information

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory Feature Extraction an Rule Classification Algorithm of Digital Mammography base on Rough Set Theory Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Aministrative Science, Quantitative

More information

d 3 d 4 d d d d d d d d d d d 1 d d d d d d

d 3 d 4 d d d d d d d d d d d 1 d d d d d d Proceeings of the IASTED International Conference Software Engineering an Applications (SEA') October 6-, 1, Scottsale, Arizona, USA AN OBJECT-ORIENTED APPROACH FOR MANAGING A NETWORK OF DATABASES Shu-Ching

More information

EXACT SIMULATION OF A BOOLEAN MODEL

EXACT SIMULATION OF A BOOLEAN MODEL Original Research Paper oi:10.5566/ias.v32.p101-105 EXACT SIMULATION OF A BOOLEAN MODEL CHRISTIAN LANTUÉJOULB MinesParisTech 35 rue Saint-Honoré 77305 Fontainebleau France e-mail: christian.lantuejoul@mines-paristech.fr

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

Robust Camera Calibration for an Autonomous Underwater Vehicle

Robust Camera Calibration for an Autonomous Underwater Vehicle obust Camera Calibration for an Autonomous Unerwater Vehicle Matthew Bryant, Davi Wettergreen *, Samer Aballah, Alexaner Zelinsky obotic Systems Laboratory Department of Engineering, FEIT Department of

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways Ben, Jogs, An Wiggles for Railroa Tracks an Vehicle Guie Ways Louis T. Klauer Jr., PhD, PE. Work Soft 833 Galer Dr. Newtown Square, PA 19073 lklauer@wsof.com Preprint, June 4, 00 Copyright 00 by Louis

More information

PERFECT ONE-ERROR-CORRECTING CODES ON ITERATED COMPLETE GRAPHS: ENCODING AND DECODING FOR THE SF LABELING

PERFECT ONE-ERROR-CORRECTING CODES ON ITERATED COMPLETE GRAPHS: ENCODING AND DECODING FOR THE SF LABELING PERFECT ONE-ERROR-CORRECTING CODES ON ITERATED COMPLETE GRAPHS: ENCODING AND DECODING FOR THE SF LABELING PAMELA RUSSELL ADVISOR: PAUL CULL OREGON STATE UNIVERSITY ABSTRACT. Birchall an Teor prove that

More information

Exploring Context with Deep Structured models for Semantic Segmentation

Exploring Context with Deep Structured models for Semantic Segmentation 1 Exploring Context with Deep Structure moels for Semantic Segmentation Guosheng Lin, Chunhua Shen, Anton van en Hengel, Ian Rei between an image patch an a large backgroun image region. Explicitly moeling

More information

Open Access Adaptive Image Enhancement Algorithm with Complex Background

Open Access Adaptive Image Enhancement Algorithm with Complex Background Sen Orers for Reprints to reprints@benthamscience.ae 594 The Open Cybernetics & Systemics Journal, 205, 9, 594-600 Open Access Aaptive Image Enhancement Algorithm with Complex Bacgroun Zhang Pai * epartment

More information

William S. Law. Erik K. Antonsson. Engineering Design Research Laboratory. California Institute of Technology. Abstract

William S. Law. Erik K. Antonsson. Engineering Design Research Laboratory. California Institute of Technology. Abstract Optimization Methos for Calculating Design Imprecision y William S. Law Eri K. Antonsson Engineering Design Research Laboratory Division of Engineering an Applie Science California Institute of Technology

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science an Center of Simulation of Avance Rockets University of Illinois at Urbana-Champaign

More information

Depth Sizing of Surface Breaking Flaw on Its Open Side by Short Path of Diffraction Technique

Depth Sizing of Surface Breaking Flaw on Its Open Side by Short Path of Diffraction Technique 17th Worl Conference on Nonestructive Testing, 5-8 Oct 008, Shanghai, China Depth Sizing of Surface Breaking Flaw on Its Open Sie by Short Path of Diffraction Technique Hiroyuki FUKUTOMI, Shan LIN an Takashi

More information

arxiv: v2 [math.co] 5 Jun 2018

arxiv: v2 [math.co] 5 Jun 2018 Some useful lemmas on the ege Szege inex arxiv:1805.06578v [math.co] 5 Jun 018 Shengjie He 1 1. Department of Mathematics, Beijing Jiaotong University, Beijing, 100044, China Abstract The ege Szege inex

More information

Data Mining: Concepts and Techniques. Chapter 7. Cluster Analysis. Examples of Clustering Applications. What is Cluster Analysis?

Data Mining: Concepts and Techniques. Chapter 7. Cluster Analysis. Examples of Clustering Applications. What is Cluster Analysis? Data Mining: Concepts an Techniques Chapter Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.eu/~hanj Jiawei Han an Micheline Kamber, All rights reserve

More information

Rough Set Approach for Classification of Breast Cancer Mammogram Images

Rough Set Approach for Classification of Breast Cancer Mammogram Images Rough Set Approach for Classification of Breast Cancer Mammogram Images Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Aministrative Science, Quantitative Methos an Information Systems

More information

Message Transport With The User Datagram Protocol

Message Transport With The User Datagram Protocol Message Transport With The User Datagram Protocol User Datagram Protocol (UDP) Use During startup For VoIP an some vieo applications Accounts for less than 10% of Internet traffic Blocke by some ISPs Computer

More information

PART 2. Organization Of An Operating System

PART 2. Organization Of An Operating System PART 2 Organization Of An Operating System CS 503 - PART 2 1 2010 Services An OS Supplies Support for concurrent execution Facilities for process synchronization Inter-process communication mechanisms

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for

More information

Just-In-Time Software Pipelining

Just-In-Time Software Pipelining Just-In-Time Software Pipelining Hongbo Rong Hyunchul Park Youfeng Wu Cheng Wang Programming Systems Lab Intel Labs, Santa Clara What is software pipelining? A loop optimization exposing instruction-level

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks : a Movement-Base Routing Algorithm for Vehicle A Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, an Dzmitry Kliazovich, Stuent Member Abstract Recent interest in car-to-car communications

More information

Chalmers Publication Library

Chalmers Publication Library Chalmers Publication Library All-to-all Broacast for Vehicular Networks Base on Coe Slotte ALOHA This ocument has been ownloae from Chalmers Publication Library (CPL). It is the author s version of a work

More information

Calculation on diffraction aperture of cube corner retroreflector

Calculation on diffraction aperture of cube corner retroreflector November 10, 008 / Vol., No. 11 / CHINESE OPTICS LETTERS 8 Calculation on iffraction aperture of cube corner retroreflector Song Li (Ó Ø, Bei Tang (», an Hui Zhou ( ï School of Electronic Information,

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Linus Svärm Petter Stranmark Centre for Mathematical Sciences, Lun University {linus,petter}@maths.lth.se Abstract Shift-map image processing is a new framework base on energy

More information

NAND flash memory is widely used as a storage

NAND flash memory is widely used as a storage 1 : Buffer-Aware Garbage Collection for Flash-Base Storage Systems Sungjin Lee, Dongkun Shin Member, IEEE, an Jihong Kim Member, IEEE Abstract NAND flash-base storage evice is becoming a viable storage

More information

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset Vol.3, Issue.1, Jan-Feb. 013 pp-504-508 ISSN: 49-6645 A PSO Optimize Layere Approach for Parametric Clustering on Weather Dataset Shikha Verma, 1 Kiran Jyoti 1 Stuent, Guru Nanak Dev Engineering College

More information

CS269I: Incentives in Computer Science Lecture #8: Incentives in BGP Routing

CS269I: Incentives in Computer Science Lecture #8: Incentives in BGP Routing CS269I: Incentives in Computer Science Lecture #8: Incentives in BGP Routing Tim Roughgaren October 19, 2016 1 Routing in the Internet Last lecture we talke about elay-base (or selfish ) routing, which

More information

All-to-all Broadcast for Vehicular Networks Based on Coded Slotted ALOHA

All-to-all Broadcast for Vehicular Networks Based on Coded Slotted ALOHA Preprint, August 5, 2018. 1 All-to-all Broacast for Vehicular Networks Base on Coe Slotte ALOHA Mikhail Ivanov, Frerik Brännström, Alexanre Graell i Amat, an Petar Popovski Department of Signals an Systems,

More information

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance Receive: November 20, 2017 121 A Novel Density Base Clustering Algorithm by Incorporating Mahalanobis Distance Margaret Sangeetha 1 * Velumani Paikkaramu 2 Rajakumar Thankappan Chellan 3 1 Department of

More information

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD.

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD. 1.3.15 Issue 01 Date 2018-11-21 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Lt. 2019. All rights reserve. No part of this ocument may be reprouce or transmitte in any form or by any

More information

Mining Sequential Patterns with Periodic Wildcard Gaps

Mining Sequential Patterns with Periodic Wildcard Gaps Mining Sequential Patterns with Perioic Wilcar Gaps Youxi Wu, Lingling Wang, Jiaong Ren, Wei Ding, Xinong Wu Abstract Mining frequent patterns with perioic wilcar gaps is a critical ata mining problem

More information

Tight Wavelet Frame Decomposition and Its Application in Image Processing

Tight Wavelet Frame Decomposition and Its Application in Image Processing ITB J. Sci. Vol. 40 A, No., 008, 151-165 151 Tight Wavelet Frame Decomposition an Its Application in Image Processing Mahmu Yunus 1, & Henra Gunawan 1 1 Analysis an Geometry Group, FMIPA ITB, Banung Department

More information

THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM

THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM International Journal of Physics an Mathematical Sciences ISSN: 2277-2111 (Online) 2016 Vol. 6 (1) January-March, pp. 24-6/Mao an Shi. THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM Hua Mao

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information