Mining Sequential Patterns with Periodic Wildcard Gaps


Youxi Wu, Lingling Wang, Jiadong Ren, Wei Ding, Xindong Wu

Abstract

Mining frequent patterns with periodic wildcard gaps is a critical data mining problem for dealing with complex real-world problems. The problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap length with wildcards between each two consecutive letters, the task is to find all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms, which use a matrix or other linear data structures to solve the problem, not only consume a large amount of memory but also run slowly. In this study, we use an Incomplete Nettree structure (the last layer of a Nettree, which is an extension of a tree) of a sub-pattern P to efficiently create the Incomplete Nettrees of all its super-patterns with prefix pattern P and to compute the numbers of their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm, MAPBOK (MAPB for top-K), based on MAPB to find the Top-K frequent patterns for each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption, and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.

Keywords: Sequential pattern mining, Periodic wildcard gap, Pattern matching, Heuristic algorithm, Nettree

Y. Wu, L. Wang, J. Ren, W. Ding, X. Wu
School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300130, China
wuc567@163.com

L. Wang
wangling927927@126.com

J. Ren
School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
jdren@ysu.edu.cn

W. Ding
Department of Computer Science, University of Massachusetts Boston, Boston, 02125, USA
wei.ding@umb.edu

X. Wu
Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
xwu@cs.uvm.edu

1. Introduction

Pattern mining plays an essential role in many critical data mining tasks, such as graph mining [1], spatiotemporal data mining [2], and univariate uncertain data streams [3]. Sequential pattern mining, first introduced by Agrawal and Srikant [4], is a very important data mining problem [5] with broad applications, including analyzing moving object data [6], detecting intrusion [7], and discovering changes in customer behaviour [8]. To address specific applications, researchers have also proposed some special approaches. For example, Liao and Chen [9] proposed a Depth-First SPelling algorithm for mining sequential patterns in long biological sequences. To avoid producing a large number of uninteresting and meaningless patterns, Hu et al. [10] proposed an algorithm which considers compactness, repetition, recency and frequency jointly to select sequential patterns, and which is efficient in the business-to-business environment. Shie et al. [11] proposed an efficient algorithm named IM-Span to mine user behaviour patterns in mobile commerce environments. Yin et al. [12] proposed an efficient algorithm, USpan, for mining high-utility sequential patterns. Zhu et al. [13] proposed an efficient algorithm, Spider-Mine, to mine the Top-K largest frequent patterns from a single massive network with any user-specified probability. Wu et al. [14] proposed a novel framework for mining the Top-K high-utility itemsets. Classical sequential pattern mining algorithms include GSP [4] and PrefixSpan [15].

Researchers have recently focused on periodic detection [16] and on mining frequent patterns with periodic wildcard gaps [17], since this kind of pattern can flexibly reflect sequential behaviours and is often exhibited in many real-world fields.
For example, in business, retail companies may want to know which products customers usually purchase at regular time intervals rather than in continuous time, so that managers can adjust marketing methods or strategies and improve sales profit [18, 19]. In biology, periodic patterns are regarded as having significant biological and medical value. Since genes and proteins both contain many repetitive fragments, many periodic patterns of different lengths and types can be found. Some of the patterns with excessively high frequency have been identified as the suspected cause of some diseases [20, 21]. In data mining, researchers found that some frequent periodic patterns can be used in feature selection for the purpose of classification and clustering [22, 23]. So an important task is how to mine all these frequent patterns, and the Top-K patterns for each length, from a mass of data efficiently in each application field.

In this study, we focus on mining frequent patterns with periodic wildcard gaps. This problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap length with wildcards between each two consecutive letters, our goal is to find all frequent patterns with periodic wildcard gaps and the Top-K frequent patterns for each length. A pattern with periodic wildcard gaps is one in which all wildcard gap constraints are the same. For example, pattern $P_1 = p_0[min_0, max_0]p_1[min_1, max_1]p_2$ = A[1,2]T[1,2]G is a pattern with periodic wildcard gaps since $min_0 = min_1 = 1$ and $max_0 = max_1 = 2$. Patterns $P_2$ = A[0,2]T[1,2]G and $P_3$ = A[1,3]T[1,2]G are not patterns with periodic wildcard gaps, since the wildcard gaps between their letters are not all the same.

In existing work, a series of algorithms has been proposed for pattern mining with flexible wildcards, as detailed in Sect. 2. But all these state-of-the-art mining algorithms use a matrix or other linear data structure to solve the problem, consume a large amount of memory, and run slowly [20, 24, 25]. It is also very difficult for them to mine frequent patterns in long sequences. Therefore, we propose two more efficient mining algorithms that use an incomplete Nettree structure and can find all frequent patterns in a shorter time with less memory. In addition, we design a heuristic algorithm to find the Top-K frequent patterns for each length effectively.

The contributions of this paper are as follows:

(1) We propose an incomplete Nettree structure (see Sect. 4.2) which can be used to compute the numbers of supports of the super-patterns of a pattern in a one-way scan.

(2) We design two algorithms named MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search) to solve the problem effectively with low memory requirements. Experiments on real-world bio-data show that MAPD is much faster than MAPB and can be employed to mine frequent patterns in long sequences.

(3) Furthermore, although MAPB is slower than MAPD, a heuristic algorithm named MAPBOK (MAPB for top-K) is designed on the basis of MAPB to solve the Top-K mining problem. This algorithm can efficiently find the Top-K frequent patterns for each length without having to sort all frequent patterns or check all possible candidate patterns.

The paper is organized as follows.
Section 2 reviews related work. Section 3 presents the problem definition and preliminaries. Section 4 introduces the Nettree and the incomplete Nettree structure and then proposes the MAPB and MAPD algorithms. The analysis of the time and space complexities of these two algorithms and an illustrative example are also given in Section 4. Section 5 proposes a heuristic algorithm to find the Top-K frequent patterns for each length and the longest frequent patterns. Section 6 reports comparative studies and Section 7 concludes the paper.

2. Related work

We discuss the existing work with respect to the following features:

(1) Pattern P has many wildcard gap constraints. In some studies the gap constraints can be different [26]. In this paper all wildcard gap constraints are the same, and the pattern is called a pattern with periodic wildcard gaps.

(2) In some studies, only one or two positions of the given sequence are considered [23], whereas we use a group of positions to denote a support of a pattern in the given sequence. This means that each position must be considered. For example, for a given sequence $S = s_0 s_1 s_2 s_3 s_4 s_5$ = AATTGG and a pattern $P = p_0[min_0, max_0]p_1[min_1, max_1]p_2$ = A[0,2]T[0,2]G, <0, 2, 4> is a support of P in S because $s_0 = p_0$ = A, $s_2 = p_1$ = T, and $s_4 = p_2$ = G. It is easy to see that there are 2*2*2 = 8 supports in this problem.

(3) Each position can be used many times in this study. In the literature [26-28] each position in the sequence can be used at most once, which is called the one-off condition, first introduced by Chen et al. [29]. Under the one-off condition there are only two supports of P = A[0,1]T[0,1]G in $S = s_0 s_1 s_2 s_3 s_4 s_5$ = AATTGG, which are <0, 2, 4> and <1, 3, 5>. Another kind of study works under the non-overlapping condition [30]. For instance, given $S = s_0 s_1 s_2 s_3 s_4$ = ATATA and P = A[0,1]T[0,1]A, it is easy to see that <0, 1, 2> and <2, 3, 4> are two supports of P in S.
We notice that position 2 is used twice in <0, 1, 2> and <2, 3, 4>, but it does not appear at the same index within the two supports. Hence <0, 1, 2> and <2, 3, 4> are two supports of P in S under the non-overlapping condition, but not under the one-off condition. In this study, we do not impose either of these constraints.

(4) The literature [26, 27, 30] shows that the Apriori property can be used to prune candidate patterns if sequential pattern mining is subject to the one-off condition or the non-overlapping condition, since under these conditions the number of supports of any super-pattern does not exceed that of its sub-patterns. So if a pattern is not frequent, none of its super-patterns is frequent. The Apriori property does not hold in our mining problem, however, since the number of supports (and even the support ratio) of a super-pattern can be greater than that of its sub-patterns. To reduce redundant candidate patterns, the Apriori-like property was proposed [20]. This property states that no super-pattern of a pattern is frequent if the support ratio of the pattern is less than a certain value, which is itself less than the given threshold.

(5) Generally, pattern mining algorithms find all frequent patterns whose number of supports is not less than the given threshold. However, it is not easy for users to make use of the frequent patterns when they face thousands of patterns or more. In some cases, duplicate frequent patterns need to be removed as uninteresting because these patterns and their super-patterns have the same number of supports. The remaining patterns are called closed frequent patterns. Moreover, some studies focus on the Top-K mining problem [13, 14, 31], since the Top-K frequent patterns are more useful than other frequent patterns.

Mining frequent patterns with periodic wildcard gaps essentially relies on a counting mechanism to calculate the number of supports (or occurrences) of a pattern and then determine whether the pattern is frequent or not.
However, on the one hand, the number of supports generally grows exponentially with the length of the checked patterns. On the other hand, neither the number of supports nor the support ratio satisfies monotonicity, so the Apriori property cannot be used to reduce the size of the set of candidate patterns. Thus it is infeasible for traditional algorithms to solve this problem. Researchers have explored several major approaches to this hard problem:

(1) The set of length-m candidate patterns is used to generate a set of length-(m+1) candidate patterns. Frequent patterns are then selected from the set of candidate patterns, and the process is iterated until it is done [20, 22, 23].

(2) The Apriori-like property is employed to reduce the size of the set of candidate patterns [20].

(3) Min et al. [24] redefined the problem of [20] so that the support ratio is monotonic under the new definitions and the Apriori property can be used.

(4) Pattern matching techniques are used to calculate the numbers of supports of patterns in the given sequence in order to find frequent patterns [24, 25].

Table 1 shows a comparison of related work.

Table 1. Comparison of related work

Algorithms          Periodic wildcard gaps   Apriori property   Occurrences   Output patterns
He et al. [26]      No                       Apriori            One-off       All frequent patterns
Ding et al. [30]    No                       Apriori            Non-overlap   All/closed frequent patterns
Xie et al. [27]     Yes                      Apriori            One-off       All frequent patterns
Li et al. [23]      Yes                      Apriori            First-last    Closed/long frequent patterns
Ji et al. [21]      Yes                      Apriori-like       All           Minimal distinguishing subsequence patterns
Min et al. [24]     Yes                      Apriori            All           All frequent patterns
Zhang et al. [20]   Yes                      Apriori-like       All           All frequent patterns
Zhu and Wu [25]     Yes                      Apriori-like       All           All frequent patterns
This paper          Yes                      Apriori-like       All           All/Top-K frequent patterns for each length

Among the studies in Table 1, Zhang et al. [20] studied the problem of mining frequent patterns with periodic wildcard gaps in a genome sequence and proposed the MPP algorithm based on the Apriori-like property, which employs an effective pruning mechanism to reduce the size of the set of candidate patterns. Min et al. [24] redefined the problem of [20] so that the Apriori property becomes available. Their mining algorithm uses a pattern matching approach to calculate the number of supports (or occurrences) of a pattern in the given sequence. Zhu and Wu [25] explored the MCPaS algorithm to discover frequent patterns with periodic wildcard gaps from multiple sequences. Their Gap Constrained Search (GCS) algorithm uses a sparse array to calculate the number of supports of a pattern. Ji et al. [21] proposed an algorithm to mine all minimal distinguishing subsequences satisfying the minimum and maximum gap constraints and a maximum length constraint. Li et al. [23] proposed Gap-BIDE to discover the complete set of closed sequential patterns with gap constraints, and Gap-Connect to mine the approximate set of long patterns. They further explored several feature selection methods based on the set of gap-constrained patterns for classification and clustering.
A triple (bp, ep, count) was used to represent a set of supports of a pattern which share the same beginning position and ending position, where bp, ep, and count are the beginning position of the supports in the sequence, the ending position of the supports in the sequence, and the total number of supports, respectively. Xie et al. [27] and He et al. [26] both studied the problem of mining frequent patterns under the one-off condition, which can greatly improve the time performance. In [27], the MAIL algorithm, which employs left-most and right-most pruning methods, is designed to find frequent patterns with periodic wildcard gaps. In [26], two heuristic algorithms, namely one-way scan and two-way scan, are proposed to mine all frequent patterns. Ding et al. [30] studied the repetitive gapped subsequence mining problem and focused on mining closed frequent patterns with non-overlapping supports.

According to Table 1, studies [20, 24, 25] are the most closely related to ours. In this kind of problem the support ratio is used to determine whether a pattern is frequent or not, and a pattern and its super-patterns generally have different support ratios. So we pay more attention to mining all frequent patterns than to mining closed frequent patterns. The algorithm in [20], however, is less effective at mining all frequent patterns, especially in long sequences. Moreover, traditional Top-K mining studies such as Zhu et al. [13] and Wu et al. [14] focus on mining the Top-K frequent patterns among all frequent patterns. Those strategies are not applicable when users are interested in the Top-K frequent patterns with length t (1 <= t <= m, where m is the length of the longest frequent patterns). For example, suppose the longest frequent patterns have length 10 and the number of frequent patterns varies widely with length, from 4 patterns of length 1 to thousands of patterns of lengths 6 to 8. Users may be interested in the Top-K frequent patterns with length 8 and the Top-K frequent patterns with length 9.
But the traditional Top-K mining algorithm can only find the Top-K frequent patterns among all frequent patterns. Therefore, more effective mining algorithms, especially for long sequences and for mining the Top-K frequent patterns for each length, are worth exploring.

3. Problem definition and preliminaries

We first give some definitions related to the proposed methods.

Definition 1. $S = s_0 \ldots s_i \ldots s_{n-1}$ is called a subject sequence, where $s_i \in \Sigma$ is a symbol and n is the length of S. $\Sigma$ can be a different symbol set in different applications and the size of $\Sigma$ is denoted by $|\Sigma|$. In a DNA sequence, $\Sigma$ = {A, T, C, G} and $|\Sigma|$ = 4.

Definition 2. $P = p_0[min_0, max_0]p_1 \ldots [min_{j-1}, max_{j-1}]p_j \ldots [min_{m-2}, max_{m-2}]p_{m-1}$ is a pattern, where $p_j \in \Sigma$, m is the length of P, and $min_{j-1}$ and $max_{j-1}$ are integers with $min_{j-1} \le max_{j-1}$. $min_{j-1}$ and $max_{j-1}$ are gap constraints, giving the minimum and maximum number of wildcards between two consecutive letters $p_{j-1}$ and $p_j$, respectively. If $min_0 = min_1 = \ldots = min_{m-2} = M$ and $max_0 = max_1 = \ldots = max_{m-2} = N$, pattern P is called a pattern with periodic wildcard gaps. For example, A[1,3]C[1,3]C is a pattern with periodic wildcard gaps [1,3].

Definition 3. If a sequence of indices $D = \langle d_0, d_1, \ldots, d_{m-1} \rangle$ is subject to $M \le d_j - d_{j-1} - 1 \le N$ ($1 \le j \le m-1$), D is an offset sequence of P with periodic wildcard gaps [M, N] in S. The number of offset sequences of P in S is denoted by ofs(P, S).

Definition 4. If an offset sequence $I = \langle i_0, \ldots, i_j, \ldots, i_{m-1} \rangle$ is subject to $s_{i_j} = p_j$ ($0 \le j \le m-1$ and $0 \le i_j \le n-1$), I is a support (or occurrence) of P in S. The number of supports of P in S is denoted by sup(P, S).

Definition 5. r(P, S) = sup(P, S)/ofs(P, S) is called the support ratio. If r(P, S) is not less than a pre-specified threshold ρ, pattern P is a frequent pattern; otherwise P is an infrequent pattern.

Clearly, sup(P, S) is not greater than ofs(P, S), since every support is an offset sequence. So r(P, S) <= 1. Zhang et al. [20] gave a method to calculate ofs(P, S) with gap constraints [M, N]. Suppose m and n are the lengths of pattern P and the given sequence, respectively. Let $W = N - M + 1$, $l_1 = \lfloor (n+M)/(M+1) \rfloor$, and $l_2 = \lfloor (n+N)/(N+1) \rfloor$.
ofs(P, S) is calculated as follows.

For $m > l_1$: $ofs(P,S) = 0$. (1)

For $m \le l_2$: $ofs(P,S) = \left(n - (m-1)\left(\frac{M+N}{2} + 1\right)\right) W^{m-1}$. (2)

For $l_2 < m \le l_1$: ofs(P, S) can be calculated by a recursive formula. (3)

Example 1. Suppose patterns $P_1$ = A[1,3]C and $P_2$ = G[1,3]C, a sequence $S = s_0 s_1 s_2 s_3 s_4 s_5$ = AGCCCT and ρ = 0.25. We can easily enumerate all 9 offset sequences of $P_1$ and $P_2$ in S, which are <0,2>, <0,3>, <0,4>, <1,3>, <1,4>, <1,5>, <2,4>, <2,5>, and <3,5>. Hence ofs($P_1$, S) = ofs($P_2$, S) = 9. The supports of $P_1$ in S are <0,2>, <0,3>, and <0,4>, so sup($P_1$, S) = 3. $P_1$ is a frequent pattern since the support ratio r($P_1$, S) = 3/9 = 0.333. We can see that sup($P_2$, S) = 2 and the supports are <1,3> and <1,4>. So r($P_2$, S) = 2/9 = 0.222 < ρ and $P_2$ is an infrequent pattern.

Definition 6. Given patterns P and Q, if Q is a sub-string of P, P and Q are called the super-pattern and sub-pattern, respectively. If sub-pattern Q contains the first |P|-1 characters of P, Q is a prefix pattern of P, denoted by Prefix(P). Similarly, if sub-pattern Q contains the last |P|-1 characters of P, Q is a suffix pattern of P, denoted by Suffix(P).

Example 2. Suppose patterns $P_3$ = A[1,3]C[1,3]T, $Q_1$ = A, $Q_2$ = C, and $Q_3$ = A[1,3]C. $Q_1$, $Q_2$, and $Q_3$ are sub-patterns of $P_3$, and $P_3$ is a super-pattern of $Q_1$, $Q_2$, and $Q_3$. The prefix pattern and suffix pattern of $P_3$ are $Q_3$ and C[1,3]T, respectively.

One of the most important tasks in the mining problem is to calculate the number of supports of a pattern in order to determine whether the pattern is frequent or not. Zhu and Wu [25] proposed the GCS algorithm to calculate the number of supports of pattern P in sequence S. An illustrative example shows the principle of GCS.

Example 3. Suppose a pattern P = T[0,3]C[0,3]G and a sequence $S = s_0 s_1 s_2 s_3 s_4 s_5 s_6 s_7 s_8 s_9$ = TTCCTCCGCG. Figure 1 shows the principle of the GCS algorithm, which creates a two-dimensional array A. If $p_i \ne s_j$ ($0 \le i \le m-1$), then $A(i, j) = 0$. If $i = 0$ and $p_i = s_j$, then $A(i, j) = 1$; otherwise $A(i, j) = \sum_{k=M}^{N} A(i-1, j-k-1)$ ($0 < i \le m-1$). The sum of the elements in the last row is the number of supports of the pattern in the sequence. Hence the number of supports of P in S is 5 + 4 = 9.

Figure 1. Gap constrained pattern search

The array in Figure 1 is a sparse array. So GCS must process many useless zero entries, which makes it inefficient at computing the number of supports of a pattern. We employ an incomplete Nettree data structure to overcome this shortcoming and improve efficiency.
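The counting rule of Example 3 and the support ratio of Definition 5 can be checked with a few lines of Python. This is only a minimal sketch under stated assumptions, not the GCS implementation of [25]: a pattern is passed as a plain string of its letters with the periodic gap [M, N] given separately, and the helper names `count_supports` and `count_offsets` are hypothetical. The offsets are counted by brute-force dynamic programming rather than by the closed formula of [20].

```python
def count_supports(letters, seq, M, N):
    """Count supports of a periodic-gap pattern with a dense DP row per letter."""
    n = len(seq)
    # Row 0: one way to match the first letter at each matching position.
    prev = [1 if seq[j] == letters[0] else 0 for j in range(n)]
    for ch in letters[1:]:
        cur = [0] * n
        for j in range(n):
            if seq[j] != ch:
                continue  # mismatching cells stay zero (the sparse part of the array)
            # Predecessors lie M..N wildcards to the left: positions j-k-1.
            cur[j] = sum(prev[j - k - 1] for k in range(M, N + 1) if j - k - 1 >= 0)
        prev = cur
    return sum(prev)  # sum of the last row


def count_offsets(m, n, M, N):
    """Count offset sequences of length m in a sequence of length n (Definition 3)."""
    ways = [1] * n
    for _ in range(m - 1):
        ways = [sum(ways[j - k - 1] for k in range(M, N + 1) if j - k - 1 >= 0)
                for j in range(n)]
    return sum(ways)


# Example 3: T[0,3]C[0,3]G in TTCCTCCGCG
print(count_supports("TCG", "TTCCTCCGCG", 0, 3))
# Example 1: support ratio of A[1,3]C in AGCCCT
print(count_supports("AC", "AGCCCT", 1, 3) / count_offsets(2, 6, 1, 3))
```

Most cells of each row are zero because most positions do not match the current letter; this is the waste that the incomplete Nettree is meant to avoid.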
For mining efficiency, it is necessary to design a good data structure that speeds up the calculation of the number of supports of a pattern. A good pruning strategy also plays an important role. As noted above, the Apriori property does not hold in our mining problem. Example 4 shows that neither the number of supports nor the support ratio of a pattern satisfies monotonicity.

Example 4. Suppose a sequence S = TCGGG, a pattern Q = T[0,3]C and its super-pattern P = T[0,3]C[0,3]G. We know that sup(Q, S) = 1 and sup(P, S) = 3. So the number of supports of a pattern can exceed that of its sub-pattern. The offset sequences of Q in S are <0,1>, <0,2>, <0,3>, <0,4>, <1,2>, <1,3>, <1,4>, <2,3>, <2,4>, and <3,4>. The offset sequences of P in S are <0,1,2>, <0,1,3>, <0,1,4>, <0,2,3>, <0,2,4>, <0,3,4>, <1,2,3>, <1,2,4>, <1,3,4>, and <2,3,4>. So both ofs(Q, S) and ofs(P, S) are 10. Hence r(Q, S) = 0.1 is less than r(P, S) = 0.3. This example therefore shows that not only is sup(Q, S) less than sup(P, S), but r(Q, S) is also less than r(P, S).

Fortunately, Zhang et al. [20] proposed the Apriori-like property, which prunes candidate patterns effectively while still finding all frequent patterns. This property means that if the support ratio of a pattern is less than a certain value, none of its super-patterns is frequent.

Lemma 1. For any length-m pattern Q and its length-(m+1) super-pattern P, sup(P) <= sup(Q) * W, where W = N - M + 1 and M and N are the minimum and maximum gap, respectively.

Proof: Let $I = \langle i_0, i_1, \ldots, i_j, \ldots, i_{m-1} \rangle$ be a support of pattern Q. P has at most W supports with prefix-support I in sequence S, which are $I_1 = \langle i_0, i_1, \ldots, i_{m-1}, i_{m-1}+M+1 \rangle$, $I_2 = \langle i_0, i_1, \ldots, i_{m-1}, i_{m-1}+M+2 \rangle$, ..., $I_W = \langle i_0, i_1, \ldots, i_{m-1}, i_{m-1}+N+1 \rangle$. So sup(P) <= sup(Q) * W. Likewise, when Q is the suffix pattern of P, we obtain the same inequality.

Theorem 1. If the support ratio of a length-m pattern Q is less than $\frac{n-(d-1)(w+1)}{n-(m-1)(w+1)} \rho$, all super-patterns which contain it can be considered infrequent, where d (d > m) is the length of the longest frequent patterns and w = (M+N)/2.

Proof: We know that $\frac{sup(Q,S)}{ofs(Q,S)} < \frac{n-(d-1)(w+1)}{n-(m-1)(w+1)} \rho$. Suppose P is a super-pattern of pattern Q with length k ($m < k \le d$). Then $ofs(Q,S) = (n-(m-1)(w+1)) W^{m-1}$ and $ofs(P,S) = (n-(k-1)(w+1)) W^{k-1}$ according to the offset equation. So $ofs(P,S) = \frac{n-(k-1)(w+1)}{n-(m-1)(w+1)} W^{k-m} \, ofs(Q,S)$. We know that $sup(P,S) \le sup(Q,S) \, W^{k-m}$ according to Lemma 1. Hence

$\frac{sup(P,S)}{ofs(P,S)} \le \frac{sup(Q,S)}{ofs(Q,S)} \cdot \frac{n-(m-1)(w+1)}{n-(k-1)(w+1)} \le \frac{sup(Q,S)}{ofs(Q,S)} \cdot \frac{n-(m-1)(w+1)}{n-(d-1)(w+1)} < \rho.$

Therefore Theorem 1 is proved.

Definition 7. All frequent patterns with the Apriori or Apriori-like property can be described by a tree which is called a frequent pattern tree. The root of the tree is null. A path from the root to a node in the tree is a frequent pattern.

Example 5. Suppose all frequent patterns of a DNA sequence are {A, T, C, G, AA, AC, AG, TA, TT, TC, CA, CC, CG, GA, GT, GC, GG, AGA, AGT, AGC}, which can be expressed as the frequent pattern tree shown in Figure 2.

Figure 2. A frequent pattern tree

4. Nettree and the algorithms

4.1. Nettree

Definition 8. A Nettree data structure is a collection of nodes. The collection can be empty; otherwise a Nettree consists of some distinguished nodes $r_1, \ldots, r_m$ (roots), and 0 or more non-empty subnettrees $T_1, T_2, \ldots, T_n$, each of whose roots can be connected by at least one edge from root $r_i$, where $m \ge 1$, $n \ge 0$, and $1 \le i \le m$. A generic Nettree is shown in Figure 3.

Figure 3. A generic Nettree

A Nettree is an extension of a tree because it shares concepts with a tree, such as the root, leaf, level, parent, child, and so on. In order to show the differences between the tree data structure and the Nettree data structure, Property 1 is given as follows.

Property 1. A Nettree [32, 33] has the following four properties.
(1) A Nettree may have more than one root.
(2) Some nodes except roots in a Nettree may have more than one parent.
(3) There may be more than one path from a node to a root.
(4) Some nodes may occur more than once.

Definition 9. In a Nettree, the same node name may occur once or more than once. If every node occurs only once, we use the node name to denote it. Otherwise, $n_j^i$ denotes node i on the jth level when a node occurs more than once on different levels in the Nettree [32, 33].

Example 6. Figure 4 shows a Nettree. Node 3 occurs 3 times in the Nettree. $n_1^3$, $n_2^3$ and $n_3^3$ denote node 3 on the 1st, 2nd and 3rd levels, respectively. Node $n_2^4$ has two parents on the first level, including $n_1^3$.

Figure 4. A Nettree

Definition 10. A path from a root to node $n_j^i$ is called a root path. $R(n_j^i)$ denotes the Number of Root Paths (NRP) of node $n_j^i$. The NRP of a root is 1, and the NRP of a jth level node $n_j^i$ (j > 1) is the sum of the NRPs of its parents.

A pattern matching problem with gaps can be used to construct a Nettree according to the pattern and the sequence [32, 33]. On the one hand, a Nettree can clearly express all supports of the pattern in the sequence [32]. On the other hand, it is easy to calculate the number of supports and find the optimal supports under the one-off condition [33]. An illustrative example is given as follows.

Example 7. Suppose pattern P = T[0,3]C and sequence $S = s_0 s_1 s_2 s_3 s_4 s_5 s_6 s_7 s_8 s_9$ = TTCCTCCGCG. Figure 5 shows a Nettree for the pattern matching problem. The numbers in the white and grey circles are the name of a node and its NRP, respectively. A path from a root to a 2nd level node in the Nettree is a support of pattern P in sequence S. For example, path (0, 3), which runs from node 0 in the first level to node 3 in the second level, is a support <0,3> of P in S. So a Nettree can show all supports of P in S effectively. Hence, the number of supports is 2+2+2+1+1 = 8. This is the solution of literature [32]. However, if each position can be used no more than once, which is called the one-off condition [29], supports <0,3> and <4,5> are not the optimal supports under the one-off condition. This instance has many optimal solutions and one of them is <0,3>, <1,5> and <4,8>. Literature [33] designed an algorithm to find the optimal supports using a Nettree.

Figure 5. A Nettree for P in S

Therefore, literature [32] and [33], which deal with two pattern matching problems, use a Nettree to calculate the number of supports and to find the optimal supports under the one-off condition, respectively, when pattern P and sequence S are given. Literature [32] can be seen as calculating the search space of literature [33]. This study, in contrast, focuses on a pattern mining problem, which is to find all frequent patterns when only sequence S is given.

4.2. Algorithms

It is very easy to get a path from a node in the first level to a node in the last level of a Nettree if the Nettree has parent-child and child-parent relationships. In the pattern matching problem, such a path is a support (or occurrence) of pattern P in sequence S. So the Nettree has parent-child and child-parent relationships in [32, 33]. However, in the pattern mining problem we only want to know the number of supports. Hence it is not necessary to store the parent-child and child-parent relationships of the Nettree in this study. The Nettree is usually stored in a node array and each element in the node array stores the node name and its NRP.

Lemma 2. The sum of NRPs of the jth ($1 \le j \le |P|$) level nodes is the number of supports of sub-pattern Q, which consists of the first j characters of pattern P.

Proof: Let Q be the first j characters of pattern P. $R(n_j^i)$ is the number of supports of Q at position i.
So the sum of NRPs of the jth ($1 \le j \le |P|$) level nodes is the number of supports of Q in S.

By Lemma 2, we know that in Example 7 the sum of NRPs of the first level nodes, 1+1+1 = 3, is the number of supports of Prefix(P) in S. The sum of NRPs of the last level nodes (2+2+2+1+1 = 8) is the number of supports of pattern P. It is easily seen that the top m-1 levels of nodes of the Nettree of a length-m pattern P are redundant when calculating the number of supports of its super-pattern. So we only store the last level nodes of the Nettree of a pattern, which is called an incomplete Nettree. The reasons for using an incomplete Nettree to solve the problem are as follows.

(1) The incomplete Nettree only stores useful information and avoids calculating useless data, which makes the algorithm more effective and consumes less memory. In [24, 25], by contrast, a two-dimensional array is used to calculate the number of supports. So the two algorithms in [24, 25] have to deal with useless information, which hurts the efficiency of mining since the array is sparse.

(2) We can use the incomplete Nettree of pattern P to construct a new incomplete Nettree and calculate the number of supports of its super-pattern. This takes full advantage of the previous calculation results.

Moreover, if we can simultaneously calculate the numbers of supports of super-patterns with the same prefix pattern in a one-way scan, the efficiency of the pattern mining algorithm will be further enhanced. For example, suppose that a pattern P = T[0,3]C is frequent in a DNA sequence and we need to determine whether super-patterns $P_1$ = T[0,3]C[0,3]A, $P_2$ = T[0,3]C[0,3]T, $P_3$ = T[0,3]C[0,3]C, and $P_4$ = T[0,3]C[0,3]G are frequent or not. It is less effective to calculate the numbers of supports of these four super-patterns in a four-way scan than in a one-way scan.

The principle of calculating the numbers of supports of all super-patterns P[M, N]a ($a \in \Sigma$) in a one-way scan is as follows. According to gap constraints [M, N], we create all child nodes of node q, which is a node in the incomplete Nettree of P. If $s_j = \Sigma_k$ ($q+M+1 \le j \le q+N+1$, $0 \le k < |\Sigma|$), we check whether node j is a node in the incomplete Nettree of super-pattern $P_k$ or not. If it is not, we create node j and store it in the incomplete Nettree of $P_k$, and the NRP of node j is the NRP of node q; otherwise we only update the NRP of node j by adding the NRP of node q to it. So the number of supports of super-pattern $P_k$ is the sum of the NRPs of all nodes of the incomplete Nettree of $P_k$. Hence, we can calculate the numbers of supports of these super-patterns in a one-way scan. The algorithm is named INSupport.

Here we employ a structure named IINettree (Information of the Incomplete Nettree) to describe the mined pattern string, the number of its supports and its incomplete Nettree in INSupport. For the sake of simplicity, we use the terms pattern, sup and INtree for short. So IINettree can be expressed as {pattern, sup, INtree}. Since the representation of the incomplete Nettree is relatively straightforward, an incomplete Nettree is composed of a size and an array which contains the names of all nodes and their corresponding NRPs. Hence, IINettree can also be written as {pattern, sup, {size, ($name_0$, $NRP_0$), ..., ($name_{size-1}$, $NRP_{size-1}$)}}. The inputs of INSupport are P, S and INtree, which is the incomplete Nettree of P. The output of INSupport is superps, which is an array of IINettree. The size of superps is $|\Sigma|$.
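The one-way scan described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: an incomplete Nettree is represented as a dictionary mapping a node name (a position in S) to its NRP, the function name `in_support` is hypothetical, and the alphabet order "ATCG" is an assumption for a DNA sequence.

```python
def in_support(seq, intree, M, N, alphabet="ATCG"):
    """Extend the incomplete Nettree of P to those of all P[M,N]a in one scan.

    intree maps node name (position) -> NRP for the last level of P's Nettree.
    Returns a dict: letter a -> incomplete Nettree of the super-pattern P[M,N]a.
    """
    supers = {a: {} for a in alphabet}
    last = len(seq) - 1
    for q, nrp in intree.items():
        # Children of node q sit M..N wildcards to its right.
        for j in range(q + M + 1, min(q + N + 1, last) + 1):
            child = supers.get(seq[j])
            if child is not None:
                # Create node j with the parent's NRP, or add to an existing node.
                child[j] = child.get(j, 0) + nrp
    return supers


S = "TTCCTCCGCG"
t_tree = {i: 1 for i, s in enumerate(S) if s == "T"}   # length-1 pattern T
tc_tree = in_support(S, t_tree, 0, 3)["C"]             # T[0,3]C
tcc_tree = in_support(S, tc_tree, 0, 3)["C"]           # T[0,3]C[0,3]C
print(tc_tree, sum(tc_tree.values()))
print(tcc_tree, sum(tcc_tree.values()))
```

The number of supports of each super-pattern is simply the sum of the NRPs in its dictionary, so no parent-child links need to be kept. A dictionary lookup stands in for the search over the node array here; that substitution is a design choice of the sketch.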
For example, given a sequence S = TTCCTCCGCG and a prefix pattern P = T[0,3]C (shown in Example 8), $superps_2$ can be expressed as {T[0,3]C[0,3]C, 15, {4, (3,2), (5,4), (6,6), (8,3)}}, since C is the third letter in $\Sigma$ for a DNA sequence. Algorithm INSupport is shown as follows.

Algorithm 1: INSupport(P, S, INtree)
Input: P, S, INtree
Output: superps
 1: superps.sup = 0;
 2: superps.pattern = P[M, N]Σ;
 3: for (i = 0; i < |INtree|; i++)
 4:   oldnode = INtree_i;
 5:   for (j = oldnode.name+M+1; j <= oldnode.name+N+1 and j < |S|; j++)
 6:     if (s_j == Σ_k) then
 7:       superps_k.sup += oldnode.NRP;
 8:       position = search(superps_k.INtree, j);
 9:       // The result will be -1 if j is not in superps_k.INtree.
10:       if (position == -1) then
11:         newnode.name = j;
12:         newnode.NRP = oldnode.NRP;
13:         superps_k.INtree_{size++} = newnode;
14:       else
15:         superps_k.INtree_{position}.NRP += oldnode.NRP;
16:       end if
17:     end if
18:   end for
19: end for
20: return superps;

The principle of MAPB is as follows. Firstly, each character in Σ is seen as a length-1 pattern P. We create |Σ| incomplete Nettrees, one for each pattern, and calculate the numbers of their supports. Then we need to determine whether the support ratio of each pattern is not less than β = ρ*(n-(d-1)*(w+1))/(n-(m-1)*(w+1)), where w = (M+N)/2, and d and m are the length of the longest frequent patterns and the length of pattern P, respectively. If it is not less than this value, pattern P and its incomplete Nettree are stored in a queue. After this, a prefix pattern P and its incomplete Nettree are dequeued and we check whether pattern P is a frequent pattern or not. Algorithm 1 is used to calculate the numbers of supports of all super-patterns with prefix pattern P and to create the |Σ| incomplete Nettrees of the super-patterns. Finally, we check whether the support ratio of each super-pattern is not less than β. If so, we store the super-pattern and its incomplete Nettree in the queue, and we iterate this process until the queue is empty. This method uses the Apriori-like property to prune the candidate patterns and constructs the frequent pattern tree based on BFS.
The algorithm named MAPB is given as follows.

Algorithm 2. MAPB
Input: S = s_0 s_1 ... s_{n-1}, M, N, ρ, and d, where d is the length of the longest frequent patterns
Output: All patterns with frequency not less than ρ
1: patterns.pattern = Σ;
2: for (i=0; i<|S|; i++)
3:   if (s_i == Σ_j) then
4:     node.name = i;
5:     node.NRP = 1;
6:     patterns_j.INtree[size++] = node;
7:   end if
8: end for
9: for (j=0; j<|Σ|; j++)
10:  patterns_j.sup = patterns_j.INtree.size;
11:  if (patterns_j.sup/|S| >= ρ*(n-(d-1)*(w+1))/n) then meta.enqueue(patterns_j);
12: end for
13: while (!meta.empty())
14:  subP = meta.dequeue();
15:  P = subP.pattern;
16:  INtree = subP.INtree;
17:  length = |P|;
18:  calculate r(P,S);
19:  if (r(P,S) >= ρ) then C_length = C_length ∪ P;
20:  superps = INSupport(P, S, INtree);
21:  for (j=0; j<|Σ|; j++)
22:    Q = superps_j.pattern;
23:    calculate r(Q,S);
24:    if (length+1 <= d) then
25:      if (r(Q,S) >= ρ*(n-(d-1)*(w+1))/(n-length*(w+1))) then meta.enqueue(superps_j);
26:    else
27:      if (r(Q,S) >= ρ) then meta.enqueue(superps_j);
28:    end if
29:  end for
30: end while
31: return C = ∪ C_i

Although MAPB employs a pattern matching strategy (INSupport) to calculate the numbers of supports of candidate patterns, INSupport and the algorithm in [32] are significantly different, for the following reasons. Firstly, the algorithm in [32] is used to calculate the number of supports for

only one pattern, while INSupport can simultaneously calculate the numbers of supports for many patterns with a common prefix pattern. Secondly, the algorithm in [32] does not need any previous result to solve the problem, while INSupport calculates the numbers of supports depending on previous results.

MAPB stores the frequent patterns in a queue. A second algorithm, named MAPD, stores the frequent patterns in a stack, so the frequent pattern tree is constructed by DFS. For MAPD, lines 11, 14, 25, and 27 are modified as follows:

11: if (patterns_j.sup/|S| >= ρ*(n-(d-1)*(w+1))/n) then meta.push(patterns_j);
14: subpattern = meta.pop();
25: if (r(Q,S) >= ρ*(n-(d-1)*(w+1))/(n-length*(w+1))) then meta.push(superps_j);
27: if (r(Q,S) >= ρ) then meta.push(superps_j);

4.3. A running example

An illustrative example is used to show how Algorithm 1 works.

Example 8. Suppose sequence S = s_0 s_1 s_2 s_3 s_4 s_5 s_6 s_7 s_8 s_9 = TTCCTCCGCG, patterns P0=T[0,3]C[0,3]A, P1=T[0,3]C[0,3]T, P2=T[0,3]C[0,3]C, P3=T[0,3]C[0,3]G, and the incomplete Nettree of P=T[0,3]C, which is shown in the black frame in Figure 5. R(node) is used to express the NRP of a node according to its definition. Below, n_j^i denotes node i on level j of the Nettree.

We create the child nodes of n_2^2 from s_3 to s_6 since the gap constraints are [0,3]. Because s_3=C, P2=T[0,3]C[0,3]C, and node n_3^3 is not in the incomplete Nettree of P2, we create node n_3^3 in the incomplete Nettree of P2 and R(n_3^3)=R(n_2^2)=2. Because s_4=T, we create node n_3^4 in the incomplete Nettree of P1 and R(n_3^4)=R(n_2^2)=2. Similarly, we create nodes n_3^5 and n_3^6 in the incomplete Nettree of P2, with R(n_3^5)=R(n_3^6)=R(n_2^2)=2.

Then we create the child nodes of n_2^3 from s_4 to s_7. Because s_4=T and node n_3^4 is already in the incomplete Nettree of P1, we update its value: R(n_3^4)=R(n_3^4)+R(n_2^3)=4. Similarly, R(n_3^5)=R(n_3^5)+R(n_2^3)=4, R(n_3^6)=R(n_3^6)+R(n_2^3)=4, and R(n_3^7)=R(n_2^3)=2. Finally, we create all child nodes of nodes n_2^5, n_2^6, and n_2^8.
Figure 6 shows the incomplete Nettrees of P, P0, P1, P2, and P3. It is straightforward to see that the numbers of supports of P0, P1, P2, and P3 in S are 0, 4, 2+4+6+3=15, and 5+4=9, respectively.

Figure 6. Incomplete Nettrees of P, P0, P1, P2, and P3

4.4. Correctness and completeness

The main difference between MAPB and MAPD is that they use different strategies to create the frequent pattern tree. So we only prove the correctness and completeness of MAPB; the same proofs apply to MAPD.

Theorem 2. (Correctness of MAPB) The output of MAPB is the frequent patterns.

Proof: We know that Algorithm 1 is correct according to Lemma 2. So MAPB can calculate the numbers of supports of the super-patterns with prefix P correctly. Zhang et al. [2] gave the method to calculate ofs(P,S) and proved the correctness of the method. Therefore Theorem 2 is proved.

Theorem 3. (Completeness of MAPB) MAPB can find all frequent patterns whose lengths are less than l_2.

Proof: We know that d is less than or equal to l_2, and that if the support ratio of pattern P is less than

$\frac{n-(d-1)(w+1)}{n-(m-1)(w+1)} \cdot \rho$,

all super-patterns which contain it can be considered infrequent according to Theorem 1, where $w=(M+N)/2$, $l_2=\lfloor (n+N)/(N+1) \rfloor$, and n, m, d, M, and N are the length of the sequence, the length of the pattern, the length of the longest frequent patterns, the minimum gap, and the maximum gap, respectively. So MAPB enqueues pattern P if its support ratio is not less than $\frac{n-(d-1)(w+1)}{n-(m-1)(w+1)} \cdot \rho$ and checks whether its super-patterns are frequent. Therefore all frequent patterns can be found by MAPB. Hence Theorem 3 is proved.

Generally d is far less than l_2 in our mining problem, so the algorithms can usually find all frequent patterns in a sequence.

4.5. Complexity analysis

Both MAPB and MAPD create the same frequent pattern tree. So the time complexities of these two algorithms are the same.
Algorithm 1 creates the child nodes of the incomplete Nettree of a sub-pattern. It is easy to see that the average size of the incomplete Nettrees is n/|Σ|, where n is the length of the sequence. Each node has W=N-M+1 children. So the time complexity of Algorithm 1 is O(W*n/|Σ|). Algorithm 1 runs $\sum_{j=1}^{d} len_j$ times, where len_j and d are the number of length-j frequent patterns and the length of the longest frequent pattern, respectively. Hence the time complexities of MAPB and MAPD are both $O(\sum_{j=1}^{d} len_j \cdot W \cdot n/|\Sigma|)$.

In contrast, the mining algorithm using GCS (MGCS) creates j rows for a length-j pattern P and |Σ| rows for all super-patterns with prefix P to calculate the numbers of supports of these patterns. Each row has n elements and each element is calculated W times. So the time complexity of GCS is O((j+|Σ|)*n*W). Thus, the time complexity of MGCS is $O(\sum_{j=1}^{d} len_j \cdot (j+|\Sigma|) \cdot n \cdot W)$.

The average size of an incomplete Nettree is n/|Σ| and each node stores its name and NRP. So the space complexity of an incomplete Nettree is O(n/|Σ|). MAPB uses a queue and the max size of the queue is O(|Σ|^x), where x is the position of max len_j. So the space complexity of MAPB is O(|Σ|^(x-1)*n). MAPD uses a stack and the max size of the stack is O(d*|Σ|). So the space complexity of MAPD is O(d*n). The space complexity of MGCS is O((d+|Σ|)*n). Table 2 gives a comparison of the time and space complexities among MGCS, MAPB, and MAPD.

Table 2. Comparison of the time and space complexities
Algorithm | Time complexity | Space complexity
MGCS | $O(\sum_{j=1}^{d} len_j \cdot (j+|\Sigma|) \cdot n \cdot W)$ | $O((d+|\Sigma|) \cdot n)$
MAPB | $O(\sum_{j=1}^{d} len_j \cdot W \cdot n/|\Sigma|)$ | $O(|\Sigma|^{x-1} \cdot n)$
MAPD | $O(\sum_{j=1}^{d} len_j \cdot W \cdot n/|\Sigma|)$ | $O(d \cdot n)$
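To make the relationship between MAPB and MAPD concrete, the following Python sketch runs the same search with either a queue (BFS) or a stack (DFS). It is an illustration under stated assumptions, not the authors' code: the names `extend`, `offset_counts`, and `mine` are chosen here; the support ratio r(P,S) is assumed to be sup(P,S)/ofs(P,S), where ofs(P,S) counts all position sequences of length |P| that satisfy the gap constraints (computed below by a simple dynamic program rather than by the method of Zhang et al.); and d is assumed not to exceed l_2, so the denominators of the relaxed bound stay positive.

```python
from collections import deque


def extend(seq, tree, M, N, alphabet):
    """One INSupport-style pass; returns {letter: {position: NRP}}."""
    out = {a: {} for a in alphabet}
    for pos, nrp in tree.items():
        for j in range(pos + M + 1, min(pos + N + 1, len(seq) - 1) + 1):
            if seq[j] in out:
                out[seq[j]][j] = out[seq[j]].get(j, 0) + nrp
    return out


def offset_counts(n, M, N):
    """ofs[m]: number of length-m position sequences that fit under gaps [M, N]."""
    ofs, ends, m = {}, [1] * n, 1
    while sum(ends) > 0:
        ofs[m] = sum(ends)
        ends = [sum(ends[j - g - 1] for g in range(M, N + 1) if j - g - 1 >= 0)
                for j in range(n)]
        m += 1
    return ofs


def mine(seq, M, N, rho, d, alphabet="ATCG", depth_first=False):
    n, w = len(seq), (M + N) / 2
    ofs = offset_counts(n, M, N)

    def ratio(tree, m):
        return sum(tree.values()) / ofs[m] if m in ofs else 0.0

    def threshold(m):
        # Relaxed bound up to length d, plain rho beyond it (as in Algorithm 2).
        if m > d:
            return rho
        return rho * (n - (d - 1) * (w + 1)) / (n - (m - 1) * (w + 1))

    frontier = deque()
    for a in alphabet:
        tree = {i: 1 for i, c in enumerate(seq) if c == a}
        if tree and ratio(tree, 1) >= threshold(1):
            frontier.append((a, tree))
    frequent = {}
    while frontier:
        # The only difference: stack (MAPD-like) versus queue (MAPB-like).
        pattern, tree = frontier.pop() if depth_first else frontier.popleft()
        if ratio(tree, len(pattern)) >= rho:
            frequent[pattern] = sum(tree.values())
        for a, child in extend(seq, tree, M, N, alphabet).items():
            m = len(pattern) + 1
            if child and ratio(child, m) >= threshold(m):
                frontier.append((pattern + a, child))
    return frequent


bfs = mine("TTCCTCCGCG", M=0, N=3, rho=0.15, d=3)
dfs = mine("TTCCTCCGCG", M=0, N=3, rho=0.15, d=3, depth_first=True)
print(sorted(bfs) == sorted(dfs))
```

Patterns are written without their gaps (for example "TCC" for T[0,3]C[0,3]C) because the gap is the same between every pair of letters. Both traversals visit the same pattern tree; only the number of incomplete Nettrees held at once differs, which is the source of the different space complexities in Table 2.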

5. Top-K mining

Both MAPB and MAPD are mining algorithms based on the pattern matching approach to discover all possible frequent patterns. This kind of pattern mining approach can also be employed to find special frequent patterns effectively. For example, when we discover thousands of frequent patterns or more, it is difficult to use all of them. The most valuable patterns are the Top-K frequent patterns of each length and the longest frequent patterns, where K is a specified parameter; finding them is called the Top-K mining problem.

Like Apriori, algorithm MPP-best [2] acts iteratively, generating length-(m+1) candidate patterns from length-m frequent patterns and verifying their supports, until there is no candidate pattern. So it is difficult to employ this kind of approach to construct an algorithm for this problem. An easy method is an algorithm which finds all frequent patterns, sorts them, and then outputs the Top-K frequent patterns to satisfy users' needs. Another similar method is an algorithm that checks all possible candidate patterns and selects the Top-K frequent patterns. But these methods take a long time to solve the Top-K mining problem in long sequences using MAPD, because many useless frequent patterns are found or many useless candidate patterns are checked.

To discover these patterns effectively, a heuristic algorithm named MAPBOK is proposed. The principle of MAPBOK is as follows. If sub-pattern P is a Top-K length-m frequent pattern, its super-pattern Q is a Top-K length-(m+1) frequent pattern with high probability. Firstly, we mine the Top-(e*K) length-m frequent patterns and output the Top-K frequent patterns, where e is not less than 1. Then the Top-(e*K) frequent patterns are used to mine the length-(m+1) frequent super-patterns. We select the Top-(e*K) frequent super-patterns and iterate this process until no new frequent patterns can be found. This method also constructs the frequent pattern tree by BFS.
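The heuristic described above can be sketched as a level-by-level beam search. This is a simplified illustration, not MAPBOK itself: the names `extend` and `top_k_by_length` are chosen here, patterns are ranked by raw support (patterns of equal length are assumed to share the same ofs value, so this ranking matches a ranking by support ratio), and the frequency threshold ρ is replaced by a hypothetical `max_len` bound to keep the example short.

```python
def extend(seq, tree, M, N, alphabet):
    """One INSupport-style pass; returns {letter: {position: NRP}}."""
    out = {a: {} for a in alphabet}
    for pos, nrp in tree.items():
        for j in range(pos + M + 1, min(pos + N + 1, len(seq) - 1) + 1):
            if seq[j] in out:
                out[seq[j]][j] = out[seq[j]].get(j, 0) + nrp
    return out


def top_k_by_length(seq, M, N, K, e=2, max_len=5, alphabet="ATCG"):
    """Keep the e*K best patterns per length and report the K best of them."""
    beam = [(a, {i: 1 for i, c in enumerate(seq) if c == a}) for a in alphabet]
    result = {}
    for length in range(1, max_len + 1):
        # Rank candidates by support; ties are broken alphabetically.
        ranked = sorted(((sum(t.values()), p, t) for p, t in beam if t),
                        key=lambda item: (-item[0], item[1]))
        if not ranked:
            break
        result[length] = [(p, s) for s, p, _ in ranked[:K]]
        survivors = ranked[:int(e * K)]
        # Only the surviving patterns are extended to the next length.
        beam = [(p + a, child)
                for _, p, t in survivors
                for a, child in extend(seq, t, M, N, alphabet).items()]
    return result


top = top_k_by_length("TTCCTCCGCG", M=0, N=3, K=2, e=1, max_len=3)
for length, patterns in top.items():
    print(length, patterns)
```

A larger e widens the beam and lowers the chance of missing a true Top-K pattern whose prefix was not itself among the best, at the cost of more extension passes.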
The algorithm of MAPBOK is not given, since the change from MAPB is straightforward. It is easy to see that the time complexity of MAPBOK is O(e*K*d*W*n/|Σ|), since MAPBOK runs Algorithm 1 O(e*K*d) times. The space complexity of MAPBOK is O(e*K*n/|Σ|), since the max size of the queue is O(e*K).

6. Performance evaluation

From Table 1 in Section 2, we know that this study focuses on the same pattern mining problem as [2] and [25]. Besides, the most closely related work is [24], in which the problem is redefined and the Apriori property is used. Here we call the algorithm in [24] AMIN. Therefore we present experimental results comparing MPP-best, MGCS, AMIN, MAPB, and MAPD. In order to show that our algorithms are superior to the state-of-the-art algorithms, we evaluate their performance based on running time and memory requirements.

For the Top-K mining problem for each length, we pay more attention to the mining accuracy of longer frequent patterns. Weighted accuracy can be calculated according to the following equation:

$\text{Weighted accuracy} = \frac{\sum_{i=3}^{d} i \cdot a_i}{\sum_{i=3}^{d} i \cdot b_i}$  (4)

where a_i, b_i, and d are the number of correct Top-K frequent patterns, the number of Top-K length-i frequent patterns, and the length of the longest frequent patterns, respectively. Generally, b_i is K. But when c is less than K, b_i becomes c, where c is the number of length-i frequent patterns.

6.1. Experimental environment and data

The data used in this paper are DNA sequences provided by the National Center for Biotechnology Information website. Homo Sapiens AX82974, AL587, and AB3849 are chosen as our test data and can be downloaded from nlm.nih.gov/nuccore/ax82974, nuccore/al587., and nuccore/AB3849, respectively. The source code of MPP-best, MGCS, MAPB, MAPD, and MAPBOK can be obtained online. All experiments were run on a laptop with a Pentium(R) Dual-Core T4500 @ 2.30 GHz CPU and 2.0 GB of RAM, running Windows 7. Java Development Kit (JDK) 1.6.0 is used to develop all algorithms.
In this study, the greatest length of frequent patterns is considered to be 13, the minimum and maximum gap constraints are 9 and 12, respectively, and the threshold ρ is 3*10^-5, because all these parameters were used in [2, 24, 25]. What is more, considering that the default memory of the JVM is too small, we assign 1.5 GB of memory to every algorithm, which is the maximal memory space of the Java virtual machine on the laptop. Table 3 shows all the sequences used in this paper.

Table 3. Bio-data sequences
Sequence | From
S1 to S5 | Homo Sapiens AX82974
S6 to S9 | Homo Sapiens AL587
S10 to S13 | Homo Sapiens AB3849

With the test environment and data presented above, Table 4 and Table 5 show the mining results and the comparison of the max sizes of MAPB and MAPD, respectively.

Table 4. Mining results
Sequence | The length of the longest frequent patterns, the number of frequent patterns for various lengths, and total frequent patterns
S1 | 3, {4,6,64,256,24,496,3374,5678,54,623, 242, 55, 2},
S2 | 2, {4,6,64,256,24,496,525,3436,35,85,8,3},
S3 | , {4,6,64,256,24,496,5965,937,59,3},
S4 | , {4,6,64,256,24,496,497,4283,24,},
S5 | , {4,6,64,256,24,496,4422,48,299,},
S6 | , {4,6,64,256,24,496,269,768,64,8},
S7 | , {4,6,64,256,24,496,2388,696,749,},
S8 | , {4,6,64,256,24,496,2947,633,666,}, 25387
S9 | , {4,6,64,256,24,496,3438,5767,64,}, 2527
S10 | , {4,6,64,256,24,496,257,797,563,2},
S11 | , {4,6,64,256,24,496,26,7634,64,54,},
S12 | , {4,6,64,256,24,496,2799,644,699,},
S13 | , {4,6,64,256,24,496,293,6558,672,}, 2564

Table 5. Comparison of max size (columns: Sequence, Max size of MAPB, Max size of MAPD). Legible entries: S8: /, 27; S9: /, 26; S13: /, 28.
Note: Empty means overflow error.

6.2. Running time evaluation

Here we compare the running time on several real DNA fragments of different lengths (shown in Figures 7-9). It is worth noting that the running time of MPP-best is plotted against the left axis while the other algorithms are plotted against the right axis, because MPP-best needs much more time to finish the mining task.

Figure 7. Comparison of the running time on Homo Sapiens AX82974 (running time in seconds for MPP-best, MGCS, AMIN, MAPB, and MAPD)
Figure 8. Comparison of the running time on Homo Sapiens AL587 (running time in seconds for MPP-best, MGCS, AMIN, MAPB, and MAPD)
Figure 9. Comparison of the running time on Homo Sapiens AB3849 (running time in seconds for MPP-best, MGCS, AMIN, MAPB, and MAPD)
Note: Empty means overflow error.

We can analyze the running results from the following aspects.

(1) From Figures 7-9, we can see that MPP-best is not only the slowest algorithm but also cannot be used to find frequent patterns in long sequences. Only MPP-best uses the left axis to show its running time; the other algorithms use the right axis. The values on the left axis are far greater than those on the right axis, so it is easy to see that MPP-best is the slowest. For example, MPP-best takes hours to find the frequent patterns in sequence S7 (shown in Figure 8), while MAPD takes no more than 1 minute. Furthermore, we observe that the running time of MPP-best grows faster than the length of a sequence. For example, the data table in Figure 7 shows that the running time of MPP-best grows about 29 times from S1 to S5, which is more than the growth in sequence length. Moreover, according to the running times in the tables of Figures 7-9, the running time of MGCS, AMIN, MAPB, and MAPD grows linearly with the length of a sequence.
For example, from S1 to S5 their running time grows by about the same factor as the length of the sequence. However, if we mine frequent patterns in long sequences using MPP-best, we also face the risk of running out of memory. The reason is that MPP-best employs a PIL (Partial Index List) structure, which consumes a lot of memory to calculate the number of supports.

(2) It is clear that MGCS and AMIN run faster than MPP-best according to Figures 7-9. In [25], MCPaS is reported to outperform MPP-best in S1 by a larger factor than MGCS, which is about 6.84 times faster than MPP-best in our experiment. The reason is that MCPaS employs not only GCS but also an effective pruning strategy. If both MPP-best and MCPaS employed the same pruning strategy, MCPaS would not achieve the speedup reported in [25]. AMIN is a little faster than MGCS according to the figures. The reason is that AMIN redefines the problem and can use the Apriori property, which is a more effective pruning strategy than the Apriori-like property. AMIN is an approximate algorithm, and [24] details the difference between the approximate solutions and the accurate solutions. Moreover, both MGCS and AMIN can be used to mine long sequences. The reason is that MGCS and AMIN use pattern matching approaches to calculate the number of supports for each candidate pattern, and this kind of approach consumes less memory than MPP-best.

(3) Both MAPB and MAPD run faster than MPP-best and MGCS although they employ the same pruning strategy. MAPD runs about 45 and 28 times faster than MPP-best in S1 and S5, respectively, and about 5 times faster than MGCS in all sequences. So does MAPB. Hence these two

NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints

NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints 1 NOSEP: Non-Overlapping Sequence Pattern Mining with Gap Constraints Youxi Wu, Yao Tong, Xingquan Zhu, Senior, IEEE, and Xindong Wu, Fellow, IEEE Abstract Sequence pattern mining aims to discover frequent

More information

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters

Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters Available online at www.scienceirect.com Proceia Engineering 4 (011 ) 34 38 011 International Conference on Avances in Engineering Cluster Center Initialization Metho for K-means Algorithm Over Data Sets

More information

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises verview Frequent Pattern Mining comprises Frequent Pattern Mining hristian Borgelt School of omputer Science University of Konstanz Universitätsstraße, Konstanz, Germany christian.borgelt@uni-konstanz.e

More information

THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM

THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM International Journal of Physics an Mathematical Sciences ISSN: 2277-2111 (Online) 2016 Vol. 6 (1) January-March, pp. 24-6/Mao an Shi. THE APPLICATION OF ARTICLE k-th SHORTEST TIME PATH ALGORITHM Hua Mao

More information

Uninformed search methods

Uninformed search methods CS 1571 Introuction to AI Lecture 4 Uninforme search methos Milos Hauskrecht milos@cs.pitt.eu 539 Sennott Square Announcements Homework assignment 1 is out Due on Thursay, September 11, 014 before the

More information

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES

BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES BIJECTIONS FOR PLANAR MAPS WITH BOUNDARIES OLIVIER BERNARDI AND ÉRIC FUSY Abstract. We present bijections for planar maps with bounaries. In particular, we obtain bijections for triangulations an quarangulations

More information

Skyline Community Search in Multi-valued Networks

Skyline Community Search in Multi-valued Networks Syline Community Search in Multi-value Networs Rong-Hua Li Beijing Institute of Technology Beijing, China lironghuascut@gmail.com Jeffrey Xu Yu Chinese University of Hong Kong Hong Kong, China yu@se.cuh.eu.h

More information

Online Appendix to: Generalizing Database Forensics

Online Appendix to: Generalizing Database Forensics Online Appenix to: Generalizing Database Forensics KYRIACOS E. PAVLOU an RICHARD T. SNODGRASS, University of Arizona This appenix presents a step-by-step iscussion of the forensic analysis protocol that

More information

d 3 d 4 d d d d d d d d d d d 1 d d d d d d

d 3 d 4 d d d d d d d d d d d 1 d d d d d d Proceeings of the IASTED International Conference Software Engineering an Applications (SEA') October 6-, 1, Scottsale, Arizona, USA AN OBJECT-ORIENTED APPROACH FOR MANAGING A NETWORK OF DATABASES Shu-Ching

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Study of Network Optimization Method Based on ACL

Study of Network Optimization Method Based on ACL Available online at www.scienceirect.com Proceia Engineering 5 (20) 3959 3963 Avance in Control Engineering an Information Science Stuy of Network Optimization Metho Base on ACL Liu Zhian * Department

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation

Solution Representation for Job Shop Scheduling Problems in Ant Colony Optimisation Solution Representation for Job Shop Scheuling Problems in Ant Colony Optimisation James Montgomery, Carole Faya 2, an Sana Petrovic 2 Faculty of Information & Communication Technologies, Swinburne University

More information

Data Mining: Clustering

Data Mining: Clustering Bi-Clustering COMP 790-90 Seminar Spring 011 Data Mining: Clustering k t 1 K-means clustering minimizes Where ist ( x, c i t i c t ) ist ( x m j 1 ( x ij i, c c t ) tj ) Clustering by Pattern Similarity

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help

Preamble. Singly linked lists. Collaboration policy and academic integrity. Getting help CS2110 Spring 2016 Assignment A. Linke Lists Due on the CMS by: See the CMS 1 Preamble Linke Lists This assignment begins our iscussions of structures. In this assignment, you will implement a structure

More information

A shortest path algorithm in multimodal networks: a case study with time varying costs

A shortest path algorithm in multimodal networks: a case study with time varying costs A shortest path algorithm in multimoal networks: a case stuy with time varying costs Daniela Ambrosino*, Anna Sciomachen* * Department of Economics an Quantitative Methos (DIEM), University of Genoa Via

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

Loop Scheduling and Partitions for Hiding Memory Latencies

Loop Scheduling and Partitions for Hiding Memory Latencies Loop Scheuling an Partitions for Hiing Memory Latencies Fei Chen Ewin Hsing-Mean Sha Dept. of Computer Science an Engineering University of Notre Dame Notre Dame, IN 46556 Email: fchen,esha @cse.n.eu Tel:

More information

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation

Random Clustering for Multiple Sampling Units to Speed Up Run-time Sample Generation DEIM Forum 2018 I4-4 Abstract Ranom Clustering for Multiple Sampling Units to Spee Up Run-time Sample Generation uzuru OKAJIMA an Koichi MARUAMA NEC Solution Innovators, Lt. 1-18-7 Shinkiba, Koto-ku, Tokyo,

More information

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien

Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama and Hayato Ohwada Faculty of Sci. and Tech. Tokyo University of Scien Yet Another Parallel Hypothesis Search for Inverse Entailment Hiroyuki Nishiyama an Hayato Ohwaa Faculty of Sci. an Tech. Tokyo University of Science, 2641 Yamazaki, Noa-shi, CHIBA, 278-8510, Japan hiroyuki@rs.noa.tus.ac.jp,

More information

Software Reliability Modeling and Cost Estimation Incorporating Testing-Effort and Efficiency

Software Reliability Modeling and Cost Estimation Incorporating Testing-Effort and Efficiency Software Reliability Moeling an Cost Estimation Incorporating esting-effort an Efficiency Chin-Yu Huang, Jung-Hua Lo, Sy-Yen Kuo, an Michael R. Lyu -+ Department of Electrical Engineering Computer Science

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information

CS350 - Exam 4 (100 Points)

CS350 - Exam 4 (100 Points) Fall 0 Name CS0 - Exam (00 Points).(0 points) Re-Black Trees For the Re-Black tree below, inicate where insert() woul initially occur before rebalancing/recoloring. Sketch the tree at ALL intermeiate steps

More information

Image Segmentation using K-means clustering and Thresholding

Image Segmentation using K-means clustering and Thresholding Image Segmentation using Kmeans clustering an Thresholing Preeti Panwar 1, Girhar Gopal 2, Rakesh Kumar 3 1M.Tech Stuent, Department of Computer Science & Applications, Kurukshetra University, Kurukshetra,

More information

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER FFICINT ON-LIN TSTING MTHOD FOR A FLOATING-POINT ADDR A. Droz, M. Lobachev Department of Computer Systems, Oessa State Polytechnic University, Oessa, Ukraine Droz@ukr.net, Lobachev@ukr.net Abstract In

More information

Non-homogeneous Generalization in Privacy Preserving Data Publishing

Non-homogeneous Generalization in Privacy Preserving Data Publishing Non-homogeneous Generalization in Privacy Preserving Data Publishing W. K. Wong, Nios Mamoulis an Davi W. Cheung Department of Computer Science, The University of Hong Kong Pofulam Roa, Hong Kong {wwong2,nios,cheung}@cs.hu.h

More information

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember

filtering LETTER An Improved Neighbor Selection Algorithm in Collaborative Taek-Hun KIM a), Student Member and Sung-Bong YANG b), Nonmember 107 IEICE TRANS INF & SYST, VOLE88 D, NO5 MAY 005 LETTER An Improve Neighbor Selection Algorithm in Collaborative Filtering Taek-Hun KIM a), Stuent Member an Sung-Bong YANG b), Nonmember SUMMARY Nowaays,

More information

Divide-and-Conquer Algorithms

Divide-and-Conquer Algorithms Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:

More information

THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE

THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE БСУ Международна конференция - 2 THE BAYESIAN RECEIVER OPERATING CHARACTERISTIC CURVE AN EFFECTIVE APPROACH TO EVALUATE THE IDS PERFORMANCE Evgeniya Nikolova, Veselina Jecheva Burgas Free University Abstract:

More information

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity

A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Based on Gravity Worl Applie Sciences Journal 16 (10): 1387-1392, 2012 ISSN 1818-4952 IDOSI Publications, 2012 A New Search Algorithm for Solving Symmetric Traveling Salesman Problem Base on Gravity Aliasghar Rahmani Hosseinabai,

More information

Calculation on diffraction aperture of cube corner retroreflector

Calculation on diffraction aperture of cube corner retroreflector November 10, 008 / Vol., No. 11 / CHINESE OPTICS LETTERS 8 Calculation on iffraction aperture of cube corner retroreflector Song Li (Ó Ø, Bei Tang (», an Hui Zhou ( ï School of Electronic Information,

More information

Positions, Iterators, Tree Structures and Tree Traversals. Readings - Chapter 7.3, 7.4 and 8

Positions, Iterators, Tree Structures and Tree Traversals. Readings - Chapter 7.3, 7.4 and 8 Positions, Iterators, Tree Structures an Reaings - Chapter 7.3, 7.4 an 8 1 Positional Lists q q q q A position acts as a marker or token within the broaer positional list. A position p is unaffecte by

More information

Tight Wavelet Frame Decomposition and Its Application in Image Processing

Tight Wavelet Frame Decomposition and Its Application in Image Processing ITB J. Sci. Vol. 40 A, No., 008, 151-165 151 Tight Wavelet Frame Decomposition an Its Application in Image Processing Mahmu Yunus 1, & Henra Gunawan 1 1 Analysis an Geometry Group, FMIPA ITB, Banung Department

More information

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract

The Reconstruction of Graphs. Dhananjay P. Mehendale Sir Parashurambhau College, Tilak Road, Pune , India. Abstract The Reconstruction of Graphs Dhananay P. Mehenale Sir Parashurambhau College, Tila Roa, Pune-4030, Inia. Abstract In this paper we iscuss reconstruction problems for graphs. We evelop some new ieas lie

More information

Learning convex bodies is hard

Learning convex bodies is hard Learning convex boies is har Navin Goyal Microsoft Research Inia navingo@microsoftcom Luis Raemacher Georgia Tech lraemac@ccgatecheu Abstract We show that learning a convex boy in R, given ranom samples

More information

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD.

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD. 1.3.15 Issue 01 Date 2018-11-21 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Lt. 2019. All rights reserve. No part of this ocument may be reprouce or transmitte in any form or by any

More information

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Reconstructing the Nonlinear Filter Function of LILI-128 Stream Cipher Based on Complexity

Reconstructing the Nonlinear Filter Function of LILI-128 Stream Cipher Based on Complexity Reconstructing the Nonlinear Filter Function of LILI-128 Stream Cipher Base on Complexity Xiangao Huang 1 Wei Huang 2 Xiaozhou Liu 3 Chao Wang 4 Zhu jing Wang 5 Tao Wang 1 1 College of Engineering, Shantou

More information

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing Inexing the Eges A simple an yet efficient approach to high-imensional inexing Beng Chin Ooi Kian-Lee Tan Cui Yu Stephane Bressan Department of Computer Science National University of Singapore 3 Science

More information

Coupling the User Interfaces of a Multiuser Program

Coupling the User Interfaces of a Multiuser Program PRASUN DEWAN University of North Carolina at Chapel Hill RAJIV CHOUDHARY Intel Corporation We have developed a new model for coupling the user-interfaces

More information

Principles of B-trees

Principles of B-trees CSE465, Fall 2009 February 25 Nodes and binary search A node u has size size(u), keys k_1, ..., k_{size(u)-1}, children c_1, ..., c_{size(u)}. Binary search property: for i = 1, ..., size(u)

More information

NEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES WITH THE USE OF THE IPAN99 ALGORITHM 1. INTRODUCTION 2.

NEW METHOD FOR FINDING A REFERENCE POINT IN FINGERPRINT IMAGES WITH THE USE OF THE IPAN99 ALGORITHM 1. INTRODUCTION 2. JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 13/2009, ISSN 1642-6037 Krzysztof WRÓBEL, Rafał DOROZ * fingerprint, reference point, IPAN99

More information

Modifying ROC Curves to Incorporate Predicted Probabilities

Modifying ROC Curves to Incorporate Predicted Probabilities Cèsar Ferri DSIC, Universitat Politècnica de València Peter Flach Department of Computer Science, University of Bristol José Hernández-Orallo DSIC,

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canada, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Dual Arm Robot Research Report

Dual Arm Robot Research Report Analytical Inverse Kinematics Solution for Modularized Dual-Arm Robot With offset at shoulder and wrist Motivation and Abstract Generally, an industrial manipulator such as PUMA

More information

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication

Additional Divide and Conquer Algorithms. Skipping from chapter 4: Quicksort Binary Search Binary Tree Traversal Matrix Multiplication Divide and Conquer Closest Pair Let's revisit the closest pair problem. Last

More information

APPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly

APPLYING GENETIC ALGORITHM IN QUERY IMPROVEMENT PROBLEM. Abdelmgeid A. Aly International Journal "Information Technologies and Knowledge" Vol. / 2007 309 [Project MINERVAEUROPE] Project MINERVAEUROPE: Ministerial Network for Valorising Activities in digitalisation -

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Bo Ji and Yu Sang Dept. of Computer and Information Sciences Temple

More information

Kinematic Analysis of a Family of 3R Manipulators

Kinematic Analysis of a Family of 3R Manipulators Maher Baili, Philippe Wenger and Damien Chablat Institut de Recherche en Communications et Cybernétique de Nantes, UMR C.N.R.S. 6597 1, rue de la Noë, BP 92101,

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Eindhoven, 66AE, The Netherlands Department of Computer and Communication

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Academia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

arxiv: v2 [math.co] 5 Jun 2018

arxiv: v2 [math.co] 5 Jun 2018 Some useful lemmas on the edge Szeged index arXiv:1805.06578v2 [math.CO] 5 Jun 2018 Shengjie He 1 1. Department of Mathematics, Beijing Jiaotong University, Beijing, 100044, China Abstract The edge Szeged index

More information

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics

CS 106 Winter 2016 Craig S. Kaplan. Module 01 Processing Recap. Topics The basic parts of speech in a Processing program Scope Review of syntax for classes and objects Readings Your CS 105 notes Learning Processing,

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N (version. Abstract: Estimating bounds of best approximations of 2 M to 3 N is discussed. In the first part I develop a power series, which should give practicable limits for

More information

EFFICIENT STEREO MATCHING BASED ON A NEW CONFIDENCE METRIC. Won-Hee Lee, Yumi Kim, and Jong Beom Ra

EFFICIENT STEREO MATCHING BASED ON A NEW CONFIDENCE METRIC. Won-Hee Lee, Yumi Kim, and Jong Beom Ra th European Signal Processing Conference (EUSIPCO ) Bucharest, Romania, August 7-3, Department of Electrical

More information

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory

Feature Extraction and Rule Classification Algorithm of Digital Mammography based on Rough Set Theory Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Administrative Science, Quantitative

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks Improving Spatial Reuse of IEEE 802.11 Based Ad Hoc Networks Fengji Ye, Su Yi and Biplab Sikdar ECSE Department, Rensselaer Polytechnic Institute Troy, NY 12180 Abstract In this paper, we evaluate and suggest methods

More information

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways

Bends, Jogs, And Wiggles for Railroad Tracks and Vehicle Guide Ways Louis T. Klauer Jr., PhD, PE. Work Soft 833 Galer Dr. Newtown Square, PA 19073 lklauer@wsof.com Preprint, June 4, 00 Copyright 00 by Louis

More information

Lab work #8. Congestion control

Lab work #8. Congestion control TEORÍA DE REDES DE TELECOMUNICACIONES Grado en Ingeniería Telemática Grado en Ingeniería en Sistemas de Telecomunicación Curso 2015-2016 (1 session) Author: Pablo Pavón Mariño

More information

A FUZZY FRAMEWORK FOR SEGMENTATION, FEATURE MATCHING AND RETRIEVAL OF BRAIN MR IMAGES

A FUZZY FRAMEWORK FOR SEGMENTATION, FEATURE MATCHING AND RETRIEVAL OF BRAIN MR IMAGES Archana.S 1 and Sridhar.S 2 1 Department of Information Science and Technology, College of Engineering, Guindy archana.santhira@gmail.com

More information

Rough Set Approach for Classification of Breast Cancer Mammogram Images

Rough Set Approach for Classification of Breast Cancer Mammogram Images Aboul Ella Hassanien Jafar M. H. Ali. Kuwait University, Faculty of Administrative Science, Quantitative Methods and Information Systems

More information

This paper proposes: Mining Frequent Patterns without Candidate Generation

This paper proposes: Mining Frequent Patterns without Candidate Generation a paper by Jiawei Han, Jian Pei and Yiwen Yin School of Computing Science Simon Fraser University Presented by Maria Cutumisu Department of Computing

More information

Chapter 9 Memory Management

Chapter 9 Memory Management Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual Memory

More information

Computer Organization

Computer Organization Douglas Comer Computer Science Department Purdue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purdue.edu/people/comer Copyright 2006. All rights reserved.

More information

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance

A Novel Density Based Clustering Algorithm by Incorporating Mahalanobis Distance Received: November 20, 2017 Margaret Sangeetha 1 * Velumani Paikkaramu 2 Rajakumar Thankappan Chellan 3 1 Department of

More information

Interior Permanent Magnet Synchronous Motor (IPMSM) Adaptive Genetic Parameter Estimation

Interior Permanent Magnet Synchronous Motor (IPMSM) Adaptive Genetic Parameter Estimation Javad Rezaie, Mehdi Gholami, Reza Firouzi, Tohid Alizadeh, Karim Salashoor Abstract - Interior permanent magnet synchronous

More information

Mining Sensor Streams for Discovering Human Activity Patterns Over Time

Mining Sensor Streams for Discovering Human Activity Patterns Over Time Parisa Rashidi CS Department Washington State University Pullman, US mail: prashii@wsu.eu Diane J. Cook CS Department Washington State

More information

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks

MORA: a Movement-Based Routing Algorithm for Vehicle Ad Hoc Networks Fabrizio Granelli, Senior Member, Giulia Boato, Member, and Dzmitry Kliazovich, Student Member Abstract Recent interest in car-to-car communications

More information

Algebraic transformations of Gauss hypergeometric functions

Algebraic transformations of Gauss hypergeometric functions Raimundas Vidūnas Faculty of Mathematics, Kobe University Abstract This article gives a classification scheme of algebraic transformations of Gauss

More information

Overlap Interval Partition Join

Overlap Interval Partition Join Anton Dignös Department of Computer Science University of Zürich, Switzerland aignoes@ifi.uzh.ch Michael H. Böhlen Department of Computer Science University of Zürich, Switzerland

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Neil Alldrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent

More information

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset

A PSO Optimized Layered Approach for Parametric Clustering on Weather Dataset Vol.3, Issue.1, Jan-Feb. 2013 pp-504-508 ISSN: 2249-6645 Shikha Verma, 1 Kiran Jyoti 1 Student, Guru Nanak Dev Engineering College

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23rd Australasian Conference on the Mechanics of Structures and Materials 2014

More information

A Classification of 3R Orthogonal Manipulators by the Topology of their Workspace

A Classification of 3R Orthogonal Manipulators by the Topology of their Workspace Maher Baili, Philippe Wenger and Damien Chablat Institut de Recherche en Communications et Cybernétique de Nantes, UMR C.N.R.S.

More information

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3 1 School of Computer Science and Software Engineering,

More information

A Graph-Based Approach for Mining Closed Large Itemsets

A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and

More information

AnyTraffic Labeled Routing

AnyTraffic Labeled Routing Dimitri Papadimitriou 1, Pedro Pedroso 2, Davide Careglio 2 1 Alcatel-Lucent Bell, Antwerp, Belgium Email: imitri.papaimitriou@alcatel-lucent.com 2 Universitat Politècnica de Catalunya,

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32

More information

Evolutionary Optimisation Methods for Template Based Image Registration

Evolutionary Optimisation Methods for Template Based Image Registration Lukasz A Machowski, Tshilidzi Marwala School of Electrical and Information Engineering University of Witwatersrand, Johannesburg, South

More information

Table-based division by small integer constants

Table-based division by small integer constants Florent de Dinechin, Laurent-Stéphane Didier LIP, Université de Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée d'Italie, 69364 Lyon Cedex 07 Florent.e.Dinechin@ens-lyon.fr

More information

Fast Window Based Stereo Matching for 3D Scene Reconstruction

Fast Window Based Stereo Matching for 3D Scene Reconstruction The International Arab Journal of Information Technology, Vol. 10, No. 3, May 2013 Mohammad Mozammel Chowdhury and Mohammad AL-Amin Bhuiyan Department

More information

Shift-map Image Registration

Shift-map Image Registration Svärm, Linus; Strandmark, Petter Unpublished: 2010-01-01 Link to publication Citation for published version (APA): Svärm, L., & Strandmark, P. (2010). Shift-map Image Registration.

More information

Research Article REALFLOW: Reliable Real-Time Flooding-Based Routing Protocol for Industrial Wireless Sensor Networks

Research Article REALFLOW: Reliable Real-Time Flooding-Based Routing Protocol for Industrial Wireless Sensor Networks Hindawi Publishing Corporation International Journal of Distributed Sensor Networks Volume 2014, Article ID 936379, 17 pages http://dx.doi.org/10.1155/2014/936379

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, and Alexander Stolyar Abstract Backpressure-based adaptive routing

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science and Center of Simulation of Advanced Rockets University of Illinois at Urbana-Champaign

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks Lili Qiu, Ranveer Chandra, Kamal Jain, Mohammad Mahdian Abstract Recently there has emerged a novel application of wireless technology that

More information

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control

Almost Disjunct Codes in Large Scale Multihop Wireless Network Media Access Control D. Charles Engelhart Anand Sivasubramaniam Penn. State University University Park PA 682 engelhar,anan @cse.psu.eu Abstract

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

An Algorithm for Mining Large Sequences in Databases

An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science and Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

New Geometric Interpretation and Analytic Solution for Quadrilateral Reconstruction

New Geometric Interpretation and Analytic Solution for Quadrilateral Reconstruction Joo-Haeng Lee Convergence Technology Research Lab ETRI Daejeon, 305 777, KOREA Abstract A new geometric framework, called

More information

Problem Paper Atoms Tree. atoms.pas. atoms.cpp. atoms.c. atoms.java. Time limit per test 1 second 1 second 2 seconds. Number of tests

Problem Paper Atoms Tree. atoms.pas. atoms.cpp. atoms.c. atoms.java. Time limit per test 1 second 1 second 2 seconds. Number of tests Overview Problem Paper Atoms Tree Shen Tian Harry Wiggins Carl Hultquist Program name paper.exe atoms.exe tree.exe Source name paper.pas atoms.pas tree.pas paper.cpp atoms.cpp tree.cpp paper.c

More information

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots Masaki Hara and Takuya Yoshihiro Faculty of Systems Engineering, Wakayama University 930 Sakaedani, Wakayama, 640-8510, Japan Email:

More information