
Mining knowledge from object-oriented instances

Cheng-Ming Huang a, Tzung-Pei Hong b,*, Shi-Jinn Horng c,d

a Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC
b Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan, ROC
c Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC
d Department of Electronic Engineering, National United University, Miao-li 360, Taiwan, ROC

Expert Systems with Applications xxx (2006) xxx-xxx, www.elsevier.com/locate/eswa (article in press, uncorrected proof)

* Corresponding author. E-mail addresses: apcmhl@ms8.hinet.net (C.-M. Huang), tphong@nuk.edu.tw (T.-P. Hong), horng@mouse.ee.ntust.edu.tw (S.-J. Horng).

Abstract

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Recently, the object concept has become very popular and has been used in a variety of applications, especially for describing complex data. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as object data. Each item itself is thought of as a class, and each item purchased in a transaction is thought of as an instance. Instances of the same class (item name) may have different attribute values since they may appear in different transactions. The proposed algorithm is divided into two main phases, one for intra-object association rules and the other for inter-object association rules. Two apriori-like procedures are adopted to find the two kinds of rules. The first phase finds the association relations within the same kind of objects. Each large itemset found in this phase can be thought of as a composite item used in phase 2. The second phase then finds the relationships among different kinds of objects. Both the intra-object and inter-object association rules can thus be derived by the proposed algorithm at the same time. Experiments are also reported to show the effect of the proposed algorithm.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Association rule; Data mining; Object transaction; Object-oriented mining

1. Introduction

Knowledge discovery in databases (KDD) has become a process of considerable interest in recent years as the amounts of data in many databases have grown tremendously large. KDD means the application of nontrivial procedures for identifying effective, coherent, potentially useful, and previously unknown patterns in large databases (Frawley, Piatetsky-Shapiro, & Matheus, 1991). The KDD process generally consists of the following three phases (Famili, Shen, Weber, & Simoudis, 1997; Mannila, 1997).

(1) Pre-processing: This consists of all the actions taken before the actual data analysis process starts (Famili et al., 1997). Famili et al. state that it may be performed on the data for the following reasons: solving data problems that may prevent any type of analysis from being performed on the data, understanding the nature of the data, performing a more meaningful data analysis, and extracting more meaningful knowledge from a given set of data (Famili et al., 1997).
(2) Data mining: This involves applying specific algorithms for extracting patterns or rules from data sets in a particular representation.
(3) Post-processing: This translates discovered patterns into forms acceptable to human beings. It may also make visualization of the extracted patterns possible.

Due to the importance of data mining to KDD, many researchers in the database and machine learning fields are primarily interested in this new research topic because it offers opportunities to discover useful information and important relevant patterns in large databases, thus helping decision-makers easily analyze the data and make good decisions regarding the domains concerned.

Recently, the object concept has become very popular and has been used in different applications such as databases, software engineering, knowledge representation (Clair, Liu, & Pissinou, 1998; Clark & Niblett, 1989), geographic information systems and even computer architecture (Kim, 1990; Kimura, 1995). An object represents an instance with several related attribute values and methods integrated together. In the past, data mining has most commonly been used in attempts to induce association rules from transaction data. In this paper, we generalize this task and propose a mining algorithm to derive association rules from object data. The proposed algorithm is divided into two main phases, one for intra-object association rules and the other for inter-object association rules. Two apriori-like (Agrawal & Srikant, 1994) procedures are adopted to find the two kinds of rules.

The remaining parts of this paper are organized as follows. Related mining algorithms are reviewed in Section 2. The object-oriented concept is introduced in Section 3. The proposed data-mining algorithm for object-oriented transaction data is described in Section 4. An example to illustrate the proposed algorithm is given in Section 5. Experimental results are described in Section 6. Conclusions and future work are given in Section 7.

2. Review of related mining approaches

The goal of data mining is to discover important associations among items such that the presence of some items in a transaction implies the presence of some other items. To achieve this purpose, Agrawal and his co-workers proposed several mining algorithms based on the concept of large itemsets to find association rules in transaction data (Agrawal & Srikant, 1994; Agrawal & Srikant, 1995; Agrawal, Imielinski, & Swami, 1993a; Agrawal, Imielinski, & Swami, 1993b; Srikant, Vu, & Agrawal, 1997). They divided the mining process into two phases. In the first phase, candidate itemsets were generated and counted by scanning the transaction data. If the number of transactions in which an itemset appeared was larger than a predefined threshold value (called the minimum support), the itemset was considered a large itemset. Itemsets containing only one item were processed first. Large itemsets containing only single items were then combined to form candidate itemsets containing two items. This process was repeated until all large itemsets had been found. In the second phase, association rules were induced from the large itemsets found in the first phase. All possible association combinations for each large itemset were formed, and those with calculated confidence values larger than a predefined threshold (called the minimum confidence) were output as association rules.

In addition to proposing methods for mining association rules from transactions of binary values, Agrawal et al. also proposed a method (Srikant & Agrawal, 1996) for mining association rules from transactions with quantitative and categorical attributes. Their method first determines the number of partitions for each quantitative attribute, and then maps all possible values of each attribute into a set of consecutive integers. It then finds large itemsets whose support values are greater than the user-specified minimum support levels. These large itemsets are then processed to generate association rules, and rules of interest to users are output.

Agrawal and Srikant also proposed the AprioriAll mining approach to mine sequential patterns from a set of transactions (Agrawal & Srikant, 1995). Five phases are included in this approach. In the first phase, the transactions are sorted first by customer ID as the major key and then by transaction time as the minor key. This phase thus converts the original transactions into customer sequences. In the second phase, the set of all large itemsets is found from the customer sequences by comparing their counts with a predefined support parameter. This phase is similar to the process of mining association rules. Note that when an itemset occurs more than once in a customer sequence, it is counted only once for this customer sequence. In the third phase, each large itemset is mapped to a contiguous integer and the original customer sequences are transformed into the mapped integer sequences. In the fourth phase, the set of transformed integer sequences is used to find large sequences among them. In the fifth phase, the maximally large sequences are derived and output to users.

In this paper, a mining algorithm is proposed for finding inter- and intra-object association rules from object data. It includes two apriori-like procedures and can be regarded as lying between the two approaches above.

3. Object-oriented transactions

An object-oriented transaction includes one or more purchased items, each of which is represented as an object or an instance. Each instance inherits its characteristics from a superior object, called a class, which defines the basic structure of objects with common properties, including attributes, default values, and methods. The roles of classes and instances in object-oriented transaction data are like those that schemas and tuples play in a relational database (Kim, 1990). A simple structure of a class is shown in Fig. 1, which includes at least three major components: the class name, the attributes and the methods. The class name is an identifier used to identify a class, the attributes are used to represent the characteristics of a class, and the methods are used to implement the operations and functions of a class.

An example of a class wine is shown in Fig. 2 to illustrate the above concept. The class name is specified as wine. The class includes four attributes: on_sale, discount, take_out service and free trial. It also has two methods, confirmation and acknowledgement.
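The class and instance roles described above can be illustrated with a minimal Python sketch. It is an illustration only, not the authors' implementation: the paper names the two methods of wine without describing their behaviour, so the method bodies are placeholders, and the helper `active_attributes` and the two sample instances are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Wine:
    """Sketch of the class `wine`: four binary attributes and two methods."""
    on_sale: bool = False
    discount: bool = False
    take_out_service: bool = False
    free_trial: bool = False

    # The paper only names these methods; the bodies are placeholders (assumption).
    def confirmation(self):
        return "confirmation"

    def acknowledgement(self):
        return "acknowledgement"

    def active_attributes(self):
        """Hypothetical helper: names of the attributes whose value is 1."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# Two instances of the same class, as they might appear in two different
# transactions (hypothetical attribute values).
wine_in_t1 = Wine(on_sale=True, discount=True)
wine_in_t2 = Wine(on_sale=True, take_out_service=True)

print(wine_in_t1.active_attributes())
print(wine_in_t2.active_attributes())
```

Both objects share the structure defined by the class but carry different attribute values, which is the situation the mining algorithm later exploits.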

Fig. 1. Structure of a typical class (labels in the figure: message, class name, attribute 1 to attribute n, method 1 to method m, response).

Fig. 2. An example of the class wine (attributes: 1 on_sale, 2 discount, 3 take_out service, 4 free trial; methods: 1 confirmation, 2 acknowledgement).

In this paper, each item itself (or item name) is thought of as a class, and each item purchased in a transaction is thought of as an instance. Instances of the same class (item name) may have different attribute values since they may appear in different transactions.

4. The data-mining algorithm for object-oriented transaction data

In this section, an algorithm is proposed for discovering useful association rules from object-oriented transaction data. The notation used in the algorithm is listed first.

n: the total number of object-oriented transactions;
w: the total number of items (classes);
m: the number of attributes for each item;
I_j: the jth item (class), 1 ≤ j ≤ w;
A_k: the kth attribute, 1 ≤ k ≤ m;
I_j.A_k: the kth attribute of the jth item;
count_jk: the count of I_j.A_k;
support_jk: the support of I_j.A_k;
α: the predefined minimum support value;
λ: the predefined minimum confidence value;
C_r: the set of candidate itemsets with r intra-object items;
L_r: the set of large itemsets with r intra-object items;
C_z: the set of candidate itemsets with z inter-object items;
L_z: the set of large itemsets with z inter-object items.

In this paper, the attributes of each item (class) are assumed to be binary, with 1 representing that an instance of the item has the attribute property. The proposed algorithm can be divided into two main phases. The first phase is called the intra-object mining phase, in which the large itemsets associated with the same classes (items) but with different attributes are derived. This phase finds the association relations within the same kind of objects. Each large itemset found in this phase can thus be thought of as a composite item used in phase 2. The second phase is called the inter-object mining phase, in which the large itemsets of composite items are obtained to find the relationships among different kinds of objects. Both the intra-object and inter-object association rules can thus be derived by the proposed algorithm at the same time. Two apriori-like procedures are adopted to find the two kinds of rules. The details of the proposed algorithm are described below.

The object-oriented data-mining algorithm for association rules:

INPUT: A set of w items (classes) with m attributes, a body of n transaction data, each with some items (objects) and their attribute values, a predefined minimum support value α, and a predefined minimum confidence value λ.
OUTPUT: A set of intra- and inter-object association rules.

Step 1: Calculate the number (count_jk) of occurrences of each class attribute I_j.A_k in the n transaction data, where I_j is the jth class (item), A_k is the kth attribute, 1 ≤ k ≤ m, 1 ≤ j ≤ w; set the support (support_jk) of I_j.A_k as count_jk/n.

Step 2: Check whether the support of each class attribute I_j.A_k is larger than or equal to the predefined minimum support value α. If I_j.A_k satisfies the condition, put it in the set of large 1-itemsets (L_1). That is,

$$L_1 = \{\, I_j.A_k \mid \mathit{count}_{jk}/n \ge \alpha,\ 1 \le k \le m,\ 1 \le j \le w \,\}.$$

Step 3: If L_1 is null, then exit the algorithm; otherwise, do the next step.

Step 4: Set r = 1, where r is the number of items in the itemsets currently being processed.

Step 5: Generate the candidate set C_{r+1} by joining L_r in a way similar to that in the apriori algorithm (Agrawal & Srikant, 1994), except that the two itemsets in L_r to be joined must belong to the same class (item).

Step 6: Calculate the number (count_s) of occurrences of each candidate (r + 1)-itemset s in C_{r+1} (with all the attribute values in the itemset equal to 1); set its support (support_s) as count_s/n.

Step 7: Check whether the support of each candidate (r + 1)-itemset s is larger than or equal to the predefined minimum support value α. If s satisfies the condition, put it in the set of large (r + 1)-itemsets (L_{r+1}). That is,

$$L_{r+1} = \{\, s \mid \mathit{count}_{s}/n \ge \alpha,\ s \in C_{r+1} \,\}.$$

Step 8: If L_{r+1} is null, do the next step; otherwise, set r = r + 1 and repeat Steps 5 to 7.

Step 9: Each large itemset found so far is then thought of as an intra-object composite item and is put in the inter-object large 1-itemset (L_1).

Step 10: Set z = 1, where z represents the number of composite items in the inter-object itemsets currently being processed.

Step 11: Generate the candidate set C_{z+1} from L_z in a way similar to that in the apriori algorithm, under the condition that each (z + 1)-itemset must not include composite items from the same class.

Step 12: Calculate the number (count_s) of occurrences of each candidate (z + 1)-itemset s in C_{z+1} in the transaction data; set its support (support_s) as count_s/n.

Step 13: Check whether the support of each candidate (z + 1)-itemset s is larger than or equal to the predefined minimum support value α. If s satisfies the condition, put it in the set of large (z + 1)-itemsets (L_{z+1}). That is,

$$L_{z+1} = \{\, s \mid \mathit{count}_{s}/n \ge \alpha,\ s \in C_{z+1} \,\}.$$

Step 14: If L_{z+1} is null, do the next step; otherwise, set z = z + 1 and repeat Steps 11 to 13.

Step 15: Derive intra-object association rules with confidence values larger than or equal to λ from the intra-object large itemsets L_2 to L_r.

Step 16: Derive inter-object association rules with confidence values larger than or equal to λ from the inter-object large itemsets L_2 to L_z.

After Step 16, the two kinds of association rules, intra-object and inter-object, have been found from the given set of object-oriented transactions.

5. An example

In this section, an example is given to illustrate the proposed data-mining algorithm. This is a simple example showing how the proposed algorithm can be used to generate inter-object and intra-object sale strategies for commodities in a store. Assume there are four items, I1 to I4, to be sold in this example and each item has the same four attributes related to the sale behavior. The attributes are on_sale, discount, take_out service and free trial, represented as A1 to A4. Their attribute values are either 0 or 1. Also assume the data set includes 10 transactions, as shown in Table 1. In Table 1, I1.A1 indicates that the value of the attribute A1 in item I1 is 1, meaning that I1 has the characteristic A1.

Table 1. The set of 10 transactions in the example

| Transaction ID | Purchased items | Attribute values of purchased items |
|---|---|---|
| 1 | I1, I3, I4 | (I1.A1, I1.A2, I1.A3, I1.A4), (I3.A1, I3.A2, I3.A3), (I4.A1, I4.A3) |
| 2 | I2, I4 | (I2.A1, I2.A2, I2.A4), (I4.A1, I4.A3, I4.A4) |
| 3 | I1, I4 | (I1.A1, I1.A3), (I4.A2) |
| 4 | I2, I3 | (I2.A2), (I3.A2, I3.A3, I3.A4) |
| 5 | I1, I2 | (I1.A1, I1.A2, I1.A3, I1.A4), (I2.A1, I2.A2, I2.A3) |
| 6 | I1, I4 | (I1.A1, I1.A2, I1.A3), (I4.A1, I4.A2, I4.A3) |
| 7 | I1, I2, I3, I4 | (I1.A1, I1.A2, I1.A4), (I2.A1, I2.A2, I2.A4), (I3.A1, I3.A2, I3.A3), (I4.A2) |
| 8 | I1, I2, I3, I4 | (I1.A1, I1.A2, I1.A3), (I2.A3, I2.A4), (I3.A1, I3.A3), (I4.A1, I4.A3) |
| 9 | I1, I2 | (I1.A1, I1.A3), (I2.A3) |
| 10 | I1, I3, I4 | (I1.A1, I1.A2, I1.A3), (I3.A4), (I4.A3, I4.A4) |

For the transaction data in Table 1, the proposed mining algorithm proceeds as follows.

Step 1: The count value of each item attribute appearing in the ten transaction data is first calculated. Take the class attribute I1.A1 as an example. Its count value over the ten transactions is 8. This step is repeated for the other item attributes, with the results shown in Table 2.

Step 2: The support of each item attribute is derived as its count value divided by the number of transactions. The support of each item attribute is checked to determine whether it is larger than or equal to the predefined minimum support value α. Assume that in this example the minimum support α is set at 0.4. Since the support values of I1.A1, I1.A2, I1.A3, I2.A2, I3.A3, I4.A1 and I4.A3 are larger than or equal to 0.4, these item attributes are put in the set of large 1-itemsets L_1 (Table 3).

Step 3: If L_1 is null, then the algorithm is exited; otherwise, the next step is done. In this example, since L_1 is not null, Step 4 is then executed.

Step 4: r is set at 1, where r is the number of item attributes in the itemsets currently being processed.

Step 5: The candidate set C_{r+1} is formed by joining L_r such that the two itemsets to be joined must have the same item (class). C_2 is first generated from L_1 as follows: (I1.A1, I1.A2), (I1.A1, I1.A3), (I1.A2, I1.A3), and (I4.A1, I4.A3).
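The intra-object phase (Steps 1 to 8 of the algorithm) can be sketched in Python on the data of Table 1. This is a minimal sketch under stated assumptions rather than the authors' implementation: the dictionary encoding of the transactions, the function name `intra_object_large_itemsets`, and the apriori-style subset pruning inside the join are choices made here for illustration.

```python
from itertools import combinations

# Table 1, encoded as an assumption of this sketch: for each transaction,
# item number -> set of attribute numbers whose value is 1.
TRANSACTIONS = [
    {1: {1, 2, 3, 4}, 3: {1, 2, 3}, 4: {1, 3}},
    {2: {1, 2, 4}, 4: {1, 3, 4}},
    {1: {1, 3}, 4: {2}},
    {2: {2}, 3: {2, 3, 4}},
    {1: {1, 2, 3, 4}, 2: {1, 2, 3}},
    {1: {1, 2, 3}, 4: {1, 2, 3}},
    {1: {1, 2, 4}, 2: {1, 2, 4}, 3: {1, 2, 3}, 4: {2}},
    {1: {1, 2, 3}, 2: {3, 4}, 3: {1, 3}, 4: {1, 3}},
    {1: {1, 3}, 2: {3}},
    {1: {1, 2, 3}, 3: {4}, 4: {3, 4}},
]


def intra_object_large_itemsets(transactions, min_support):
    """Return {(item, frozenset of attributes): count} for all large itemsets."""
    n = len(transactions)

    def count(item, attrs):
        # A transaction supports the itemset if the item is purchased and
        # every attribute in `attrs` has value 1 for that instance.
        return sum(1 for t in transactions if attrs <= t.get(item, frozenset()))

    # Steps 1-2: count single item attributes and keep the large ones.
    level = {}
    for item in sorted({i for t in transactions for i in t}):
        for attr in sorted({a for t in transactions for a in t.get(item, ())}):
            c = count(item, frozenset({attr}))
            if c / n >= min_support:
                level[(item, frozenset({attr}))] = c

    large = {}
    while level:  # Steps 3-8
        large.update(level)
        # Step 5: join two large r-itemsets of the SAME class into an (r+1)-itemset.
        candidates = set()
        for (i1, s1), (i2, s2) in combinations(level, 2):
            union = s1 | s2
            if i1 == i2 and len(union) == len(s1) + 1:
                # Apriori-style pruning (assumption): every r-subset must be large.
                if all((i1, union - {a}) in level for a in union):
                    candidates.add((i1, union))
        # Steps 6-7: count the candidates and keep the large ones.
        level = {}
        for item, attrs in candidates:
            c = count(item, attrs)
            if c / n >= min_support:
                level[(item, attrs)] = c
    return large


large = intra_object_large_itemsets(TRANSACTIONS, 0.4)
for (item, attrs), c in sorted(large.items(), key=lambda kv: (len(kv[0][1]), kv[0][0], sorted(kv[0][1]))):
    print("I%d" % item, sorted(attrs), c)
```

With a minimum support of 0.4, the sketch finds seven large 1-itemsets, the four 2-itemsets listed in Step 5, and the single 3-itemset (I1.A1, I1.A2, I1.A3) with a count of 5.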

Table 2. The counts of the item attributes in Table 1 (columns: Trans. ID, then A1 to A4 for each of I1, I2, I3 and I4; last row: Count).

Table 3. The set of large 1-itemsets L_1 in this example

| Itemset | Count |
|---|---|
| I1.A1 | 8 |
| I1.A2 | 6 |
| I1.A3 | 7 |
| I2.A2 | 4 |
| I3.A3 | 4 |
| I4.A1 | 4 |
| I4.A3 | 5 |

Step 6: The count of each candidate in C_2 is calculated, with the results shown in Table 4.

Table 4. The counts of the itemsets in C_2

| Itemset | Count |
|---|---|
| (I1.A1, I1.A2) | 6 |
| (I1.A1, I1.A3) | 7 |
| (I1.A2, I1.A3) | 5 |
| (I4.A1, I4.A3) | 4 |

Step 7: The support of each candidate is then calculated as its count divided by 10. The support of each 2-candidate is then compared with the predefined minimum support value 0.4. In this example, all four 2-candidates, (I1.A1, I1.A2), (I1.A1, I1.A3), (I1.A2, I1.A3), and (I4.A1, I4.A3), are large and thus kept in L_2 (Table 5).

Table 5. The itemsets kept in L_2

| Itemset | Count |
|---|---|
| (I1.A1, I1.A2) | 6 |
| (I1.A1, I1.A3) | 7 |
| (I1.A2, I1.A3) | 5 |
| (I4.A1, I4.A3) | 4 |

Step 8: If L_{r+1} is null, the next step is done; otherwise, r = r + 1 and Steps 5 to 7 are repeated. Since L_2 is not null in the example, r = r + 1 = 2. Steps 5 to 7 are then repeated to find L_3. C_3 is first generated from L_2, and only the 3-itemset (I1.A1, I1.A2, I1.A3) is formed. Its support is calculated as 0.5, larger than 0.4. It is thus put in L_3. Since L_3 contains only one itemset, no 4-itemsets are formed and L_4 is null. Step 9 then begins.

Step 9: Each large itemset found so far is thought of as an intra-object composite item and is put in the inter-object large 1-itemset (L_1). In this example, the large itemsets I1.A1, I1.A2, I1.A3, I2.A2, I3.A3, I4.A1, I4.A3, (I1.A1, I1.A2), (I1.A1, I1.A3), (I1.A2, I1.A3), (I4.A1, I4.A3) and (I1.A1, I1.A2, I1.A3) are put in the inter-object set L_1. Table 6 shows the results.

Table 6. The set of large inter-object 1-itemsets L_1 in this example

| Itemset | Count |
|---|---|
| I1.A1 | 8 |
| I1.A2 | 6 |
| I1.A3 | 7 |
| I2.A2 | 4 |
| I3.A3 | 4 |
| I4.A1 | 4 |
| I4.A3 | 5 |
| (I1.A1, I1.A2) | 6 |
| (I1.A1, I1.A3) | 7 |
| (I1.A2, I1.A3) | 5 |
| (I4.A1, I4.A3) | 4 |
| (I1.A1, I1.A2, I1.A3) | 5 |

Step 10: z is set at 1, where z represents the number of composite items in the inter-object itemsets currently being processed.

Step 11: The candidate set C_{z+1} is generated by joining L_z under the condition that each (z + 1)-itemset must not include composite items from the same class. C_2 is first generated from L_1 as follows:

[I1.A1, I2.A2], [I1.A1, I3.A3], [I1.A1, I4.A1], [I1.A1, I4.A3], [I1.A1, (I4.A1, I4.A3)],
[I1.A2, I2.A2], [I1.A2, I3.A3], [I1.A2, I4.A1], [I1.A2, I4.A3], [I1.A2, (I4.A1, I4.A3)],
[I1.A3, I2.A2], [I1.A3, I3.A3], [I1.A3, I4.A1], [I1.A3, I4.A3], [I1.A3, (I4.A1, I4.A3)],
[I2.A2, I3.A3], [I2.A2, I4.A1], [I2.A2, I4.A3], [I2.A2, (I1.A1, I1.A2)], [I2.A2, (I1.A1, I1.A3)], [I2.A2, (I1.A2, I1.A3)], [I2.A2, (I4.A1, I4.A3)], [I2.A2, (I1.A1, I1.A2, I1.A3)],
[I3.A3, I4.A1], [I3.A3, I4.A3], [I3.A3, (I1.A1, I1.A2)], [I3.A3, (I1.A1, I1.A3)], [I3.A3, (I1.A2, I1.A3)], [I3.A3, (I4.A1, I4.A3)], [I3.A3, (I1.A1, I1.A2, I1.A3)],
[I4.A1, (I1.A1, I1.A2)], [I4.A1, (I1.A1, I1.A3)], [I4.A1, (I1.A2, I1.A3)], [I4.A1, (I1.A1, I1.A2, I1.A3)],
[I4.A3, (I1.A1, I1.A2)], [I4.A3, (I1.A1, I1.A3)], [I4.A3, (I1.A2, I1.A3)], [I4.A3, (I1.A1, I1.A2, I1.A3)],
[(I1.A1, I1.A2), (I4.A1, I4.A3)], [(I1.A1, I1.A3), (I4.A1, I4.A3)], [(I1.A2, I1.A3), (I4.A1, I4.A3)] and [(I4.A1, I4.A3), (I1.A1, I1.A2, I1.A3)].

There are 42 candidates in C_2 in total.

Step 12: The count of each candidate in C_2 is calculated, with the results shown in Table 7.

Table 7. The counts of the itemsets in C_2

| Itemset | Count | Itemset | Count | Itemset | Count |
|---|---|---|---|---|---|
| [I1.A1, I2.A2] | 2 | [I2.A2, I4.A1] | 1 | [I4.A1, (I1.A2, I1.A3)] | 3 |
| [I1.A1, I3.A3] | 3 | [I2.A2, I4.A3] | 4 | [I4.A1, (I1.A1, I1.A2, I1.A3)] | 3 |
| [I1.A1, I4.A1] | 3 | [I2.A2, (I1.A1, I1.A2)] | 2 | [I4.A3, (I1.A1, I1.A2)] | 4 |
| [I1.A1, I4.A3] | 4 | [I2.A2, (I1.A1, I1.A3)] | 1 | [I4.A3, (I1.A1, I1.A3)] | 4 |
| [I1.A1, (I4.A1, I4.A3)] | 4 | [I2.A2, (I1.A2, I1.A3)] | 1 | [I4.A3, (I1.A2, I1.A3)] | 4 |
| [I1.A2, I2.A2] | 2 | [I2.A2, (I4.A1, I4.A3)] | 1 | [I4.A3, (I1.A1, I1.A2, I1.A3)] | 4 |
| [I1.A2, I3.A3] | 3 | [I2.A2, (I1.A1, I1.A2, I1.A3)] | 1 | [(I1.A1, I1.A2), (I4.A1, I4.A3)] | 3 |
| [I1.A2, I4.A1] | 3 | [I3.A3, I4.A1] | 2 | [(I1.A1, I1.A3), (I4.A1, I4.A3)] | 3 |
| [I1.A2, I4.A3] | 4 | [I3.A3, I4.A3] | 2 | [(I1.A2, I1.A3), (I4.A1, I4.A3)] | 3 |
| [I1.A2, (I4.A1, I4.A3)] | 4 | [I3.A3, (I1.A1, I1.A2)] | 3 | [(I4.A1, I4.A3), (I1.A1, I1.A2, I1.A3)] | 3 |
| [I1.A3, I2.A2] | 1 | [I3.A3, (I1.A1, I1.A3)] | 2 | | |
| [I1.A3, I3.A3] | 2 | [I3.A3, (I1.A2, I1.A3)] | 2 | | |
| [I1.A3, I4.A1] | 3 | [I3.A3, (I4.A1, I4.A3)] | 2 | | |
| [I1.A3, I4.A3] | 4 | [I3.A3, (I1.A1, I1.A2, I1.A3)] | 2 | | |
| [I1.A3, (I4.A1, I4.A3)] | 4 | [I4.A1, (I1.A1, I1.A2)] | 3 | | |
| [I2.A2, I3.A3] | 2 | [I4.A1, (I1.A1, I1.A3)] | 3 | | |

Step 13: The support of each candidate is calculated and compared with the predefined minimum support value 0.4. In this example, the 11 itemsets [I1.A1, I4.A3], [I1.A1, (I4.A1, I4.A3)], [I1.A2, I4.A3], [I1.A2, (I4.A1, I4.A3)], [I1.A3, I4.A3], [I1.A3, (I4.A1, I4.A3)], [I2.A2, I4.A3], [I4.A3, (I1.A1, I1.A2)], [I4.A3, (I1.A1, I1.A3)], [I4.A3, (I1.A2, I1.A3)] and [I4.A3, (I1.A1, I1.A2, I1.A3)] satisfy the condition and are kept in L_2. Table 8 shows the results.

Table 8. The counts of the itemsets in the inter-object L_2

| Itemset | Count |
|---|---|
| [I1.A1, I4.A3] | 4 |
| [I1.A1, (I4.A1, I4.A3)] | 4 |
| [I1.A2, I4.A3] | 4 |
| [I1.A2, (I4.A1, I4.A3)] | 4 |
| [I1.A3, I4.A3] | 4 |
| [I1.A3, (I4.A1, I4.A3)] | 4 |
| [I2.A2, I4.A3] | 4 |
| [I4.A3, (I1.A1, I1.A2)] | 4 |
| [I4.A3, (I1.A1, I1.A3)] | 4 |
| [I4.A3, (I1.A2, I1.A3)] | 4 |
| [I4.A3, (I1.A1, I1.A2, I1.A3)] | 4 |

Step 14: If L_{z+1} is null, the next step is done; otherwise, z = z + 1 and Steps 11 to 13 are repeated. Since L_2 is not null in the example, z = z + 1 = 2. Steps 11 to 13 are then repeated to find L_3. C_3 is first generated from L_2, and four inter-object 3-itemsets, [I2.A2, I4.A3, (I1.A1, I1.A2)], [I2.A2, I4.A3, (I1.A1, I1.A3)], [I2.A2, I4.A3, (I1.A2, I1.A3)] and [I2.A2, I4.A3, (I1.A1, I1.A2, I1.A3)], are formed. The count of each itemset is then calculated. Since all the count values are smaller than 4, the above 3-itemsets are not large. L_3 is thus an empty set. Step 15 then begins.

Step 15: Intra-object association rules with confidence values larger than or equal to λ are derived from the large itemsets L_2 to L_r. In this example, r = 3. The step includes the following sub-steps.

(a) All possible intra-object association rules are formed. The following 14 association rules are thus generated:

1. If I1 = A1, then I1 = A2;
2. If I1 = A2, then I1 = A1;
3. If I1 = A1, then I1 = A3;
4. If I1 = A3, then I1 = A1;
5. If I1 = A2, then I1 = A3;
6. If I1 = A3, then I1 = A2;
7. If I4 = A1, then I4 = A3;
8. If I4 = A3, then I4 = A1;
9. If I1 = A1, then (I1 = A2 and I1 = A3);
10. If (I1 = A2 and I1 = A3), then I1 = A1;
11. If I1 = A2, then (I1 = A1 and I1 = A3);
12. If (I1 = A1 and I1 = A3), then I1 = A2;
13. If I1 = A3, then (I1 = A1 and I1 = A2);
14. If (I1 = A1 and I1 = A2), then I1 = A3.

(b) The confidence factors for the above association rules are calculated. Take the first association rule as an example. The intra-object count of I1.A1 ∩ I1.A2 is calculated as shown in Table 9. The confidence factor for the association rule "If I1 = A1, then I1 = A2" is then calculated as

$$\frac{\sum_{i=1}^{10} (I_1.A_1 \cap I_1.A_2)}{\sum_{i=1}^{10} (I_1.A_1)} = \frac{6}{8} = 0.75.$$

With the same calculation process, the confidence factors of all 14 rules are obtained. The first two rules, "If I1 = A1, then I1 = A2" and "If I1 = A2, then I1 = A1", have confidence factors of 0.75 and 1, respectively.
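The confidence calculation of sub-step (b) can be sketched in Python from the intra-object counts of Tables 3 and 5 and the count of the single large 3-itemset. The encoding of item attributes as (item, attribute) pairs and the function name `rules_with_confidence` are assumptions of this sketch, not part of the paper.

```python
from itertools import combinations

# Counts of the large intra-object itemsets in the example.
# Keys are frozensets of (item, attribute) pairs (encoding assumed here).
COUNTS = {
    frozenset({(1, 1)}): 8,
    frozenset({(1, 2)}): 6,
    frozenset({(1, 3)}): 7,
    frozenset({(4, 1)}): 4,
    frozenset({(4, 3)}): 5,
    frozenset({(1, 1), (1, 2)}): 6,
    frozenset({(1, 1), (1, 3)}): 7,
    frozenset({(1, 2), (1, 3)}): 5,
    frozenset({(4, 1), (4, 3)}): 4,
    frozenset({(1, 1), (1, 2), (1, 3)}): 5,
}


def rules_with_confidence(counts):
    """Form every rule antecedent -> consequent from each large itemset with
    at least two members; confidence = count(itemset) / count(antecedent)."""
    rules = {}
    for itemset, count in counts.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                rules[(antecedent, consequent)] = count / counts[antecedent]
    return rules


rules = rules_with_confidence(COUNTS)
first_rule = (frozenset({(1, 1)}), frozenset({(1, 2)}))
print(len(rules), rules[first_rule])
```

The sketch forms 14 rules and gives 6/8 = 0.75 for the first one, in agreement with the calculation above. Every antecedent is itself a large itemset, so its count is always available in the table of counts.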

Table 9. The calculation for the intra-object count of I1.A1 ∩ I1.A2 (columns: Trans. ID, I1.A1, I1.A2, I1.A1 ∩ I1.A2; last row: Count).

The remaining rules, numbered as in sub-step (a), have the following confidence factors:

3. If I1 = A1, then I1 = A3, with a confidence factor of 0.88;
4. If I1 = A3, then I1 = A1, with a confidence factor of 1;
5. If I1 = A2, then I1 = A3, with a confidence factor of 0.83;
6. If I1 = A3, then I1 = A2, with a confidence factor of 0.71;
7. If I4 = A1, then I4 = A3, with a confidence factor of 1;
8. If I4 = A3, then I4 = A1, with a confidence factor of 0.8;
9. If I1 = A1, then (I1 = A2 and I1 = A3), with a confidence factor of 0.63;
10. If (I1 = A2 and I1 = A3), then I1 = A1, with a confidence factor of 1;
11. If I1 = A2, then (I1 = A1 and I1 = A3), with a confidence factor of 0.83;
12. If (I1 = A1 and I1 = A3), then I1 = A2, with a confidence factor of 0.71;
13. If I1 = A3, then (I1 = A1 and I1 = A2), with a confidence factor of 0.71;
14. If (I1 = A1 and I1 = A2), then I1 = A3.

(c) The confidence factors of the above association rules are then compared with the predefined confidence threshold λ. Assume the confidence threshold λ is set at 0.8 in this example. The following eight rules are thus output to users:

1. If I1 = A2, then I1 = A1, with a confidence factor of 1;
2. If I1 = A1, then I1 = A3, with a confidence factor of 0.88;
3. If I1 = A3, then I1 = A1, with a confidence factor of 1;
4. If I1 = A2, then I1 = A3, with a confidence factor of 0.83;
5. If I4 = A1, then I4 = A3, with a confidence factor of 1;
6. If I4 = A3, then I4 = A1, with a confidence factor of 0.8;
7. If (I1 = A2 and I1 = A3), then I1 = A1, with a confidence factor of 1;
8. If I1 = A2, then (I1 = A1 and I1 = A3), with a confidence factor of 0.83.

The above rules can then be explained in a comprehensible way. For example, the association rule "If I1 = A2, then I1 = A1, with a confidence factor of 1" can be explained as "If item I1 has the characteristic discount, then I1 is on_sale, with a confidence factor of 1".

Step 16: Inter-object association rules with confidence values larger than or equal to λ are derived from the inter-object large itemsets L_2 to L_z. In this example, z = 2. The following inter-object association rules can then be derived:

1. If I4 = A3, then I1 = A1, with a confidence factor of 0.8;
2. If (I4 = A1 and I4 = A3), then I1 = A1, with a confidence factor of 1;
3. If I1 = A2, then I4 = A3, with a confidence factor of 1;
4. If I4 = A3, then I1 = A2, with a confidence factor of 0.8;
5. If (I4 = A1 and I4 = A3), then I1 = A2, with a confidence factor of 1;
6. If I4 = A3, then I1 = A3, with a confidence factor of 0.8;
7. If (I4 = A1 and I4 = A3), then I1 = A3, with a confidence factor of 1;
8. If I2 = A2, then I4 = A3, with a confidence factor of 1;
9. If I4 = A3, then I2 = A2, with a confidence factor of 0.8;
10. If I4 = A3, then (I1 = A1 and I1 = A2), with a confidence factor of 0.8;
11. If I4 = A3, then (I1 = A1 and I1 = A3), with a confidence factor of 0.8;
12. If I4 = A3, then (I1 = A2 and I1 = A3), with a confidence factor of 1;
13. If (I1 = A2 and I1 = A3), then I4 = A3, with a confidence factor of 1;
14. If I4 = A3, then (I1 = A1 and I1 = A2 and I1 = A3), with a confidence factor of 0.8;
15. If (I1 = A1 and I1 = A2 and I1 = A3), then I4 = A3, with a confidence factor of 0.8.

The association rule "If (I1 = A1 and I1 = A2 and I1 = A3), then I4 = A3, with a confidence factor of 0.8" can be explained as "If item I1 is on_sale, with discount and take_out service, then I4 also has take_out service, with a confidence factor of 0.8".

After Step 16, the two kinds of association rules, intra-object and inter-object, have been found from the given set of object-oriented transactions.

6. Experimental results

This section reports on experiments made to show the effects of the parameters on the proposed algorithm for inter- and intra-object association rules. The algorithm was implemented in Java on a Pentium-IV 2.8 GHz personal computer with 1 GB of memory. There were 1 object-oriented items. Each item had four attributes, and each attribute value was 0 or 1. Data sets with different numbers of transactions were run by the proposed algorithm. In each data set, the numbers of purchased items in the transactions were first randomly generated. The purchased items and their attribute values in each transaction were then generated. An item could not be generated twice in a transaction.

Experiments were first performed to find the relationships between the numbers of rules and the minimum supports when the minimum transaction number was set at 1, the minimum confidence was 2 and the average number of purchased items in transactions was 1. The results for both the intra-object and the inter-object association rules are shown in Fig. 3.

Fig. 3. The relationship between numbers of rules and minimum support values (x-axis: Mini-support; y-axis: Number of rules; series: intra oo, inter oo).

It can be observed from Fig. 3 that the number of rules decreased as the minimum support value increased. This is consistent with the general property of data mining. Besides, the numbers of intra-object association rules were smaller than those of inter-object association rules because the attribute number was less than the item number in the experiments. This situation usually occurs in real applications. Intra-object association rules are just internal relations within objects, whereas inter-object association rules are external relations among objects. We also found that the execution time for intra-object association rules was smaller than that for inter-object association rules (which will be shown later).

Experiments were then made to find the relationships between the numbers of rules and the minimum confidences when the minimum transaction number was 1, the minimum support was 2 and the average number of purchased items in transactions was 1. The results for both the intra-object and the inter-object association rules are shown in Fig. 4.

Fig. 4. The relationship between numbers of rules and minimum confidence values (x-axis: Mini-confidence; y-axis: Number of rules; series: intra oo, inter oo).

It can be observed from Fig. 4 that the number of rules decreased as the minimum confidence value increased. This is also consistent with the general property of data mining. Besides, the numbers of intra-object association rules were smaller than those of inter-object association rules when the minimum confidence was small, and larger when the minimum confidence was large. This was because the inter-object association rules were derived from the given set of items, which was more dispersed than the set of attributes. Thus, when the minimum confidence value was high, only a few inter-object association rules could be derived.

Experiments were then performed to compare the results for different numbers of transactions. The relationship between the numbers of intra-object association rules and the minimum support values for different numbers of transactions, with an average of 1 purchased items in transactions and a minimum confidence value set at 5, is shown in Fig. 5. The relationship between the numbers of inter-object association rules and the minimum support values for different numbers of transactions is shown in Fig. 6.

Fig. 5. The relationship between numbers of intra-object rules and minimum support values for different numbers of transactions (x-axis: Mini-support; y-axis: Number of rules; one intra-object series per number of transactions).

From Figs. 5 and 6, it is easily seen that the numbers of rules are nearly the same for different numbers of transactions, since the minimum support and the minimum confidence were set as ratios and were thus independent of the transaction numbers. The rule numbers along with different transactions for the minimum

9 C-M Huang et al / Expert Systems with Applications xxx (26) xxx xxx 9 Number of rules Inter-1 Inter-2 Inter-3 Inter-4 Inter-5 Inter Mini-support Fig 6 The relationship between numbers of inter-object rules and minimum support values for different numbers of transactions 586 support set at 2 and the minimum confidence set at are shown in Fig 7 The lines are nearly constant Number of rules intra oo inter oo Transactions Fig 7 The relationship between numbers of rules and numbers of transactions Times (Second) Intra-1 Intra-2 Intra-3 Intra-4 Intra-5 Intra-6 Finally, the execution time for intra-object rules with different minimum support values along with different numbers of transactions for an average of 1 purchased items in transactions and a minimum confidence value set at 5 is shown in Fig 8 The execution time for inter-object rules is shown in Fig 9 It is obvious from Figs 8 and 9 that the execution time increased along with the increase of transaction numbers Besides, finding inter-object association rules spent more time than finding intra-object association rules The second phase of the proposed algorithm is thus the bottleneck in finding the rules 7 Conclusion and future work The object concept has been very popular and used in a variety of applications, especially for complex data description An object represents an instance with several related attribute values and methods integrated together In this paper, we study how to mine out intra- and inter-object association rules from object transactions Each item itself is thought of as a class, and each item purchased in a transaction is thought of as an instance Instances with the same class (item name) may have different attribute values since they may appear in different transactions The proposed algorithm can be divided into two main phases The first phase is called the intra-object mining phase, in which the large itemsets associated with the same classes (items) but with different attributes are divided 
This phase finds the association relations within the same kind of objects. Each large itemset found in this phase can thus be treated as a composite item in phase 2. The second phase is called the inter-object mining phase, in which the large itemsets over the composite items are obtained to capture the relationships among different kinds of objects. Both the intra-object and inter-object association rules can thus be easily derived by the proposed algorithm at the same time. An example has also been given to illustrate the algorithm in detail. In the future, we will further generalize our approach to manage different types of attribute values in addition to binary ones. Experimental results have also shown the effects of the parameters on the proposed algorithm. Finding inter-object association rules usually takes more time than finding intra-object association rules.

Fig. 8. The execution times for intra-object association rules (x-axis: minimum support; y-axis: time in seconds).

Fig. 9. The execution times for inter-object association rules (x-axis: minimum support; y-axis: time in seconds).
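The two phases summarized in the conclusion can be illustrated with a minimal Python sketch. It is not the authors' JAVA implementation. The data layout (a transaction as a dictionary from item name to binary attribute values), the function names, the toy data, and the choice to measure support against all transactions in both phases are assumptions made for illustration. Rule generation from the large itemsets, which needs only the usual confidence check, is omitted.

```python
from itertools import combinations


def apriori(baskets, min_support, compatible=lambda itemset: True):
    """Level-wise search for large itemsets.

    baskets: list of frozensets; min_support: ratio in [0, 1].
    compatible: optional filter applied to candidates of size >= 2.
    Returns {itemset: support ratio}.
    """
    n = len(baskets)
    level = [frozenset([i]) for i in {i for b in baskets for i in b}]
    large, k = {}, 1
    while level:
        counts = {c: sum(1 for b in baskets if c <= b) for c in level}
        current = {c: cnt / n for c, cnt in counts.items()
                   if cnt / n >= min_support}
        large.update(current)
        k += 1
        # Join: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be large.
        level = [c for c in candidates if compatible(c) and
                 all(frozenset(s) in current for s in combinations(c, k - 1))]
    return large


def intra_object_itemsets(transactions, min_support):
    """Phase 1: large attribute-value itemsets within each class (item)."""
    classes = {name for t in transactions for name in t}
    result = {}
    for name in classes:
        # One basket per transaction, empty when the item was not purchased,
        # so support is relative to all transactions (an assumption).
        baskets = [frozenset(t.get(name, {}).items()) for t in transactions]
        result[name] = apriori(baskets, min_support)
    return result


def one_per_class(itemset):
    """Inter-object itemsets combine different kinds of objects only."""
    return len({name for name, _ in itemset}) == len(itemset)


def inter_object_itemsets(transactions, intra, min_support):
    """Phase 2: large itemsets over composite items (class, attribute set)."""
    baskets = []
    for t in transactions:
        basket = set()
        for name, attrs in t.items():
            instance = frozenset(attrs.items())
            basket.update((name, s) for s in intra.get(name, {})
                          if s <= instance)
        baskets.append(frozenset(basket))
    large = apriori(baskets, min_support, compatible=one_per_class)
    # Single composite items are already covered by phase 1.
    return {c: s for c, s in large.items() if len(c) > 1}


# Hypothetical toy data: two classes, two binary attributes each.
transactions = [
    {"bread": {"fresh": 1, "sliced": 1}, "milk": {"low_fat": 1, "organic": 0}},
    {"bread": {"fresh": 1, "sliced": 1}, "milk": {"low_fat": 1, "organic": 1}},
    {"bread": {"fresh": 1, "sliced": 0}},
    {"bread": {"fresh": 0, "sliced": 1}, "milk": {"low_fat": 1, "organic": 0}},
]
intra = intra_object_itemsets(transactions, min_support=0.5)
inter = inter_object_itemsets(transactions, intra, min_support=0.5)
```

In this sketch the second phase works on more candidate items than the first, because every large attribute itemset of every class becomes a composite item. That is in line with the observation that the inter-object phase dominates the running time.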

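The synthetic data sets described in the experiments could be produced along the following lines. This is only a sketch under stated assumptions: the paper does not give the distribution of basket sizes, so a uniform distribution centered on the average is assumed here, and the function name, parameter names, item names, and default values are hypothetical.

```python
import random


def generate_transactions(n_transactions, n_items, n_attributes=4,
                          avg_items=3, seed=0):
    """Generate object transactions with binary attribute values.

    Each transaction maps an item name to its attribute values. Basket
    sizes are drawn uniformly from 1 to 2 * avg_items - 1, so their mean
    is avg_items (an assumed distribution, not taken from the paper).
    """
    rng = random.Random(seed)
    item_names = [f"item{i}" for i in range(n_items)]
    transactions = []
    for _ in range(n_transactions):
        # First decide how many items are purchased in this transaction.
        size = min(n_items, rng.randint(1, 2 * avg_items - 1))
        # Sampling without replacement: an item never appears twice.
        chosen = rng.sample(item_names, size)
        # Then generate the binary attribute values of each instance.
        transactions.append({
            name: {f"a{j}": rng.randint(0, 1) for j in range(n_attributes)}
            for name in chosen
        })
    return transactions


data = generate_transactions(n_transactions=200, n_items=10)
```

Because the minimum support and minimum confidence are ratios, data sets of different sizes generated this way should give similar rule counts, which matches the nearly constant lines reported for different numbers of transactions.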