A Combined Approach for Mining Fuzzy Frequent Itemset

A Combned Approach for Mnng Fuzzy Frequent Itemset R. Prabamaneswar Department of Computer Scence Govndammal Adtanar College for Women Truchendur 628 215 ABSTRACT Frequent Itemset Mnng s an mportant approach for Market Basket Analyss. Earler, the frequent temsets are determned based on the customer transactons of bnary data. Recently, fuzzy data are used to determne the frequent temsets because t provdes the nature of frequent temset e., t descrbes whether the frequent temset conssts of only hghly purchased tems or medum purchased tems or less purchased tems or combnaton of all these based on the fuzzy parttons correspond to quantty purchased. Ths paper concentrates on fuzzy frequent temset mnng n mult-dmensonal aspect by combnng prevously used approaches. Ths proposed approach ntally creates fuzzy parttons for numercal attrbutes and selects the fuzzy parttons to construct the fuzzy records and create the cluster-based fuzzy set table. Then, t uses cluster-based fuzzy set table, fnds the fuzzy frequent temset and reduces the sze of the cluster-based fuzzy set table teratvely. Fnally, t concludes wth the large fuzzy frequent temset. Ths paper also compares the proposed approach wth the fuzzy Apror approach and suggests the proposed approach s better than exstng fuzzy Apror approach. General Terms Data Mnng, Assocaton Rules, Frequent temset Keywords Fuzzy set, FCM, Cluster-based fuzzy set table, Fuzzy frequent temset 1. INTRODUCTION Knowledge dscovery, whose objectve s to obtan useful knowledge from data stored n large repostores, s recognzed as a basc necessty n many areas, especally those related to busness. Snce data represent a certan real-world doman, patterns that hold n data show us nterestng relatons that can be used to mprove our understandng of that doman. Data mnng s the step n the knowledge dscovery process that attempts to dscover novel and meanngful patterns n data. One of the best studed models for data mnng s that of assocaton rules. In the orgnal context of assocaton rule mnng, data are represented by a table wth bnary values. The man task of assocaton rule dscovery s to extract frequent temsets from data. Apror[1] algorthm of frequent temset mnng s a powerful method for market basket analyss. Ths market basket analyss system wll help the managers to understand about the set of tems are customers lkely to purchase. It wll also help the managers to propose new way of arrangement n store layouts. It wll gude them to plan marketng or advertsng approach.ths analyss may be carred out on all the retal stores data of customer transactons. Most of the problems deal wth quanttatve attrbutes nstead of bnary attrbutes. Therefore, there s a need to change from quanttatve attrbute nto bnary attrbute. In [2], mnng quanttatve assocaton rules has been proposed. Ths algorthm fnds the assocaton rules by parttonng attrbute domans and combnng adjacent parttons, then transforms the problem nto bnary one. Although, ths method can solve problem ntroduced by nfnte doman, t causes sharp boundary problem. It ether gnores or overemphaszes the elements near the boundares n the mnng process. In dealng wth the sharp boundary problem n parttonng, fuzzy sets whch can deal wth the boundary problem naturally have been used n the assocaton rule mnng domans. Fuzzy sets are frst ntroduced by Loft A. Zadeh n 1965[3]. After the ntroducton of fuzzy sets, Fuzzy Assocaton Rule Mnng algorthm (Fuzzy Apror) s ntroduced by kuok,fu and Wong[4]. Fuzzy Apror s smlar to classcal Apror but, t uses fuzzy data. Ths paper also uses fuzzy sets for mnng fuzzy frequent temsets. The fuzzy sets can be created by usng membershp functons such as Trangle, Trapezodal and Gaussan or by usng clusterng algorthms such as K- Means and Fuzzy C-Means. The created fuzzy sets are used to construct the fuzzy transacton records. In [5] and [6] trapezodal shape membershp functon s used for fndng fuzzy values. In [7] and [8-12], FCM algorthm s used for generatng fuzzy sets. The fuzzy records can be clustered for reducng the executon tme. The clusterbased approach s used n [11] and [13-14]. In [11] and [14], trangle functon s used ntally and generated fuzzy records are clustered depends on the length of the orgnal transacton. In [13], FCM s used nstead of trangle functon. In ths paper, ntally fuzzy parttons are created for each numercal attrbute usng FCM [15] algorthm. Then, one of the fuzzy parttons for each numercal attrbute s selected by fndng the sum of each partton separately and consderng the maxmum value among them. Afterwards, fuzzy records are constructed by consderng selected partton for each attrbute and jonng them. It s smlar to [9], but t consders only fuzzy values for numercal attrbutes. Now, fuzzy frequent 1-temset s determned by usng the created fuzzy records and smultaneously cluster-based fuzzy set table wth count >1 s created by elmnatng the fuzzy records whose number of fuzzy values satsfyng threshold equal to1. Smlarly fuzzy frequent 2-temset s determned by usng the created cluster-based fuzzy set table wth count >1 and smultaneously cluster-based fuzzy set table wth count >2 s created by elmnatng the fuzzy records wth count equal to 2. Iteratvely, the sze of the cluster-based fuzzy set table s reduced and fuzzy frequent large temset s determned from the reduced table. Secton 2 llustrates Fuzzy Apror and FCM. Secton 3 descrbes the proposed algorthm for determnng fuzzy 1

frequent temsets. Secton 4 demonstrates the algorthm wth an example. Secton 5 dscusses the expermental results. Fnally, concluson s gven n Secton 6. 2. FUZZY APRIORI AND FCM Fuzzy Apror s smlar to classcal Apror but, t uses fuzzy data. In recent years, there have been many attempts to mprove the classcal approach. Fuzzy sets can generally be vewed as an extenson of the classcal crsp sets. They have been frst ntroduced by Loft A. Zadeh n 1965[3]. A fuzzy set A on a unverse X s characterzed by X [0, 1] mappng, also called the membershp functon of A. Therefore, t s necessary to convert the crsp dataset nto fuzzy dataset. For convertng crsp dataset nto fuzzy dataset, frst fuzzy parttons are created for each numercal attrbute then, multple fuzzy records are created for each record n the crsp dataset. There s no well-defned and coordnated fuzzy-orented method to create fuzzy parttons. But, FCM[15] s a very popular and establshed algorthm for fuzzy clusterng n varous domans. It helps n the fuzzy parttons of the dataset, where every data pont belongs to every cluster to a certan degree μ n the range [0, 1]. Thus, each pece of data can belong to two or more clusters. The algorthm[15] tres to mnmze the objectve functon: (1) where m s any real number such that 1 m <, μ j s the degree of membershp of x n the cluster j, x s the th d-dmensonal measured data, c j s the d-dmenson center of the cluster, and * s any norm expressng the smlarty between any measured data and the center. The fuzzness parameter m s an arbtrary real number (m > 1). The hgher the value of m, the fuzzer s the resultng parttonng. The fuzzy parttons generated by FCM are normalzed such that for each data pont the sum of the membershp degrees for each cluster s 1 (, where C s the total number of one dmensonal clusters for that partcular attrbute). Fuzzy parttonng s carred out through an teratve optmzaton of the objectve functon shown above, wth the update of membershp μ j and the cluster centers c j by: (2) where c j = (3) Based on the fuzzy parttons correspond to each numercal attrbute, each record n the orgnal dataset s converted nto multple fuzzy records n the fuzzy dataset. Both the algorthms Apror and Fuzzy Apror count the support of each temset for fndng frequent temset but, the only dfference s that Fuzzy Apror calculates sum of the membershp functon μ correspondng to each record where the temset exsts. Thus, the support for any temset s the sum of membershp functons over the whole fuzzy dataset. Ths calculaton s done wth the help of a sutable t-norm. 3. PROPOSED APPROACH The proposed approach consders only the numercal attrbutes. The followng steps are used n ths approach. Step1: Fuzzy parttons are generated usng FCM algorthm. Here, the sngle dmensonal FCM s used. Step2: Fuzzy records are created by consderng one of the selected parttons of each numercal attrbute and jonng them. The partton can be selected by fndng the sum of each partton separately and choosng the maxmum among them. Then, fuzzy frequent 1-temset s determned by usng the created fuzzy records and smultaneously cluster-based fuzzy set table wth count >1 s created by elmnatng the fuzzy records whose number of fuzzy value satsfyng threshold equal to1. Step3: Frequent large temset s determned by followng the modfed Apror algorthm[16] along wth clusterbased fuzzy set table. Ths approach follows [15] for performng Step1. Step2 s carred out by followng [9] and [11]. In [9], both numercal and categorcal attrbutes are consdered and n [11], the length of orgnal transacton s consdered for clusterng n to correspondng table and trangle functon s used for generatng membershp value, but n ths approach, FCM s used for generatng membershp value and number of satsfed membershp values of each record(each membershp value should be specfed threshold membershp value) s consdered for clusterng nto cluster-based fuzzy set table. Here, the same clusterbased fuzzy set table s used successvely by elmnatng the fuzzy records whose number of membershp values s not satsfed. Intally, the records whose no of membershp =1 are removed, then the records whose no of membershp =2 are removed and so on tll the large fuzzy frequent temset s detemned nstead of clusterng nto cluster-table(k),cluster-table(k+1),cluster table(k+2).. n [11]. Step 3 s performed by modfyng canddate generaton process of an Apror algorthm[1]. Ths proposed algorthm fnds the subsets of a canddate temset partally. It also follows the same procedure used n [16] along wth cluster-based fuzzy set table and fnds frequent k-temset. In [11], frequent k-temset s determned wth reference to the cluster-table(k), then t s contacted wth the cluster-table(k+1), cluster table(k+2).but here, the same cluster-based fuzzy set table s used. Therefore, the number of tmes of scannng the database s reduced. Hence, the proposed approach s faster than exstng fuzzy Apror algorthm. The proposed algorthm s gven below: Proposed Algorthm Input: database D, mn-support, threshold -membershp value Output: fuzzy frequent large temset Man() begn 1. Call FCM1 2. Call Modfed Apror() End Procedure FCM1 D1= Null; canddate= set of numercal tems for canddate s numercal attrbute= 1 to n do For each transacton T D do 1 create fuzzy parttons usng FCM 2. sum each partton separately select any one partton wth greater sum 2

D1=D1+selected partton //create fuzzy_cltable Fuzz_cltable= for each transacton T D1 do canddate.supp s calculated based on thresholdmembershp value f (no of membershp of T >1) add T nto fuzzy_cltable L 1 =canddate canddate.supp mn-support D1=fuzzy_cltable Procedure Modfed Apror() number of tems n a combnaton (c1) = 1 k:=2 whle (L k-1!= φ) call_generate_new (c1) c1++ k++ Procedure call_generate_new(c1) generate canddates (L k-1 ) fuzzy_cltable= for each generated canddates c C k f (c1=1) call frequent(c) else get all partal subset( c) f (all partal subset(c) L k-1 ) call frequent(c) D1=fuzzy_cltable Procedure frequent(c) c.supp=0 for each fuzzy transacton T D1 fnd fuzzy ntersecton(c) // durng fuzzy ntersecton, check each value aganst threshold -membershp value; f value s < threshold -membershp value, then skp that transacton f ( fuzzy ntersecton(c) threshold -membershp value ) c.supp++ add no of membershp of T >k to fuzzy_cltable L k = c c.supp mn-support 4. EXAMPLE Let D= T 1, T 2 T n be a set of transacton and I= 1, 2. n be a set of attrbutes. Each attrbute k assocates wth several fuzzy sets such as F =f 1,f 2.f n. Table 1 shows sample database wth quanttatve attrbutes (quantty corresponds to tems). Table1. A Sample Transactonal Database Trans-d Item wth quantty 1 1 : 31, 2 :27, 3 :39, 4 :39, 6 :39 2 2 :39, 4 :27, 6 :29 3 1 :21, 3 :27, 5 :29 4 1 :37, 2 :27, 5 :35 5 2 :35, 4 :20, 6 :21 6 1 :20, 2 :27, 3 :31, 6 :23 7 3 :25, 5 :29 8 4 :25, 5 :38, 6 :36 9 5 :39, 6 :31 10 3 :27, 4 :37, 6 :34 The fuzzy sets for each attrbute s determned by usng FCM algorthm wth the clusters centers such as c 1 =15, c 2 =30, and c 3 =40. Therefore, three fuzzy parttons are created for each quanttatve attrbute and one of the fuzzy parttons s selected based on the maxmum cumulatve value among them. Then, fuzzy records are created by jonng all the selected parttons and removng the fuzzy value 0. Table 2 shows fuzzy transactonal database. Table2. A fuzzy Transactonal Database/ cluster-based fuzzy set table (1). Trans-d 1 1.med : 0.853; 2.med :0.896; 4.med : 0.008; 6.med : 0.012 2 2.med : 0.012; 4.med : 0.99; 6.med : 0.987 3 1.med : 0.465; 3.med : 0.956; 5.med : 0.987 4 1.med : 0.047; 2.med : 0.896; 5.med : 0.485 5 2.med : 0.485; 4.med : 0.463; 6.med : 0.288 6 1.med : 0.338; 2.med : 0.896; 3.med : 0.931; 6.med : 7 3.med : 0.85; 5.med : 0.987 8 4.med :0.915; 5.med : 0.05; 6.med : 0.3 9 5.med : 0.012; 6.med : 0.984 10 3.med : 0.956; 4.med : 0.099; 6.med : 0.672 Intally, cluster-based fuzzy set table s same as constructed fuzzy records. Then ts sze s reduced successvely. Here, each fuzzy record s each membershp value s checked aganst the threshold membershp value and the satsfed value s consdered for fndng the count value. Thereafter, the fuzzy record s stored n a clusterbased fuzzy set table depends on the count value for further teraton. It s shown as below: Table3. Cluster-based fuzzy set table (2) Transd 1 1.med : 0.853; 2.med :0.896; 4.med : 0.008; 6.med : 0.012 2 1.med : 0.465; 3.med : 0.956; 5.med : 0.987 3 1.med : 0.047; 2.med : 0.896; 5.med : 0.485 3

4 2.med : 0.485; 4.med : 0.463; 6.med : 0.288 5 1.med : 0.338; 2.med : 0.896; 3.med : 0.931; 6.med : Table4. Cluster-based fuzzy set table (3) 1 1.med : 0.465; 3.med : 0.956; 5.med : 0.987 2 2.med : 0.485; 4.med : 0.463; 6.med : 0.288 2 1.med : 0.338; 2.med : 0.896; 3.med : 0.931; 6.med : Table5. Cluster-based fuzzy set table (4) Transd Transd 1 1.med : 0.338; 2.med : 0.896; 3.med : 0.931; 6.med : Assume that the threshold membershp value s 0.05 and mn-support s 0.05. The fuzzy support values of canddate 1-temsets are calculated as: 1.med = 0.853+0.0465+0.047+0.338 = 1.703 2.med = 0.896+0.896+0.485+0.896 =3.173 3.med =0.956+0931+0.85+0.956 =3.693 4.med =6.99+0.463+0.915+0.099 =8.467 5.med =0.987+0.485+0.987+0.058 =2.517 6.med =0.987+0.288++0.3+0.672 =2.764 The canddate 1-temset 0.05 are 1.med, 2.med, 3.med, 4.med, 5.med, 6.med. Therefore, L 1 = 1.med, 2.med, 3.med, 4.med, 5.med, 6.med. Then canddate 2-temsets are generated and frequent 2-temsets are determned from cluster-based fuzzy-sets table wth count >1..e., cluster-based fuzzysets table (2). For example, the fuzzy support value of canddate 2-temset 1.med, 2.med s calculated as 1.med, 2.med = mn (0.853, 0.896) + mn (0.047, 0.896) +mn (0.338, 0.896) =0.853+0.047+0.338 =1.238 Smlarly, the fuzzy support values of further canddate2- tmsets are calculated. Therefore, L 2 = 1.med, 2.med, 1.med, 3.med, 1.med, 5.med, 1.med, 6.med, 2.med, 3.med, 2.med, 4.med, 2.med, 5.med, 2.med, 6.med, 3.med, 4.med, 3.med, 5.med, 3.med, 6.med, 4.med, 5.med, 4.med, 6.med, 5.med, 6.med Now, the canddate 3-temsets are generated and frequent 3-temsets are determned from cluster-based fuzzy-sets table wth count >2 as, L 3 = 1.med, 2.med, 3.med, 1.med, 2.med, 6.med, 1.med, 3.med, 5.med, 1.med, 3.med, 6.med 2.med, 3.med, 6.med, 2.med, 4.med, 6.med In the smlar way, large temsets L n are dscovered. In ths example, the large temset determned s 1.med, 2.med, 3.med, 6.med. 5.EXPERIMENTAL RESULTS To evaluate the effcency of the proposed method, t s mplemented along wth fuzzy Apror algorthm and expermented wth the mushroom dataset. The mushroom dataset contans the characterstcs of varous speces of mushrooms and s orgnally obtaned from the UCI repostory of machne learnng databases. It has 119 tems and 8124 transactons. The mnmum, maxmum and average length of ts transacton s 23. Both algorthms are mplemented usng C# language and carred on the computer wth the confguraton such as Intel(R) Core(TM) 3CPU, 3 GB RAM, 2.53 GHz Speed and Wndows 7 Operatng System. Fg 1 shows the performance of both algorthms Fg1. Performance of Fuzzy Apror and Proposed Approach Here, the quanttes are randomly added wth the mushroom dataset and fuzzy transactons are created. The tme taken for creatng fuzzy transactons s 54.5 sec and for fndng fuzzy frequent temset s gven n the fgure 1. The experment s done by varyng fuzzy support values such as 1000.0, 1500.0, 2000.0 and 2500.0 and gettng frequent 5-temset, frequent 4-temset, frequent 3-tmset and frequent 3-temset correspondngly. From the fgure 1, t s clear that the proposed algorthm takes lesser tme comparng to Fuzzy Apror algorthm. 6. CONCLUSION The proposed approach determnes the fuzzy frequent large temsets based on the approaches such as FCM, FPREP, FCBAR and modfed Apror algorthm. It llustrates the fast performance of the proposed approach wth the comparson to Fuzzy Apror approach. It consders only numercal attrbutes for constructng fuzzy records. In future, t wll consder categorcal attrbutes along wth the numercal attrbutes for creatng fuzzy records and fndng fuzzy frequent temset. 7. REFERENCES [1] Agrawal. R., Imelmsk. T and Swam. A.N., Mnng Assocaton Rules between sets of tems n large database, Proceedngs of ACM SIGMOD Internatonal Conference Management of Data, CM Press, pp. 207-216, 1993. [2] R. Srkant and R. Agrawal, Mnng quanttatve assocaton rules nlarge relatonal tables, n Proc. 4

ACM SIGMOD Conf. Management Data, Montréal, QC, Canada,pp. 1 12, 1996. [3] Zadeh L.A., Fuzzy Sets and systems, Internatonal Journal of General Systems, Vol. 17, pp. 129-138, 1990. [4] C. M. Kuok, A. W.-C. Fu, and M. H. Wong, Mnng fuzzy assocaton rules n databases, ACM SIGMOD Rec., vol. 27, no. 1, pp. 41 46, Mar.1998. [5] Wenng Zhang, Mnng Fuzzy Quanttatve Assocaton Rules, n Proc. ICTAI, pp. 99-102, 1999. [6] Keon-Myung Lee, Mnng Generalzed Fuzzy Quanttatve Rules wth Fuzzy Generalzaton Herarches, IFSA World Congress and 20th NAFIPS Internatonal Conference, 2001, vol. 5, pp.2977-2982, Dgtal Object Identfer : 10.1109/NAFIPS.2001.943701. [7] H.Verlnde, M. De Cock, and R. Boute, Fuzzy versus quanttatve assocaton rules: A far data drven comparson, IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 3, pp. 679 684, Jun. 2006.Man, Cybern. B, Cybern., vol. 36, no. 3, pp. 679 684, Jun. 2006. [8] E.Hüllermeer, Y. Y, In Defense of Fuzzy Assocaton Analyss, IEEE Transactons on Systems, Man and Cybernetcs - Part B: Cybernetcs, vol. 37, no.4,pp. 1039-1043, August.2007. [9] Ashsh Mangalampall and Vkram Pud, FPrep: Fuzzy Clusterng drven Effcent Automated Pre-processng for Fuzzy Assocaton Rule Mnng, IEEE Intl Conference on Fuzzy Systems (FUZZ-IEEE) Barcelona, Span [10] Ashsh Mangalampall and Vkram Pud, Fuzzy Assocaton Rule Mnng Algorthm for Fast and Effcent Performance on Very Large Datasets, FUZZ-IEEE 2009, Korea, pp.1163-1168, August 20-24, 2009. [11] Reza Sheban and Amr Ebrahmzadeh, An Algorthm For Mnng Fuzzy Assocaton Rules, Proceedngs of the Internatonal MultConference of Engneers and Computer Scentsts 2008 Vol I, IMECS 2008, 19-21 March, 2008, Hong Kong. [12] Toshhko WATANABE, A Fast Fuzzy Assocaton Rules Mnng Algorthm Utlzng Output Feld Specfcaton, Bomedcal Soft Computng and Human Scences, vol. `16, no. 2, pp. 69-76. [13] Amr Ebrahmzadeh and Reza Sheban, Two Effcent Algorthms for Mnng Fuzzy Assocaton Rules, Internatonal Journal of Machne Learnng and Computng, Vol. 1, No. 5, December 2011. [14] Hung-Pn Chu et.al., Applyng cluster-based fuzzy assocaton rules mnng framework nto EC envronment, Appled Soft Computng, Artcle n Press 2012. [15] Bezdek J.C.: Pattern Recognton wth Fuzzy Objectve Functon Algorthms. Kluwer Academc Publshers, Norwell, MA (1981). [16] D.Ashok Kumar and R.Prabamaneswar, A Modfed Algorthm for Generatng Sngle Dmensonal Fuzzy temset Mnng, Internatonal Journal of Computatonal Intellgence and Informatcs, Vol. 1, N0. 3, pp. 162-166, ISSN: 2231-0258, October-December 2011. 5