Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework

M. Sulaiman Khan 1, Maybin Muyeba 1, Frans Coenen 2

1 Liverpool Hope University, School of Computing, Liverpool, UK
2 The University of Liverpool, Department of Computer Science, Liverpool, UK
{khanm@hope.ac.uk, muyebam@hope.ac.uk, frans@csc.liv.ac.uk}

Abstract. In this paper we extend the problem of mining weighted association rules. A classical model of boolean and fuzzy quantitative association rule mining is adopted to address the issue of invalidation of the downward closure property (DCP) in weighted association rule mining, where each item is assigned a weight according to its significance w.r.t. some user-defined criteria. Most works on DCP so far struggle with an invalid downward closure property, and some assumptions are made to validate the property. We generalize the problem of the downward closure property and propose a fuzzy weighted support and confidence framework for boolean and quantitative items with weighted settings. The problem of invalidation of the DCP is solved using an improved model of the weighted support and confidence framework for classical and fuzzy association rule mining. Our methodology follows an Apriori algorithm approach and avoids pre- and post-processing, as opposed to most weighted ARM algorithms, thus eliminating the extra steps during rule generation. The paper concludes with experimental results and a discussion on evaluating the proposed framework.

Keywords: Association rules, fuzzy, weighted support, weighted confidence, downward closure.

1 Introduction

Association rules (ARs) [11] have been widely used to determine customer buying patterns from market basket data. The task of mining association rules is mainly to discover association rules (with strong support and high confidence) in large databases. Classical Association Rule Mining (ARM) deals with the relationships among the items present in transactional databases [9, 10]. The typical approach is to first generate all large (frequent) itemsets (attribute sets), from which the set of ARs is derived.
A large itemset is defined as one that occurs more frequently in the given data set than a user-supplied support threshold. To limit the number of ARs generated, a confidence threshold is used. The number of ARs generated can therefore be influenced by careful selection of the support and confidence thresholds; however, great care must be taken to ensure that itemsets with low support, but from which high confidence rules may be generated, are not omitted.

Given a set of items $I = \{i_1, i_2, \ldots, i_m\}$ and a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$ where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ip}\}$, $p \le m$ and $I_{ij} \in I$, a set $X \subseteq I$ with $K = |X|$ is called a k-itemset or simply an itemset. Let a database D be a multi-set of subsets of I as shown. Each $T \in D$ supports an itemset $X \subseteq I$ if $X \subseteq T$ holds. An association rule is an expression $X \rightarrow Y$, where X, Y are itemsets and $X \cap Y = \emptyset$ holds. The fraction of transactions T supporting an itemset X w.r.t. D is called the support of X:

\[ Supp(X) = \frac{|\{T \in D \mid X \subseteq T\}|}{|D|} \]

The strength or confidence (c) for an association rule $X \rightarrow Y$ is the ratio of the number of transactions that contain $X \cup Y$ to the number of transactions that contain X:

\[ Conf(X \rightarrow Y) = \frac{Supp(X \cup Y)}{Supp(X)} \]

For non-boolean items, fuzzy association rule mining was proposed using fuzzy sets such that quantitative and categorical attributes can be handled [12]. A fuzzy quantitative rule represents each item as an (item, value) pair. Fuzzy association rules are expressed in the following form:

If X is A satisfies Y is B

For example, if (age is young) $\rightarrow$ (salary is low). Given a database T, attributes I with itemsets $X \subset I$, $Y \subset I$, $X = \{x_1, x_2, \ldots, x_n\}$, $Y = \{y_1, y_2, \ldots, y_n\}$ and $X \cap Y = \emptyset$, we can define fuzzy sets $A = \{f_{x_1}, f_{x_2}, \ldots, f_{x_n}\}$ and $B = \{f_{y_1}, f_{y_2}, \ldots, f_{y_n}\}$ associated with X and Y respectively. For example, (X, A) could be (age, young), (age, old), (salary, high) etc. The semantics of the rule is that when the antecedent "X is A" is satisfied, we can imply that "Y is B" is also satisfied, which means there are sufficient records that contribute their votes to the attribute-fuzzy set pairs and the sum of these votes is greater than the user-specified threshold.

However, the above ARM framework assumes that all items have the same significance or importance, i.e. their weight within a transaction or record is the same (weight = 1), which is not always the case. For example, [wine $\rightarrow$ salmon, 1%, 80%] may be more important than [bread $\rightarrow$ milk, 3%, 80%] even though the former holds a lower support of 1%.
This is because the items in the first rule usually come with more profit per unit sale, but standard ARM simply ignores this difference.

Table 1. Weighted Items Database

ID   Item       Profit   Weight
1    Scanner    10       0.1
2    Printer    30       0.3
3    Monitor    60       0.6
4    Computer   90       0.9

Table 2. Transactions

TID   Items
1     1, 2, 4
2     2, 3
3     1, 2, 3, 4
4     2, 3, 4
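As a small illustration of the classical measures defined above, the following Python sketch computes support and confidence on the transactions of Table 2. The function and variable names are illustrative assumptions, not part of the paper. It shows that classical confidence ranks the two Computer rules without any reference to the profit weights of Table 1.

```python
# Table 2 transactions, keyed by TID; item IDs follow Table 1
# (1 = Scanner, 2 = Printer, 3 = Monitor, 4 = Computer).
transactions = {
    1: {1, 2, 4},
    2: {2, 3},
    3: {1, 2, 3, 4},
    4: {2, 3, 4},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Supp(X u Y) / Supp(X) for the rule X -> Y."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Computer -> Printer and Computer -> Scanner, ignoring all weights.
conf_computer_printer = confidence({4}, {2}, transactions)
conf_computer_scanner = confidence({4}, {1}, transactions)
print(conf_computer_printer, conf_computer_scanner)
```

Neither value depends on profit, which is the gap that the weighted framework is meant to close.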

Weighted ARM deals with the importance of individual items in a database [2, 3, 4]. For example, some products are more profitable or may be under promotion, and are therefore more interesting than others, so rules concerning them are of greater value. Items are assigned weights (w) according to their significance, as shown in table 1. These weights may be set according to an item's profit margin. This generalized version of ARM is called Weighted Association Rule Mining (WARM). From table 1, we can see that the rule Computer $\rightarrow$ Printer is more interesting than Computer $\rightarrow$ Scanner because the profit of a printer is greater than that of a scanner. The main challenge in weighted ARM is that the downward closure property, which is crucial for the efficient iterative process of generating and pruning frequent itemsets from subsets, no longer holds.

In this paper we address the issue of the downward closure property in WARM. We generalize and solve the problem of the downward closure property and propose a weighted support and confidence framework for both boolean and quantitative items for classical and fuzzy WARM (FWARM). We evaluate our proposed framework with experimental results.

The paper is organised as follows: section 2 presents background and related work; section 3 gives problem definition one; section 4 gives problem definition two; section 5 details the weighted downward closure property; section 6 reviews experimental results; and section 7 concludes the paper with directions for future work.

2 Background and Related Work

In classical ARM, data items are viewed as having equal importance, but recently some approaches have generalized this so that items are given weights to reflect their significance to the user [4]. The weights may correspond to special promotions on some products, the profitability of different items, etc. Currently, two approaches exist: pre- and post-processing. Post-processing first solves the non-weighted problem (weights = 1 per item) and then prunes the rules later. Pre-processing prunes the non-frequent itemsets earlier, using weights after every iteration.
The issue with post-processed weighted ARM is that items are first scanned without considering their weights, and only at the end is the rule base checked for frequent weighted ARs. This gives us a very limited itemset pool in which to check weighted ARs and may miss many potential itemsets. In pre-processing, classical ARM prunes itemsets by checking frequent ones against weighted support after every scan. In pre-processing, fewer rules are obtained than in post-processing because many potentially frequent supersets are missed.

In [2] a post-processing model is proposed. Two algorithms were proposed to mine itemsets with normalized and un-normalized weights. The k-support bound metric was used to ensure the validity of the downward closure property. Even that did not guarantee that every subset of a frequent set is frequent unless the k-support bound value of the (K-1) subset was higher than that of (K).

An efficient mining methodology for Weighted Association Rules (WAR) is proposed in [3]. A numerical attribute was assigned to each item, where the weight of the item was defined as part of a particular weight domain. For example, soda[4,6] $\rightarrow$ snack[3,5] means that if a customer purchases between 4 and 6 bottles of soda, he is likely to purchase 3 to 5 bags of snacks. WAR uses a post-processing approach by deriving the maximum weighted rules from frequent itemsets. Post WAR does not interfere with the process of generating frequent itemsets but focuses on how weighted association rules can be generated by examining the weighting factors of the items included in the generated frequent itemsets.

Similar techniques exist for weighted fuzzy quantitative association rule mining [5, 7, 8]. In [6], a two-fold pre-processing approach is used where, firstly, quantitative attributes are discretised into different fuzzy linguistic intervals and weights are assigned to each linguistic label. A mining algorithm is then applied to the resulting dataset using two support measures, for the normalized and un-normalized cases. The closure property is addressed by using the z-potential frequent subset for each candidate set. An arithmetic mean is used to find the possibility of a frequent (k+1)-itemset, which is not guaranteed to yield a valid downward closure property.

Another significance framework, WARM, that handles the DCP problem is proposed in [1]. Weighting spaces were introduced as inner-transaction space, item space and transaction space, in which items can be weighted depending on different scenarios and mining focus. However, support is calculated by only considering the transactions that contribute to the itemset. Further, no discussion was made of the confidence or interestingness of the rules produced.

In this paper we present a fuzzy weighted support and confidence framework to mine weighted boolean and quantitative data (by fuzzy means) to address the issue of invalidation of the downward closure property. We then show that, using the proposed framework, rules can be generated efficiently with a valid downward closure property and without the biases introduced by pre- or post-processing approaches.

3 Problem Definition One (Boolean)

Let the input data D have transactions $T = \{t_1, t_2, t_3, \ldots, t_n\}$ with a set of items $I = \{i_1, i_2, i_3, \ldots, i_{|I|}\}$ and a set of non-negative, real number weights $W = \{w_1, w_2, w_3, \ldots, w_{|I|}\}$ associated with each item.
Each ith transaction $t_i$ is some subset of I, and a weight w is attached to each item $t_i[i_j]$ (the jth item in the ith transaction).

Table 3. Transactional Database

T     Items
t1    2 3 4 5
t2    3 5
t3    2 4
t4    4 5
t5    2 3 4

Table 4. Items with weights

Items i   Weights (IW)
1         0.9
2         0.7
3         0.5
4         0.3
5         0.1

Thus each item $i_j$ will have associated with it a weight corresponding to the set W, i.e. a pair (i, w) is called a weighted item where $i \in I$. The weight for the jth item in the ith transaction is given by $t_i[i_j[w]]$.

We illustrate the concept and definitions using tables 3 and 4. Table 3 contains transactions for 5 items. Table 4 has the corresponding weights associated with each item i in T. In our definitions, we take as the standard approach the sum of votes for each itemset, where a vote is obtained by multiplying the weights of the items that occur.

Definition 1. Item Weight IW is a real value in the range [0..1] given to each item $i_j$ with some degree of importance, a weight $i_j[w]$.

Definition 2. Itemset Transaction Weight ITW is the product of the weights of all the items in the itemset present in a single transaction. The itemset transaction weight for an itemset X can be calculated as:

\[ \text{vote for } t_i \text{ satisfying } X = \prod_{k=1}^{|X|} t_i[i_k[w]], \quad \forall [i[w]] \in X \qquad (1) \]

The itemset transaction weight ITW of itemset (2, 4) is calculated as:

\[ ITW(2, 4) = 0.7 \times 0.3 = 0.21 \]

Definition 3. Weighted Support WS is the sum of the itemset transaction weights ITW of all the transactions in which the itemset is present, divided by the total number of transactions. It is denoted as:

\[ WS(X) = \frac{\text{Sum of votes satisfying } X}{\text{Number of records in } T} = \frac{\sum_{i=1}^{n} \prod_{k=1}^{|X|} t_i[i_k[w]]}{n}, \quad \forall [i[w]] \in X \qquad (2) \]

where the vote of a transaction that does not contain X is zero.

Let us take the example of itemset (2, 4) and find its itemset transaction weight, weighted support and weighted confidence. The weighted support WS of itemset (2, 4) is calculated as:

\[ WS(2, 4) = \frac{\text{Sum of votes satisfying } (2, 4)}{\text{Number of records in } T} = \frac{(0.7 \times 0.3) + (0.7 \times 0.3) + (0.7 \times 0.3)}{5} = \frac{0.63}{5} = 0.126 \]

Definition 4. Weighted Confidence WC is the ratio of the sum of votes satisfying both $X \cup Y$ to the sum of votes satisfying X. It is formulated (with $Z = X \cup Y$) as:

\[ WC(X \rightarrow Y) = \frac{WS(Z)}{WS(X)} = \frac{\sum_{i=1}^{n} \prod_{k=1}^{|Z|} t_i[z_k[w]]}{\sum_{i=1}^{n} \prod_{k=1}^{|X|} t_i[x_k[w]]}, \quad \forall [z[w]] \in Z,\ \forall [i[w]] \in X \qquad (3) \]

The weighted confidence WC of itemset (2, 4) is calculated as:

\[ WC(2, 4) = \frac{WS(Z)}{WS(X)} = \frac{WS(X \cup Y)}{WS(X)} = \frac{0.126}{0.14} = 0.9 \]

4 Problem Definition Two (Quantitative/Fuzzy)

Let a dataset D consist of a set of transactions $T = \{t_1, t_2, t_3, \ldots, t_n\}$ with a set of items $I = \{i_1, i_2, i_3, \ldots, i_{|I|}\}$. A fuzzy dataset $D'$ consists of fuzzy transactions $T' = \{t'_1, t'_2, t'_3, \ldots, t'_n\}$ with fuzzy sets associated with each item in I, each identified by a set of linguistic labels $L = \{l_1, l_2, l_3, \ldots, l_{|L|}\}$ (for example $L = \{small, medium, large\}$). We assign a weight w to each l in L associated with i. Each attribute $t'_i[i_j]$ is associated (to some degree) with several fuzzy sets. The degree of association is given by a membership degree in the range [0..1], which indicates the correspondence between the value of a given $t'_i[i_j]$ and the set of fuzzy linguistic labels. The kth weighted fuzzy set for the jth item in the ith fuzzy transaction is given by $t'_i[i_j[l_k[w]]]$. Thus each label $l_k$ for item $i_j$ would have associated with it a weight, i.e. a pair $([i[l]], w)$ is called a weighted item, where $[i[l]] \in L$ is a label associated with item i and $w \in W$ is the weight associated with label l.

Table 5. Fuzzy Transactional Database

TID   X: Small   X: Medium   Y: Small   Y: Medium
1     0.5        0.5         0.2        0.8
2     0.9        0.1         0.4        0.6
3     1.0        0.0         0.1        0.9
4     0.3        0.7         0.5        0.5

Table 6. Fuzzy Items with weights

Fuzzy Items i[l]   Weights (IW)
(X, Small)         0.9
(X, Medium)        0.7
(Y, Small)         0.5
(Y, Medium)        0.3
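Before turning to the fuzzy definitions, the boolean measures of Section 3 (Definitions 2 to 4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are illustrative, and the transactions encode only the entries of Table 3 that survive in this copy (occurrences of item 1 cannot be recovered, which does not affect itemset (2, 4)). The weighted confidence it returns depends on that reading of the table, so no value is asserted for it here.

```python
from math import prod

# Table 4: item weights.
weights = {1: 0.9, 2: 0.7, 3: 0.5, 4: 0.3, 5: 0.1}

# Table 3, as far as it is legible (assumption: occurrences of item 1 unknown).
transactions = [
    {2, 3, 4, 5},  # t1
    {3, 5},        # t2
    {2, 4},        # t3
    {4, 5},        # t4
    {2, 3, 4},     # t5
]


def itemset_transaction_weight(itemset, weights):
    """Definition 2: product of the weights of the items in the itemset."""
    return prod(weights[i] for i in itemset)


def weighted_support(itemset, transactions, weights):
    """Definition 3: sum of votes over transactions containing the itemset,
    divided by the total number of transactions."""
    itemset = set(itemset)
    vote = itemset_transaction_weight(itemset, weights)
    return sum(vote for t in transactions if itemset <= t) / len(transactions)


def weighted_confidence(antecedent, consequent, transactions, weights):
    """Definition 4: WS(X u Y) / WS(X)."""
    z = set(antecedent) | set(consequent)
    return (weighted_support(z, transactions, weights)
            / weighted_support(antecedent, transactions, weights))


print(itemset_transaction_weight({2, 4}, weights))      # ITW(2, 4), about 0.21
print(weighted_support({2, 4}, transactions, weights))  # WS(2, 4), about 0.126
```

Because every weight lies in [0, 1], adding an item to an itemset can only shrink its vote, which is the property the later sections rely on.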

We illustrate the fuzzy weighted ARM concept and definitions using tables 5 and 6. Table 5 contains transactions for 2 quantitative items, each discretised into two overlapping intervals with fuzzy values. Table 6 has the corresponding weights associated with each fuzzy item i[l] in T.

Definition 5. Fuzzy Item Weight FIW is a value attached to each fuzzy set. It is a real number in the range [0..1] w.r.t. some degree of importance (table 6). The weight of a fuzzy set for an item $i_j$ is denoted as $i_j[l_k[w]]$.

Definition 6. Fuzzy Itemset Transaction Weight FITW is the product of the weights of all the fuzzy sets associated with the items in the itemset present in a single transaction. The fuzzy itemset transaction weight for an itemset (X, A) can be calculated as:

\[ \text{vote for } t'_i \text{ satisfying } X = \prod_{k=1}^{|L|} t'_i[i_j[l_k[w]]], \quad \forall [i[l[w]]] \in X \qquad (4) \]

where, as in the example below, each factor is the membership degree of the fuzzy set in $t'_i$ multiplied by the weight of that fuzzy set.

Let us take the example of itemset <(X, Medium), (Y, Small)>, denoting (X, Medium) as A and (Y, Small) as B. The fuzzy itemset transaction weight FITW of itemset (A, B) in transaction 1 is calculated as:

\[ FITW(A, B) = (0.5 \times 0.7)(0.2 \times 0.5) = (0.35)(0.1) = 0.035 \]

Definition 7. Fuzzy Weighted Support FWS is the sum of the weights FITW of all the transactions in which the itemset is present, divided by the total number of transactions. It is denoted as:

\[ FWS(X) = \frac{\text{Sum of votes satisfying } X}{\text{Number of records in } T} = \frac{\sum_{i=1}^{n} \prod_{k=1}^{|L|} t'_i[i_j[l_k[w]]]}{n}, \quad \forall [i[l[w]]] \in X \qquad (5) \]

The fuzzy weighted support FWS of itemset (A, B) is calculated as:

\[ FWS(A, B) = \frac{\text{Sum of votes satisfying } (A, B)}{\text{Number of records in } T} = \frac{(0.5 \times 0.7)(0.2 \times 0.5) + (0.1 \times 0.7)(0.4 \times 0.5) + (0.0 \times 0.7)(0.1 \times 0.5) + (0.7 \times 0.7)(0.5 \times 0.5)}{4} = \frac{0.297}{4} = 0.074 \]

Definition 8. Fuzzy Weighted Confidence FWC is the ratio of the sum of votes satisfying both $X \cup Y$ to the sum of votes satisfying X, with $Z = X \cup Y$. It is formulated as:

\[ FWC(X \rightarrow Y) = \frac{FWS(Z)}{FWS(X)} = \frac{\sum_{i=1}^{n} \prod_{k=1}^{|Z|} t'_i[z_k[w]]}{\sum_{i=1}^{n} \prod_{k=1}^{|X|} t'_i[x_k[w]]}, \quad \forall [z[w]] \in Z,\ \forall [i[w]] \in X \qquad (6) \]

The fuzzy weighted confidence FWC of itemset (A, B) is calculated as:

\[ FWC(A, B) = \frac{FWS(Z)}{FWS(X)} = \frac{FWS(X \cup Y)}{FWS(X)} = \frac{0.074}{0.227} = 0.325 \]

5 Downward Closure Property (DCP)

In a classical Apriori algorithm it is assumed that if an itemset is large, then all its subsets should also be large; this is called the Downward Closure Property (DCP). It helps the algorithm to generate large itemsets of increasing size by adding items to itemsets that are already large. In the weighted ARM case, where each item is assigned a weight, the DCP does not hold. Because of the weighted support, an itemset may be large even though some of its subsets are not large. This violates the DCP (see table 7).

Table 7. Frequent itemsets with invalid DCP

T     Items
t1    A B C D E
t2    A C E
t3    B D
t4    A D E
t5    A B C D

Items    Weights (IW)
1 (A)    0.1
2 (B)    0.3
3 (C)    0.6
4 (D)    0.9
5 (E)    0.7

min_supp = 40%, weighted_supp = 0.4

Large Itemsets   Support   Large?   Weighted Support   Large?
AB               40%       Yes      0.16               No
AC               60%       Yes      0.42               Yes
ABC              40%       Yes      0.4                Yes
BC               40%       Yes      0.36               No
BD               60%       Yes      0.72               Yes
BCD              40%       Yes      0.72               Yes

Table 7 shows four large itemsets of size 2 (AB, AC, BC, BD) and two large itemsets of size 3 (ABC, BCD), each of which is a combination of two large itemsets. In classical ARM, when the weights are not considered, all six itemsets are large. But if we consider item weights and calculate a weighted support for each itemset (in table 7, the sum of the item weights multiplied by the classical support), a new set of support values is obtained. In table 7, the classical support of all itemsets is large, and if ABC and BCD are frequent then their subsets will also be large according to classical ARM. Under weighted support, however, AB and BC are no longer frequent. In classical ARM using the DCP, we assume that if AB and BC are not frequent, then ABC and BCD cannot be frequent, so we do not consider generating the supersets that contain non-frequent itemsets.

5.1 Weighted Downward Closure Property (DCP)

We now argue that the DCP with boolean and quantitative data can be validated by using this new weighted framework. We give a proof and an example to illustrate this. Consider figure 1, where items in the transactions are assigned weights, with supports above a user threshold.

In figure 1, for each itemset, the weighted support WS (the number above each itemset) is calculated using definition 3 and the weighted confidence WC (the number on top of each itemset) is calculated using definition 4. If an itemset's weighted support is above the threshold, the itemset is frequent and we mark it with a coloured background; a white background means that it is not large.

Fig. 1. The lattice of frequent itemsets

It can be noted that if an itemset has a white background, i.e. it is not frequent, then none of its supersets in the upper layer of the lattice can be frequent. Thus the weighted downward closure property is valid under the weighted support framework. It justifies the efficient mechanism of generating and pruning significant itemsets iteratively.
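The same monotonicity can be checked on the fuzzy data of Tables 5 and 6. The Python sketch below implements Definitions 6 to 8 with each factor taken as membership degree times label weight, as in the transaction-1 example; the function names are illustrative assumptions. Evaluated directly on the two tables, it reproduces the vote of 0.035 and FWS(A) of about 0.227, while the joint support it returns for (A, B) is about 0.043, so it should be read as an illustration of the definitions rather than a reproduction of every printed figure.

```python
from math import prod

# Table 5: membership degrees per transaction, keyed by (attribute, label).
fuzzy_transactions = [
    {("X", "Small"): 0.5, ("X", "Medium"): 0.5, ("Y", "Small"): 0.2, ("Y", "Medium"): 0.8},
    {("X", "Small"): 0.9, ("X", "Medium"): 0.1, ("Y", "Small"): 0.4, ("Y", "Medium"): 0.6},
    {("X", "Small"): 1.0, ("X", "Medium"): 0.0, ("Y", "Small"): 0.1, ("Y", "Medium"): 0.9},
    {("X", "Small"): 0.3, ("X", "Medium"): 0.7, ("Y", "Small"): 0.5, ("Y", "Medium"): 0.5},
]

# Table 6: weight of each fuzzy item.
fuzzy_weights = {
    ("X", "Small"): 0.9, ("X", "Medium"): 0.7,
    ("Y", "Small"): 0.5, ("Y", "Medium"): 0.3,
}


def fitw(itemset, transaction, weights):
    """Definition 6: product of (membership degree * weight) over the itemset."""
    return prod(transaction[f] * weights[f] for f in itemset)


def fws(itemset, transactions, weights):
    """Definition 7: sum of votes divided by the number of transactions."""
    return sum(fitw(itemset, t, weights) for t in transactions) / len(transactions)


def fwc(antecedent, consequent, transactions, weights):
    """Definition 8: FWS(X u Y) / FWS(X)."""
    z = list(antecedent) + list(consequent)
    return fws(z, transactions, weights) / fws(antecedent, transactions, weights)


A = ("X", "Medium")
B = ("Y", "Small")
print(fitw([A, B], fuzzy_transactions[0], fuzzy_weights))
print(fws([A], fuzzy_transactions, fuzzy_weights))
print(fws([A, B], fuzzy_transactions, fuzzy_weights))

# Downward closure: the superset never has more support than its subsets.
assert fws([A, B], fuzzy_transactions, fuzzy_weights) <= fws([A], fuzzy_transactions, fuzzy_weights)
assert fws([A, B], fuzzy_transactions, fuzzy_weights) <= fws([B], fuzzy_transactions, fuzzy_weights)
```

Every factor is a product of two numbers in [0, 1], so extending an itemset multiplies each transaction's vote by a value of at most 1.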

We also briefly prove that the DCP is always valid in the proposed framework. The following lemma applies to both boolean and fuzzy (quantitative) data and is stated as:

Lemma. If an itemset is not frequent then its superset cannot be frequent, and $WS(subset) \ge WS(superset)$ is always true.

Proof. Let X be an itemset that is not frequent, i.e. $ws(X) < min\_ws$. For any itemset Y with $X \subset Y$, i.e. a superset of X, if a transaction t has all the items in Y, i.e. $Y \subseteq t$, then that transaction must also have all the items in X, i.e. $X \subseteq t$. We use tx to denote the set of transactions each of which has all the items in X, i.e. $tx = \{t \mid t \in T, X \subseteq t\}$. Similarly we have $ty = \{t \mid t \in T, Y \subseteq t\}$. Since $X \subset Y$, we have $tx \supseteq ty$. Moreover, since every weight lies in [0..1], the vote of a transaction for Y is never larger than its vote for X. Therefore $WS(tx) \ge WS(ty)$. According to the definition of weighted support,

\[ WS(X) = \frac{\sum_{i=1}^{n} \prod_{k=1}^{|X|} t_i[i_k[w]]}{n}, \quad \forall [i[w]] \in X \]

the denominator stays the same, therefore we have $WS(X) \ge WS(Y)$. Because $ws(X) < min\_ws$, we get $ws(Y) < min\_ws$. This proves that Y is not frequent if its subset is not frequent.

Figure 2 illustrates a concrete example. Itemset AC appears in transactions 1, 5 and 8, therefore WS(AC) = (0.06 + 0.06 + 0.06)/10 = 0.018. Intuitively, the occurrence of its superset ACE is only possible when AC appears in that transaction. But itemset ACE only appears in transactions 1 and 8, thus WS(ACE) = (0.012 + 0.012)/10 = 0.0024, where WS(ACE) < WS(AC). In summary, if AC is not frequent, its superset ACE cannot be frequent; hence there is no need to calculate its weighted support.

6 Experimental Results

For fuzzy weighted association rule mining, standard ARM algorithms can be used, or at least adapted after some modifications. An efficient algorithm is required because a significant amount of processing is involved in fuzzy weighted association rule mining. The proposed Weighted ARM (WARM) and Fuzzy Weighted ARM (FWARM) algorithms belong to the breadth-first traversal family of ARM algorithms, were developed using tree data structures [13] and work in a fashion similar to the Apriori algorithm [10].
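The level-wise, Apriori-like search that the lemma makes safe can be sketched as follows. This is a minimal Python illustration, not the tree-based implementation used in the experiments: it takes the transactions and item weights of Table 7, uses the product-based weighted support of Definition 3, and assumes a hypothetical threshold `min_ws = 0.1` chosen only for the example. All names are illustrative.

```python
from itertools import combinations
from math import prod

# Transactions and item weights from Table 7.
transactions = [
    {"A", "B", "C", "D", "E"},
    {"A", "C", "E"},
    {"B", "D"},
    {"A", "D", "E"},
    {"A", "B", "C", "D"},
]
weights = {"A": 0.1, "B": 0.3, "C": 0.6, "D": 0.9, "E": 0.7}


def weighted_support(itemset, transactions, weights):
    """Definition 3: product-of-weights vote, summed over supporting
    transactions and divided by the number of transactions."""
    vote = prod(weights[i] for i in itemset)
    return sum(vote for t in transactions if itemset <= t) / len(transactions)


def mine_weighted_itemsets(transactions, weights, min_ws):
    """Breadth-first search; candidates of size k are built only from
    frequent itemsets of size k - 1, as the weighted DCP allows."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            ws = weighted_support(c, transactions, weights)
            if ws >= min_ws:
                level[c] = ws
        frequent.update(level)
        k += 1
        # Join step, then prune any candidate with a non-frequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


result = mine_weighted_itemsets(transactions, weights, min_ws=0.1)  # hypothetical threshold
for itemset, ws in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), round(ws, 4))
```

With this threshold, item A is pruned at the first level, so no superset of A is ever generated or counted. All potential itemsets are considered from the start, with no separate pre- or post-processing pass.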
We performed several experiments using a T10I4N10KD100K (average of 10 items per transaction, average of 4 items per interesting set, 1000 attributes and 100,000 transactions/records) artificial data set. The data set was generated using the IBM Quest data generator. The data is a transactional database containing 100K records and 10K items. Two sets of experiments were undertaken with four different algorithms, namely Weighted ARM, Fuzzy WARM, Classical Apriori ARM and Classical WARM, as shown in the results below:

1. In the first experiment we tested using both boolean and fuzzy datasets and compared the outcome with the classical ARM and WARM algorithms. The results show behaviour quite similar to classical ARM. Results are better than WARM because we consider the whole itemset space (pool) to generate frequent items, unlike the pre- or post-processing WARM approaches. The experiments show (i) the number of frequent sets generated (using the four algorithms), (ii) the number of rules generated (using weighted confidence) and (iii) the execution time using all four algorithms.
2. Comparison of execution times using different weighted supports and data sizes.
6.1 Experiment One: (Quality Measures)

For experiment one, the T10I4D100K dataset described above was used with 50 weighted attributes. Each item is assigned a weight in the range [0..1]. With the fuzzy dataset, each attribute is divided into five different fuzzy sets. Figure 2 shows the number of frequent itemsets generated using (i) a weighted boolean dataset, (ii) weighted quantitative attributes with fuzzy partitions, (iii) classical ARM with a boolean dataset and (iv) WARM with weighted boolean datasets. A range of support thresholds was used.

As expected, the number of frequent itemsets increases as the minimum support decreases in all cases. In figure 2, Weighted ARM shows the number of frequent itemsets generated using weighted boolean datasets, Fuzzy WARM shows the number of frequent itemsets using attributes with fuzzy linguistic values, Classical Apriori shows the number of frequent itemsets using a boolean dataset, and classical WARM shows the number of frequent itemsets generated using weighted boolean datasets with different weighted support thresholds. More frequent itemsets and rules are generated because of the large itemset pool.

Fig. 2. No. of frequent itemsets (number of frequent itemsets against weighted support (%), for Weighted ARM, Fuzzy WARM, Classical ARM and WARM)

Fig. 3. No. of interesting rules using confidence (number of rules against weighted confidence (%), for Weighted ARM, Fuzzy WARM, Classical ARM and WARM)

We do not use Apriori ARM to first find frequent itemsets and then re-prune them using weighted support measures. Instead, all the potential itemsets are considered from the beginning for pruning using the Apriori approach, in order to validate the DCP. In contrast, classical WARM only considers frequent itemsets and prunes them (using pre- or post-processing). This generates fewer frequent itemsets and misses potential ones.

Figure 3 shows the number of interesting rules generated using weighted confidence, fuzzy weighted confidence and classical confidence values respectively. In all cases, the number of interesting rules is lower than the number of frequent itemsets in figure 2. This is because the interestingness measure generates fewer rules. Figure 4 shows the execution time of the four algorithms.

Fig. 4. Execution time to generate frequent itemsets (execution time (sec) against weighted support (%), for Weighted ARM, Fuzzy WARM, Classical ARM and WARM)

The experiments show that the proposed framework produces better results as it uses all the possible itemsets and generates rules using the DCP. Further, the novelty is the ability to analyse both boolean and fuzzy datasets with weighted settings.

6.2 Experiment Two: (Performance Measures)

Experiment two investigated the effect on execution time caused by varying the weighted support and the size of the data (number of records).

Fig. 5. Performance: weighted support (WS) (execution time (sec) against number of records (x 1000), for weighted supports of 1% to 6%)

Fig. 6. Performance: fuzzy WS (execution time (sec) against number of records (x 1000), for fuzzy weighted supports of 1% to 6%)

A support threshold from 0.1 to 0.6 and a confidence of 0.5 were used. Figures 5 and 6 show the effect of increasing the weighted support and the number of records. To obtain different data sizes, we partitioned T10I4D100K into 10 equal partitions labeled 10K, 20K, ..., 100K. Different weighted support thresholds were used with different datasets. Similarly, from figure 6, the algorithm scales linearly with increasing fuzzy weighted support threshold and number of records, behaviour similar to classical ARM.

7 Conclusion and future work

In this paper, we have presented a weighted support and confidence framework for mining weighted association rules (boolean and quantitative data) by validating the downward closure property (DCP). We used classical and fuzzy ARM to solve the issue of invalidation of the DCP in weighted ARM. We generalized the DCP and proposed a fuzzy weighted ARM framework. The problem of invalidation of the downward closure property is solved using an improved model of the weighted support and confidence framework for classical and fuzzy association rule mining. There are still some issues, such as different measures for validating the DCP and normalization of values, which are worth investigating.

References

1. Tao, F., Murtagh, F., Farid, M.: Weighted Association Rule Mining Using Weighted Support and Significance Framework. In: Proceedings of 9th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 661-666, Washington DC (2003)
2. Cai, C.H., Fu, A.W-C., Cheng, C.H., Kwong, W.W.: Mining Association Rules with Weighted Items. In: Proceedings of 1998 Intl. Database Engineering and Applications Symposium (IDEAS'98), pp. 68-77, Cardiff, Wales, UK (1998)
3. Wang, W., Yang, J., Yu, P.S.: Efficient Mining of Weighted Association Rules (WAR). In: Proceedings of the KDD, pp. 270-274, Boston, MA (2000)
4. Lu, S., Hu, H., Li, F.: Mining Weighted Association Rules. Intelligent Data Analysis Journal, 5(3), 211-255 (2001)
5. Wang, B-Y., Zhang, S-M.: A Mining Algorithm for Fuzzy Weighted Association Rules. In: IEEE Conference on Machine Learning and Cybernetics, 4, pp. 2495-2499 (2003)
6. Gyenesei, A.: Mining Weighted Association Rules for Fuzzy Quantitative Items. In: Proceedings of PKDD Conference, pp. 416-423 (2000)
7. Shu, Y.J., Tsang, E., Yeung, Daming, S.: Mining Fuzzy Association Rules with Weighted Items. In: IEEE International Conference on Systems, Man, and Cybernetics (2000)
8. Lu, J-J.: Mining Boolean and General Fuzzy Weighted Association Rules in Databases. Systems Engineering-Theory & Practice, 2, 28-32 (2002)
9. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: 20th VLDB Conference, pp. 487-499 (1994)
10. Bodon, F.: A Fast Apriori Implementation. In: ICDM Workshop on Frequent Itemset Mining Implementations, vol. 90, Melbourne, Florida, USA (2003)
11. Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules Between Sets of Items in Large Databases. In: 12th ACM SIGMOD on Management of Data, pp. 207-216 (1993)
12. Kuok, C.M., Fu, A., Wong, M.H.: Mining Fuzzy Association Rules in Databases. SIGMOD Record, 27(1), 41-46 (1998)
13. Coenen, F., Leng, P., Goulbourne, G.: Tree Structures for Mining Association Rules. Data Mining and Knowledge Discovery, 8(1), 25-51 (2004)