A Fast Association Rule Algorithm Based On Bitmap and Granular Computing

T.Y. Lin
Dept. of Computer Science, San Jose State University, San Jose, California 95192
tylin@cs.sjsu.edu

Xiaohua Hu
College of Information Science, Drexel University, Philadelphia, PA 19104
thu@cis.drexel.edu

Eric Louie
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120
ewlouie@almaden.ibm.com

Abstract

Mining association rules from databases is a time-consuming process. Finding the large itemsets fast is the crucial step in an association rule algorithm. In this paper we present a fast association rule algorithm (Bit-AssocRule) based on granular computing. Bit-AssocRule does not follow the generate-and-test strategy of the Apriori algorithm. It adopts a divide-and-conquer strategy and thus avoids the time-consuming table scans needed to find and prune itemsets: all operations for finding large itemsets in the dataset are fast bit operations on the corresponding granules. An experimental comparison of Bit-AssocRule with the Apriori, AprioriTid and AprioriHybrid algorithms shows that Bit-AssocRule is 2 to 3 orders of magnitude faster. Our research indicates that bitmaps and granular computing can greatly improve the performance of association rule algorithms and are very promising for data mining applications.

1. Introduction

An association rule in a transaction database is an expression $X \Rightarrow Y$, where $X$ and $Y$ are sets of items and $X \cap Y = \emptyset$ [1,2]. Given a set of transactions D, the problem of mining association rules is to generate all association rules that meet a user-specified minimum support and confidence. The problem can be decomposed into two subproblems: (1) finding all combinations of items that have transaction support above the minimum support, and (2) using the large itemsets to generate the desired rules. Many association rule algorithms have been developed in the last decade [1,2,3,4,6,9,11,12,13,14]. They can be classified into two categories: (1) candidate-generation-and-test approaches such as Apriori [2], and (2) pattern-growth approaches [6,12,13].
The challenging issues for association rule algorithms are the multiple scans of the transaction database and the huge number of candidates. In this paper we present a novel association rule algorithm, Bit-AssocRule, based on bitmaps and granular computing. The traditional Apriori algorithm requires full table scans and multiple passes over the itemsets in order to find association rules in large databases. Bit-AssocRule avoids these time-consuming operations and relies on fast bit operations on its granules to find the large itemsets. With bitmap techniques, we can greatly improve the performance of the association rule algorithm.

The rest of the paper is organized as follows. We discuss granular computing and bitmap techniques in Section 2. In Section 3 we present the granule-based association algorithm Bit-AssocRule and compare it with various Apriori algorithms (Apriori, AprioriTid and AprioriHybrid). Section 4 concludes the paper with some discussion.

2. Granular Computing and Bitmap Techniques

Granular computing was first proposed by T.Y. Lin [7] and has since become a very important tool in data mining. A granule is a clump of objects drawn together by indistinguishability, similarity, proximity or functionality. The equivalence classes induced by an attribute are the granules of the relation: each unique attribute value in the relational table is a granule, and each granule is a list of the tuples that have the same attribute value. Only the tuple names, or references to the tuples, are stored in the granule.

The bitmap technique was proposed in the 1960s [5] and has been used by a variety of products since then. Bitmap techniques deliver far better query performance on unselective (low-cardinality) data than traditional B-tree indexing. A bitmap represents each distinct value as an array of bits, where a 1 or 0 in each relative position in the array represents True or False for that value in the corresponding record of the relational table. This approach is sometimes referred to as an inverted list.
The real benefit of the bitmap index, however, is processing speed. Performing a logical operation (AND, OR or NOT) on a series of bitmaps is very efficient, particularly compared with performing similar operations on lists of tuple ids.

The granule concept and the bitmap index are very closely related. Besides a list, a granule can be represented as a bitmap. Each tuple in the relation has one unique offset position in the bitmap. The bits are set to 1 for those tuples having the attribute value of the granule and to 0 for those tuples not having it. This is the bitmap representation of the granule's list.
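To make the bitmap representation of granules concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a relation held in memory as a list of tuples and uses Python integers as bitmaps; the names `build_granule_bitmaps` and `support` are hypothetical.

```python
from collections import defaultdict


def build_granule_bitmaps(rows):
    """Map each (attribute index, value) pair to an int used as a bitmap.

    Bit t of a bitmap is 1 when tuple t carries that attribute value.
    """
    bitmaps = defaultdict(int)
    for t, row in enumerate(rows):
        for attr, value in enumerate(row):
            bitmaps[(attr, value)] |= 1 << t
    return dict(bitmaps)


def support(bitmaps, items):
    """AND the bitmaps of a non-empty collection of items, then count the 1 bits."""
    items = list(items)
    result = bitmaps[items[0]]
    for item in items[1:]:
        result &= bitmaps[item]
    return bin(result).count("1")


# Toy relation with two attributes (illustrative data only).
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
bitmaps = build_granule_bitmaps(rows)
both = support(bitmaps, [(0, "a"), (1, "x")])  # tuples 0 and 3 match
```

Counting the tuples that contain several attribute values at once reduces to one AND per additional bitmap followed by a bit count, which is the operation the rest of the paper relies on.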

3. An Association Rule Algorithm: Bit-AssocRule

The most influential algorithm, Apriori (described below), developed by Agrawal et al. [1,2], generates the k-candidates by combining two (k-1)-itemsets whose first k-2 attribute values are the same and whose last values differ. A new k-candidate is kept only if every (k-1)-subset of it is a large itemset; otherwise it is removed. The algorithm must scan the whole data set and examine the itemsets multiple times, which is very time-consuming.

Algorithm Apriori
  L_1 = {large 1-itemsets}
  for (k = 2; L_{k-1} ≠ ∅; k++) do begin
    C_k = apriori-gen(L_{k-1});      // new candidates
    forall transactions t ∈ D do begin
      C_t = subset(C_k, t);          // candidates contained in t
      forall candidates c ∈ C_t do
        c.count++
    end
    L_k = {c ∈ C_k | c.count >= minsup}
  end
  Answer = ∪_k L_k

Here L_k is the set of large k-itemsets (those with minimum support); each member of this set has two fields: (1) itemset and (2) support count. C_k is the set of candidate k-itemsets (potentially large itemsets); each member of this set also has two fields: (1) itemset and (2) support count.

The apriori-gen function takes as its argument L_{k-1}, the set of all large (k-1)-itemsets, and returns a superset of the set of all large k-itemsets. First, in the join step, L_{k-1} is joined with L_{k-1} to obtain a superset of the final set of candidates C_k. The union p ∪ q of itemsets p, q ∈ L_{k-1} is inserted into C_k if they share their first k-2 items.

Recently, many attempts have been made to apply bitmap techniques in association rule algorithms [7,8,10,15,18,19]. The use of bitmaps improves the performance of finding association rules: the bit representation offers efficient storage, while the intersection of bitmaps offers fast computation. The AND, SHIFT and COUNT operations on bitmaps are extremely fast. Unlike the traditional Apriori algorithm, which generates k-candidates by combining two large (k-1)-itemsets, our Bit-AssocRule algorithm generates k-candidates by intersecting the bitmap of one attribute value with the bitmaps of the other (k-1) attribute values. Below we describe the algorithm in detail.

3.1 The Generation of Combinations

Creating combinations, or patterns, is a computational process of forming sets of attribute values. The number of combinations for one selected attribute is the number of distinct values of that attribute; each candidate contains one attribute value from the selected attribute. The number of combinations for two selected attributes is the size of the Cartesian product, $q_1 q_2$, where $q_1$ and $q_2$ are the numbers of distinct values of the first and second selected attributes, respectively. Each combination contains two attribute values: one from the first attribute and one from the second. For the general case, the number of k-candidates is $q_1 q_2 q_3 \cdots q_k$, where $q_i$ is the number of distinct values of the i-th attribute. Each k-candidate contains one attribute value from each selected attribute.

The more general case is to create k-candidates among the n attributes in the relation. Since no columns are selected beforehand, k unique attributes are chosen from the n attributes in the relation, and from those k attributes all combinations of attribute values from their domains are formed. There are $C(n,k)$ possible ways to choose k attributes from n attributes, and each way has its own set of k-candidates among the k attributes. In short, the number of k-candidates is the sum of the subtotals from each possible way of selecting k attributes. The general equation for the total number of k-candidates on n attributes in the relation is the following:

$$\text{Number of candidates of length } k = \sum_{g=1}^{n-(k-1)} q_g \left( \sum_{h=g+1}^{n-(k-2)} q_h \left( \sum_{i=h+1}^{n-(k-3)} q_i \left( \cdots \sum_{z=y+1}^{n} q_z \right) \right) \right)$$

So far, the equation deals with varying numbers of distinct values in the columns of the relation. For the special case in which all attributes in the relation have the same number of distinct values, F, the equation simplifies to $C(n,k) F^k$.
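The candidate count can be checked numerically. The sketch below is an illustration, not part of the original paper: it evaluates the nested sum by enumerating every choice of k attributes and multiplying their distinct-value counts, then compares the result with the equal-cardinality special case. The function names and the sample values of q are assumptions chosen for the example.

```python
from itertools import combinations
from math import comb, prod


def count_k_candidates(q, k):
    """Sum, over every choice of k attributes, the product of their distinct-value counts."""
    return sum(prod(chosen) for chosen in combinations(q, k))


def count_equal_cardinality(n, k, f):
    """Special case: all n attributes have f distinct values, giving C(n,k) * f**k."""
    return comb(n, k) * f ** k


# Three attributes with 3, 2 and 4 distinct values: 3*2 + 3*4 + 2*4 pairs.
pairs = count_k_candidates([3, 2, 4], 2)

# Six attributes with five distinct values each: both formulas must agree.
general = count_k_candidates([5] * 6, 3)
special = count_equal_cardinality(6, 3, 5)
```

Enumerating attribute subsets in increasing index order is what the nested sums with bounds g < h < i < ... < z express, so the two forms count the same candidates.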
Generally, the numbers of distinct values of the attributes in the relation are not the same. Nonetheless, the average number of distinct values over all attributes may be useful for estimating the number of combinations of length k:

$$C(n,k) \left( \frac{\sum_{g=1}^{n} q_g}{n} \right)^{k}$$

For a relation with many attributes or attribute values, the number of combinations can be very large. Each combination requires a count of the number of tuples in the relation that contain it. The combinations whose counts reach the minimal support are the large itemsets from which association rules are generated. We next describe a method to reduce the number of combinations, which saves comparisons between the combinations and the tuples in the relation.

The number of combinations is an important factor in the process. For each combination, the bitmaps in the combination are intersected and the result is counted. If a combination does not have the potential to become a large itemset, it should not undergo this process, so only potential combinations are generated.

The algorithm starts with a list L_1, which contains all the large 1-itemsets (attribute values whose bitmap counts are greater than the minimal support count). When making k-candidates, each element in L_1 is checked for whether it occurs in any (k-1)-itemset. If an element does not occur in any (k-1)-itemset, it is removed from the list. Next, new k-candidates are created from the (k-1)-itemsets and the list L_1 by joining a (k-1)-itemset with each element in L_1 that has an attribute index greater than all attribute indexes of the elements in that (k-1)-itemset. Only the new k-candidates that have every (k-1)-subset as a large itemset are kept.

Algorithm Bit-AssocRule
  L_1 = {bitmaps of large 1-itemsets}
  for (k = 2; L_{k-1} ≠ ∅; k++) do begin
    remove those elements in L_1 which are not included in any itemset of L_{k-1}   // prune L_1
    C_k = Bit-apriori-gen(L_{k-1}, L_1);      // new candidates
    L_k = {c ∈ C_k | bitmap count of c >= minsup}
  end
  Answer = ∪_k L_k

A k-candidate consists of k attribute values (X_{1,t_1}, X_{2,t_2}, ..., X_{k-1,t_{k-1}}, X_{i,j}) from k attributes. Using bitmap techniques, the candidate is a large itemset if the bit count of the intersection of all its bitmaps, B_1 ∩ B_2 ∩ ... ∩ B_k (where B_j is the bitmap of the j-th attribute value in the candidate), is equal to or greater than the minimal count. The bit count is the number of 1s in the bitmap that results from the intersection.

Below is an example of creating 4-candidates from 3-itemsets. Suppose the following are the current 3-itemsets:

{X_{6,0} X_{11,0} X_{12,0}}, {X_{6,0} X_{11,0} X_{14,2}}, {X_{6,0} X_{11,0} X_{15,2}}, {X_{6,0} X_{11,1} X_{12,3}}, {X_{6,0} X_{11,1} X_{14,2}}, {X_{6,0} X_{11,1} X_{15,2}}, {X_{6,0} X_{14,2} X_{15,2}}, {X_{6,1} X_{11,0} X_{12,0}}, {X_{6,1} X_{11,1} X_{12,3}}, {X_{11,0} X_{14,2} X_{15,2}}, {X_{11,1} X_{14,2} X_{15,2}}.

And suppose the following are the 1-itemsets in the list: L_1 = {X_{4,0}, X_{4,10}, X_{6,0}, X_{6,1}, X_{11,0}, X_{11,1}, X_{12,0}, X_{12,3}, X_{14,2}, X_{15,2}}.
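The pruning of L_1 and the join step for this example can be sketched as follows; the walk-through in the text arrives at the same ten distinct 4-candidates. This is a hedged illustration only: items are modelled as hypothetical (attribute index, value) pairs, the function name `bit_apriori_join` is invented for the sketch, the ninth 3-itemset is read as {X_{6,1} X_{11,1} X_{12,3}} (which is what the 4-candidates listed in the text imply), and neither the (k-1)-subset check nor the bitmap counting is performed here.

```python
def bit_apriori_join(prev_itemsets, l1):
    """Prune L1, then extend each (k-1)-itemset with higher-attribute items of L1.

    Items are (attribute index, value) pairs; itemsets are tuples ordered by attribute.
    """
    used = {item for itemset in prev_itemsets for item in itemset}
    pruned_l1 = [item for item in l1 if item in used]  # drop items in no (k-1)-itemset
    candidates = []
    for itemset in prev_itemsets:
        max_attr = max(attr for attr, _ in itemset)
        for item in pruned_l1:
            if item[0] > max_attr:  # only attributes beyond the itemset's last one
                candidates.append(itemset + (item,))
    return pruned_l1, candidates


three_itemsets = [
    ((6, 0), (11, 0), (12, 0)), ((6, 0), (11, 0), (14, 2)), ((6, 0), (11, 0), (15, 2)),
    ((6, 0), (11, 1), (12, 3)), ((6, 0), (11, 1), (14, 2)), ((6, 0), (11, 1), (15, 2)),
    ((6, 0), (14, 2), (15, 2)), ((6, 1), (11, 0), (12, 0)), ((6, 1), (11, 1), (12, 3)),
    ((11, 0), (14, 2), (15, 2)), ((11, 1), (14, 2), (15, 2)),
]
l1 = [(4, 0), (4, 10), (6, 0), (6, 1), (11, 0), (11, 1),
      (12, 0), (12, 3), (14, 2), (15, 2)]

pruned_l1, four_candidates = bit_apriori_join(three_itemsets, l1)
```

Because the appended item always has the highest attribute index, each candidate is produced from exactly one (k-1)-itemset, so the join generates no duplicates.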
First, since X_{4,0} and X_{4,10} are not in any 3-itemset, they are removed from the list L_1: any 4-combination that includes these attribute values would not be a large itemset. The remaining attribute values in the list are L_1 = {X_{6,0}, X_{6,1}, X_{11,0}, X_{11,1}, X_{12,0}, X_{12,3}, X_{14,2}, X_{15,2}}.

Next, each 3-itemset is combined with attribute values in L_1 to create 4-candidates where possible. The 3-itemset {X_{6,0} X_{11,0} X_{12,0}} is combined with two attribute values in L_1: X_{14,2} and X_{15,2}. The 3-itemset {X_{6,0} X_{11,0} X_{14,2}} is combined with one attribute value in L_1, X_{15,2}, and so on for each 3-itemset. All the 4-combinations are shown below:

{X_{6,0} X_{11,0} X_{12,0} X_{14,2}}, {X_{6,0} X_{11,0} X_{12,0} X_{15,2}}, {X_{6,0} X_{11,0} X_{14,2} X_{15,2}}, {X_{6,0} X_{11,1} X_{12,3} X_{14,2}}, {X_{6,0} X_{11,1} X_{12,3} X_{15,2}}, {X_{6,0} X_{11,1} X_{14,2} X_{15,2}}, {X_{6,1} X_{11,0} X_{12,0} X_{14,2}}, {X_{6,1} X_{11,0} X_{12,0} X_{15,2}}, {X_{6,1} X_{11,1} X_{12,3} X_{14,2}}, {X_{6,1} X_{11,1} X_{12,3} X_{15,2}}.

All the 4-combinations are generated and their bitmaps are counted. If the bitmap count of a 4-candidate is greater than or equal to the minimum support, it becomes a large 4-itemset; otherwise it is deleted.

3.2 The Storage and Management of Bitmaps

The bitmaps are stored on disk in data pages. Data pages offer flexibility in accessing and processing the bitmaps. Portions of a bitmap, called slices, are stored in the data pages, and the data pages are connected by links. Below is an example of several data pages holding three bitmaps. Each bitmap has its initial page on a different page, and the initial pages of the bitmaps are not necessarily consecutive. Suppose there are three different values X_{i,k1}, X_{j,k2} and X_{l,k3} from the i-th, j-th and l-th attributes in the data set, with bitmaps B_{i,k1}, B_{j,k2} and B_{l,k3}.

Bitmap      Initial page #
B_{i,k1}    1
B_{j,k2}    2
B_{l,k3}    5

Each bitmap requires the same number of data pages to store the binary representation of the list; for this example, let us assume that each bitmap has 5 data pages.
[Figure: data pages 1 to 15 holding the slices of the three bitmaps, connected by links.]

For the combination {X_{i,k1}, X_{j,k2}, X_{l,k3}}, the bitmaps B_{i,k1}, B_{j,k2} and B_{l,k3} are intersected and the result is counted. Since the bitmaps are stored in data pages, data pages 1, 2 and 5 are intersected first, and the bits in the result are counted. Next, pages 3, 4 and 6 are intersected, and the bits in this intersection are counted. This subtotal is added to the previous subtotal. The process continues for the third, fourth and fifth pages of the bitmaps B_{i,k1}, B_{j,k2} and B_{l,k3}.
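The page-by-page accumulation just described can be sketched as below. This is a simplified model under stated assumptions, not the paper's storage layer: each bitmap is a Python list of integers, one integer per data-page slice, and the on-disk page links are not modelled. The name `paged_support` and the sample slices are hypothetical.

```python
def paged_support(bitmap_pages):
    """Intersect the n-th slices of all bitmaps, page by page, summing the bit counts.

    bitmap_pages: one list of page slices (ints) per bitmap, all of equal length.
    """
    total = 0
    for slices in zip(*bitmap_pages):  # the n-th page of every bitmap
        page_result = slices[0]
        for s in slices[1:]:
            page_result &= s
        total += bin(page_result).count("1")  # subtotal for this page
    return total


# Three bitmaps of three pages each, four bits per page (illustrative data only).
b1 = [0b1111, 0b1010, 0b0001]
b2 = [0b1100, 0b1110, 0b0001]
b3 = [0b0101, 0b0010, 0b1111]
count = paged_support([b1, b2, b3])
```

Only one page per bitmap needs to be in memory at a time in this scheme, and the running total after the last page is the support count of the combination.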

After the last page, the combination {X_{i,k1}, X_{j,k2}, X_{l,k3}} is a large itemset if the total count is at least the minimal support.

Through the intersection and bit counting, many data pages are read among the k-combinations in each cycle. Even though bitmaps do not repeat within a single k-combination, a bitmap in one k-combination may appear in other k-combinations. This means that data pages read for one combination may be needed for other combinations. To reduce the physical data page reads, all the n-th data pages of the bitmaps in every k-combination are processed at the same time. In other words, all the n-th data pages of the bitmaps are intersected, the results of the intersections are counted, and the counts are added to each combination's subtotal before continuing to the next data page. As a result, the data pages of each bitmap are read only once from disk to memory when determining which k-combinations are large itemsets.

The total storage cost for bitmaps is based on the number of bitmaps in the relation and the number of data pages per bitmap. The number of pages per bitmap depends on the number of tuples in the relation and the number of tuples that can be stored per page. The equation for the total storage cost is the following:

$$\text{Storage cost} = \text{Number of bitmaps} \times \frac{\text{Number of tuples}}{\text{Max bits per page}} \times \text{Data page size}$$

For example, suppose the relation has 1000 bitmaps and 1,000,000 tuples. Each data page is 4096 bytes, of which 4080 bytes are available to store bitmap data (the last 16 bytes are used as a pointer to the next page of the bitmap). Then the total storage cost is:

$$\text{Storage cost} = 1000 \times \frac{1{,}000{,}000}{4080 \times 8} \times 4096 \approx 125 \text{ Mbytes}$$

3.3 Comparison Between Apriori, Its Variations and the Bit-AssocRule Algorithm

The traditional Apriori algorithm, its variations and the bitmap-based Bit-AssocRule are compared based on their key operations. For Bit-AssocRule, the number of AND operations between the bitmaps determines the cost.
For the Apriori algorithm, the number of comparisons between the attribute values in the candidates and the tuples determines the cost. A ratio is used to compare the two algorithms, Apriori and Bit-AssocRule:

$$X = \frac{\text{Number of comparison operations in Apriori}}{\text{Number of AND operations in Bit-AssocRule}}$$

For the Apriori algorithm, if there are p candidates and q tuples in the relation, p*q comparisons are needed to determine the counts for the p candidates. A hash tree is used to reduce the comparisons. A node of the hash tree is either a leaf node containing some candidates or an interior node containing a hash table [2]. The hash table consists of references to interior nodes or leaf nodes. The hash tree reduces the candidates to compare with each tuple by dividing the p candidates among the leaf nodes. All the p candidates have the same length, k. Each tuple in the relation has $C(n,k)$ subsets, and each subset is hashed on the hash tree. For subsets that hash onto a non-visited leaf node, each candidate in that leaf node is compared to the tuple. Subsets that hash to an already visited leaf node or interior node are skipped. This is the subset function that is applied to each tuple in the relation.

The cost of the subset function is determined by two factors: the number of subsets per tuple and the number of candidates in all visited-once nodes per tuple. The equation is:

$$m \, C(n,k) + \sum_{i=1}^{m} v_i \, t$$

The first term is the cost of calling the hash function for the m tuples in the relation. The second term is the summed cost of the comparisons for each tuple. Here $v_i$ is the number of visited-once leaf nodes for tuple i, t is the average number of candidates per leaf node, n is the number of attributes in the relation and k is the current length of the candidates in the hash tree.

Apriori is improved further by short-circuiting the comparison of each candidate with each tuple and by comparing the k-candidates to a (k-1)-candidate bar table consisting of (k-1)-candidate ids.
The first improvement stops the comparison as soon as the first element of the k-candidate is found not to be contained in the tuple. The second improvement uses the (k-1)-candidate bar table to compare only the (k-1)-candidate ids of the tuple. Basically, when a k-candidate is compared to a tuple in the (k-1)-candidate bar table, the two (k-1)-candidate ids that generated the k-candidate are compared to the (k-1)-candidate ids in the tuple. If both (k-1)-candidate ids exist for that tuple, the id of this k-candidate is inserted into the k-candidate bar table for that tuple. For more details, refer to [2,9].

The Apriori and AprioriTid algorithms generate the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass, without considering the transactions in the database. The AprioriTid algorithm has the additional property that the database is not used at all for counting the support of candidate itemsets after the first pass. AprioriHybrid uses Apriori in the initial passes and switches to AprioriTid when it expects that the set of candidate itemsets at the end of the pass will fit in memory [1,2].

The worst case for the Apriori algorithm is $m (k q)$ comparison operations, where m and k are defined above and q is the average number of attribute values per candidate. Each candidate is compared with each tuple to determine whether it is contained in the tuple.

The worst case for the Bit-AssocRule algorithm is $(m/32)(k-1) q$ AND operations. So the ratio is the following:

$$X = \frac{m \, (k q)}{(m/32)(k-1) \, q} = \frac{32k}{k-1}$$

In theory, the Bit-AssocRule method can therefore execute at least 32 times faster than Apriori. The subset function in Apriori lowers the cost if it reduces the number of k-candidates to compare with each tuple in the relation. However, the cost varies per tuple in the relation and is affected by the number of attributes in the relation.

3.4 Experimental Runs: Apriori, AprioriTid, AprioriHybrid and Bit-AssocRule

Three data sets were generated to compare the run times of these algorithms in finding association rules. The data sets vary in the number of tuples, the number of attributes per relation and the minimal support. The programs for Apriori, AprioriTid and AprioriHybrid are our honest implementations of the algorithms in [2]. In the implementation, we use some buffer schemes to speed up reads and writes for all algorithms. The tests were conducted on an IBM PC with a 933 MHz CPU and 512 MB of memory under Windows 2000. The programs are coded in C++.

Table 1: Three data sets

Data set  Rows   Columns  # of items  Table size  Bitmap size  Minimal support
DS1       400K   16       199         25.6MB      10.6MB       20K
DS2       800K   20       247         64MB        25.0MB       40K
DS3       1.0M   30       709         120MB       90MB         50K

Table 2: Experimental runs on the three data sets

Length      # Cand   # Itemsets  Bit-AssocRule  AprioriHybrid  AprioriTid  Apriori
DS1
1           199      188         3.966          4.106          4.106       4.105
2           16333    103         18.426         1402.977       1669.180    1403.438
3           92       10          0.111          1.833          79.374      5.979
4           0        0           0              0              0           0
Total time                       22.503         1408.916       1752.660    1413.522
DS2
1           247      235         10.375         10.275         10.275      10.786
2           26033    88          56.371         4496.245       5305.439    4496.405
3           0        0           0              0.01           0           0
Total time                       66.746         4506.530       5315.714    4507.191
DS3
1           709      709         21.981         20.089         19.278      19.298
2           240563   771         648.973        51611.744      62305.430   51387.081
3           748      42          2.304          5924.796       5942.796    36.152
4           0        0           0.030          0.030          0.030       0.030
Total time                       673.288        68286.534      68267.534   51442.561

Here are some observations on and explanations of the results:

(1) The total time in our comparison includes the time to write the association rules to a file. Bit-AssocRule is 2 to 3 orders of magnitude faster than the various Apriori algorithms (64-221 times faster). The bigger the test data set, the bigger the time difference between Bit-AssocRule and the various Apriori algorithms. We have not compared our algorithm with some of the other association rule algorithms such as VIPER [15], CHARM [19] and CLOSE [6] (CHARM and CLOSE are based on the closed frequent itemset concept), but based on their published comparisons with Apriori, Bit-AssocRule is very competitive with them. A direct comparison will be conducted and reported in the near future.

(2) Bit-AssocRule takes the same or slightly longer time than the various Apriori algorithms to construct the 1-itemsets because of the extra cost of building the bitmaps for the 1-itemsets. Once the 1-itemsets are done, however, Bit-AssocRule is significantly faster than the Apriori algorithms in constructing larger frequent itemsets because it uses only fast bit operations (AND, COUNT and SHIFT) and does not need to test the subsets of the new candidates.

(3) Bit-AssocRule stores only the bitmaps of the frequent items, and the (uncompressed) bitmap storage is smaller than the original data set (1/2 to 1/4 of the original data size).

The main reasons that the Bit-AssocRule algorithm is significantly faster than Apriori and its variations are:

(1) Bit-AssocRule adopts a divide-and-conquer strategy: the transactions are decomposed into a vertical bitmap format, which leads to a focused search over smaller domains. There are no repeated scans of the entire database in Bit-AssocRule.

(2) Bit-AssocRule does not follow the traditional candidate-generate-and-test approach and thus saves a significant amount of time in testing the candidates.

(3) In Bit-AssocRule, the basic operations are bit COUNT and bit AND, which are much faster than the pattern search and matching operations used in Apriori and its variations.

4. Conclusion

We have presented a bitmap-based association rule algorithm, Bit-AssocRule, which uses granular computing techniques and introduces bitmap techniques into the data mining procedure. Bit-AssocRule avoids the time-consuming table scans needed to find and prune itemsets: all operations for finding large itemsets in the dataset are fast bit operations. An experimental comparison of Bit-AssocRule with the Apriori, AprioriTid and AprioriHybrid algorithms shows that Bit-AssocRule is 2 to 3 orders of magnitude faster. This research indicates that bitmap and granular computing techniques can greatly enhance the performance of finding association rules, and that bitmap techniques are very promising for decision support query optimization and data mining applications.

Bitmap techniques are only one way to improve the performance of data mining algorithms. Parallelism is another crucial aspect of DSS and data mining performance. We are currently working on parallelizing the bitmap-based algorithm and hope to report our findings in the near future.

5. References

[1] Agrawal R., Srikant R., Fast Algorithms for Mining Association Rules, Proc. of the 20th VLDB Conf., 1994
[2] Agrawal R., Mannila H., Srikant R., Toivonen H., Verkamo A., Fast Discovery of Association Rules, in Advances in Knowledge Discovery and Data Mining, MIT, 1996
[3] Agarwal R., Aggarwal C., Prasad V., A Tree Projection Algorithm for Generation of Frequent Itemsets, Journal of Parallel and Distributed Computing, 2002
[4] Bayardo R.J. Jr., Agrawal R., Gunopulos D., Constraint-Based Rule Mining in Large, Dense Databases, Proc. of the 15th Int'l Conf. on Data Engineering (ICDE 1999)
[5] Bertino E., Ooi B.C., Sacks-Davis R., et al., Indexing Techniques for Advanced Database Systems, Kluwer Publishers
[6] Han J., Pei J., Yin Y., Mining Frequent Patterns without Candidate Generation, Proc. of SIGMOD-2002
[7] Lin T.Y., Data Mining and Machine Oriented Modeling: A Granular Computing Approach, Journal of Applied Intelligence, Oct. 2000
[8] Louie E., Lin T.Y., Finding Association Rules using Fast Bit Computation: Machine-Oriented Modeling, ISMIS-2000
[9] Mannila H., Toivonen H., Verkamo A., Efficient Algorithms for Discovering Association Rules, in KDD94
[10] Morzy T., Zakrzewicz M., Group Bitmap Index: A Structure for Association Rules Retrieval, Proc. of the 4th Int'l Conf. on Knowledge Discovery and Data Mining (KDD-98)
[11] Pasquier N., Bastide Y., Taouil R., Lakhal L., Discovering Frequent Closed Itemsets for Association Rules, ICDT2000
[12] Pei J., Han J., Lu H., Nishio S., Tang S., Yang D., H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, Proc. of the 2001 IEEE Int'l Conference on Data Mining
[13] Pei J., Han J., Lakshmanan, Mining Frequent Itemsets with Convertible Constraints, in ICDE2001
[14] Savasere A., Omiecinski E., Navathe S., An Efficient Algorithm for Mining Association Rules in Large Databases, in Proc. of the 21st VLDB Conf.
[15] Shenoy P., et al., Turbo-charging Vertical Mining of Large Databases, in SIGMOD '00
[16] Wu M., Buchmann A., Encoding Bitmap Indexing for Data Warehouses, Proc. of the 14th Int'l Conference on Data Engineering, 220-231, 1998
[17] Zaki M., Generating Non-Redundant Association Rules, in KDD-2002
[18] Zaki M., et al., New Algorithms for Fast Discovery of Association Rules, in KDD97
[19] Zaki M., Hsiao C.J., CHARM: An Efficient Algorithm for Closed Association Rules Mining, Tech Report, CS Dept., RPI, USA