Optimization of Association Rule Mining Using Genetic Algorithm


School of ICT, Gautam Buddha University, Greater Noida, Gautam Buddha Nagar (U.P.), India

ABSTRACT
Association rule mining is one of the most important techniques in the field of data mining. It finds frequent patterns and associations among sets of items or objects in transaction databases, relational databases, and other information repositories. We formulate a general association rule mining model for extracting useful information from very large datasets. Patterns that occur frequently in data are called frequent patterns; examples include frequent itemsets, sequential patterns, and frequent substructures. A frequent itemset is a set of items that often appear together in a dataset. In this paper we design a method for generating strong rules, in which the Apriori algorithm is combined with the FP-growth algorithm to generate the rules. A genetic algorithm is then used to optimize the generated rules.

Keywords
Apriori algorithm, Genetic algorithms, Association rule mining, FP growth.

1. INTRODUCTION

1.1 Data mining
Data mining is the technique of discovering interesting patterns and knowledge from large amounts of data. The sources of data can be databases, data warehouses, the Web, or other information repositories. Data mining is an essential step in the process of knowledge discovery. The rapid development of computer technology, especially the increased capacity and decreased cost of storage media, has led businesses to store huge amounts of external and internal information in large databases at low cost. Mining useful information and knowledge from these large databases has thus evolved into an important research area.

1.2 Association rule
Association rule mining is a technique for discovering interesting relations between items in large databases.
It is used to identify the strong rules discovered in databases using different measures of interestingness. Association rules are used, for instance, to discover relationships between products in large-scale supermarket transaction data. For example, the rule $\{\text{butter}, \text{bread}\} \Rightarrow \{\text{milk}\}$ found in the sales data of a supermarket would indicate that a customer who buys butter and bread together is likely to also buy milk. Such information can be used as the basis for marketing decisions such as promotional pricing or product placement.

An association rule is defined as follows. Let $I = \{i_1, i_2, i_3, \ldots, i_n\}$ be a set of $n$ items called attributes. Let $D = \{t_1, t_2, t_3, \ldots, t_m\}$ be a set of transactions called the database. Each transaction in $D$ has a unique identifier (ID) and contains a subset of the items in $I$. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. Every rule is composed of two different sets of items (itemsets), $X$ and $Y$, where $X$ is called the antecedent or left-hand side (LHS) and $Y$ the consequent or right-hand side (RHS).

1.3 Apriori algorithm
The Apriori algorithm is used for mining frequent itemsets and association rules. It uses a level-wise search to find frequent itemsets in a transactional database for Boolean association rules: k-itemsets are used to explore (k+1)-itemsets, where an itemset containing k items is known as a k-itemset. In this algorithm, frequent subsets are extended one item at a time from the previous set, a step known as candidate generation. Groups of candidates are then evaluated against the data. To count candidate itemsets efficiently, Apriori uses breadth-first search and a hash tree data structure. It identifies the frequent individual items in the database and extends them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets found by Apriori are used to determine association rules, which highlight general trends in the database.

1.4 Genetic algorithms
A genetic algorithm is a search algorithm. It searches a solution space for an optimal solution to a problem. The algorithm produces a population of feasible solutions to the problem and lets them evolve over multiple generations to find better solutions. The algorithm starts with a set of solutions (represented by chromosomes) called the population. Solutions from one population are taken and used to form a new population.

Cycle of the algorithm. The algorithm operates through the following cycle:
Formation of a population of strings.
Evaluation of each string.
Selection of the best strings.
Genetic manipulation to form a new population of strings.

1.5 FP growth algorithm
Frequent pattern growth, or FP-growth, adopts a divide-and-conquer strategy. First, it compresses the database representing frequent items into a frequent pattern tree, known as an FP-tree, which retains the itemset association information. It then splits the compressed database into a set of conditional databases, each associated with one frequent item or pattern, and mines each conditional database separately. For each pattern fragment, only its associated data sets need to be examined. This approach may therefore significantly reduce the size of the data sets to be searched, along with the growth of the patterns being examined.

2. METHODOLOGY

Hybrid algorithm
The proposed algorithm is a hybrid of the FP-tree construction algorithm and the Apriori algorithm. It proceeds in two phases. The first phase constructs the FP-tree, and the second phase mines the constructed FP-tree using the hybrid (FP+Apriori) algorithm.
Phase 1: Tree construction using the FP Tree algorithm.
Phase 2: Tree mining using the hybrid (FP+Apriori) algorithm.

2.1 FP tree algorithm
The FP-tree is constructed in the following steps:
(a) Scan the transaction database D once. Collect F, the set of frequent items, and their support counts. Sort F in descending order of support count as L, the list of frequent items.
(b) Create the root of an FP-tree and label it "null". For each transaction Trans in D, do the following. Select and sort the frequent items in Trans according to the order of L. Let the sorted frequent item list in Trans be [p | P], where p is the first element and P is the remaining list. Call insert_tree([p | P], T), which is performed as follows. If T has a child N such that N.item-name = p.item-name, then increment N's count by 1; otherwise create a new node N, set its count to 1, link its parent link to T, and link its node-link to the nodes with the same item-name via the node-link structure. If P is nonempty, call insert_tree(P, N) recursively.

2.2 Candidate set algorithm: getcandidate(k)
To perform the mining with the hybrid (FP+Apriori) algorithm, the candidates are first generated using a candidate set algorithm.
Step 1: list1 = the frequent (k-1)-itemsets from data; n = size(list1)
Step 2: Initialize mylist as a blank list to contain the generated frequent itemsets
Step 3: Repeat for i = 1 to n-1
Step 4: Repeat for j = i+1 to n
Step 5: l1 = list1[i]
Step 6: l2 = list1[j]
Step 7: Remove the last elements from l1 and l2
Step 8: If l2 is a subset of l1, then Flist = l1 with the last element of l2 appended at the end
Step 9: If count(Flist) >= supp, then add Flist to mylist; else return null
Step 10: End

2.3 Hybrid (FP+Apriori) algorithm
Input: the nodes of the FP-tree as a list, list2, and Sup, the minimum support
Output: the frequent itemsets, frequent_data
The algorithm is implemented with the following steps:
Step 1: Repeat the following while scanning list2 to its end:
    Create a list containing a single item from list2
    Add this list to mylist1
    [End of repeat]
Step 2: Add mylist1 to data
Step 3: Repeat for k = 2, 3, 4, ...
Step 4: C_k = getcandidate(k)
Step 5: If |C_k| = 0 then go to Step 6; else add C_k to frequent_data
Step 6: End

After obtaining the frequent itemsets, the association rules can be generated as follows. For each frequent itemset $l$, generate all nonempty subsets of $l$. For every nonempty subset $s$ of $l$, output the rule $s \Rightarrow (l - s)$ if $\text{support\_count}(l) / \text{support\_count}(s) \geq \text{min\_conf}$, where min_conf is the minimum confidence threshold. We then apply the hybrid (FP+Apriori) algorithm with GA to optimize the association rules.

3. PROPOSED FLOW CHART
Start -> Load dataset -> FP Tree Algorithm -> Hybrid (FP+Apriori) Algorithm -> Find the Association rule -> Hybrid (FP+Apriori) Algorithm with GA -> End
Fig 1: Flow Chart for Hybrid Algorithm
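The candidate generation of Section 2.2 and the confidence-based rule generation of Section 2.3 can be illustrated with a small Python sketch. This is not the authors' MATLAB implementation: it is a plain level-wise version that counts support directly against the transactions rather than against an FP-tree, and the names `support_count`, `get_candidates`, `frequent_itemsets`, and `generate_rules` are hypothetical names chosen for illustration. The minimum support is assumed here to be an absolute transaction count.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def get_candidates(prev_frequent, transactions, min_count):
    """Join frequent (k-1)-itemsets that agree on all but their last item
    and keep the resulting k-itemsets that meet the minimum support count."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    result = set()
    for i in range(len(prev) - 1):
        for j in range(i + 1, len(prev)):
            l1, l2 = prev[i], prev[j]
            if l1[:-1] == l2[:-1]:  # same prefix once the last element is dropped
                candidate = frozenset(l1 + (l2[-1],))
                if support_count(candidate, transactions) >= min_count:
                    result.add(candidate)
    return result

def frequent_itemsets(transactions, min_count):
    """Level-wise search: frequent 1-itemsets first, then k = 2, 3, ..."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support_count(frozenset([i]), transactions) >= min_count}
    found = {}
    while level:  # stop when no candidate of the next size is frequent
        for s in level:
            found[s] = support_count(s, transactions)
        level = get_candidates(level, transactions, min_count)
    return found

def generate_rules(found, min_conf):
    """Emit s => (l - s) when support_count(l) / support_count(s) >= min_conf."""
    rules = []
    for l, count_l in found.items():
        for r in range(1, len(l)):  # proper nonempty subsets only
            for s in map(frozenset, combinations(sorted(l), r)):
                conf = count_l / found[s]
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules

# Toy data, invented for this example only.
baskets = [{"bread", "butter", "milk"}, {"bread", "butter"}, {"bread", "milk"},
           {"butter", "milk"}, {"bread", "butter", "milk"}]
for lhs, rhs, conf in generate_rules(frequent_itemsets(baskets, 3), 0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

Because every subset of a frequent itemset is itself frequent, the lookup `found[s]` in `generate_rules` always succeeds for itemsets produced by `frequent_itemsets`.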

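The FP-tree construction steps of Section 2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class `FPNode` and the function `build_fp_tree` are hypothetical names, the recursive `insert_tree` call is written as an equivalent loop, ties in support are broken by item name, and the minimum support is taken as an absolute count.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: item name, count, parent link, children, node-link."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next_link = None  # next node carrying the same item

def build_fp_tree(transactions, min_count):
    # Step (a): one scan to collect the frequent items and their support counts.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # L: frequent items in descending support order (ties by name: an assumption).
    ranked = sorted(frequent, key=lambda i: (-frequent[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}

    # Step (b): insert every transaction's sorted frequent items below the root.
    root = FPNode(None, None)  # the "null" root
    header = {}                # item -> head of that item's node-link chain
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent), key=order.__getitem__)
        node = root
        for item in items:     # iterative form of insert_tree([p | P], T)
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.next_link = header.get(item)  # prepend to the chain
                header[item] = child
            child.count += 1
            node = child
    return root, header

# Toy data, invented for this example only.
root, header = build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]], 2)
print(sorted(root.children))  # items that start a branch below the root
```

Following an item's node-link chain from `header` and summing the counts recovers that item's support, which is what makes the tree usable for mining without rescanning the database.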
4. EXPERIMENT AND RESULTS

4.1 Experiment
The experiment is done on the Abalone dataset obtained from the UCI machine learning repository. The data set has 4177 samples. It is composed of one nominal attribute, 7 continuous attributes, and one integer attribute. The parameter settings are: population size N = 500, maximum generation = 100, crossover rate = 0.006, mutation rate = 0.001. The experiment was executed using MATLAB 8.1.0.604 (R2013a) on the Microsoft Windows XP Professional 2002 operating system.

4.2 Results
Fig 2: Comparison between Apriori with GA and Apriori + FP with GA for the generation and fitness
Figure 2 shows the fitness value as a function of the number of generations. The maximum number of generations in the graph is 100. The hybrid (Apriori + FP) with GA algorithm gives an approximately 4% better result than the Apriori algorithm with GA. There is a definite improvement from using the hybrid algorithm comprising Apriori along with FP and the genetic algorithm.
Fig 3: Comparison between Apriori with GA and Apriori + FP with GA for the Rule generation
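For readers who want to see how the parameter settings of Section 4.1 enter a genetic algorithm, the following Python sketch runs the cycle described in Section 1.4 with those values as defaults. Several parts are assumptions, since the paper does not specify them: chromosomes are bit strings, selection is a binary tournament, crossover is single-point, the best chromosome is carried over unchanged (elitism), and the fitness function is supplied by the caller. The function name `run_ga` and the toy fitness used in the usage lines are hypothetical.

```python
import random

def run_ga(fitness, n_bits, pop_size=500, generations=100,
           crossover_rate=0.006, mutation_rate=0.001, seed=0):
    """Generational GA over bit strings; returns the best chromosome and
    the best fitness value recorded after each generation."""
    rng = random.Random(seed)
    # Formation of the initial population of strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]

    def pick():
        # Binary tournament selection (an assumption, not from the paper).
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_pop = [best[:]]  # elitism: keep a copy of the current best
        while len(new_pop) < pop_size:
            p1, p2 = pick()[:], pick()[:]
            # Single-point crossover, applied with probability crossover_rate.
            if n_bits > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Bit-flip mutation, applied per bit with probability mutation_rate.
            for child in (p1, p2):
                for k in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[k] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Placeholder fitness (number of 1-bits), used only to exercise the loop.
best, history = run_ga(sum, n_bits=8, pop_size=20, generations=10)
print(history[0], history[-1])
```

In a rule-optimization setting the chromosome would encode a rule and the fitness would be an interestingness measure such as confidence; that encoding is left open here because the source does not describe it.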

Figure 3 shows the number of rules generated. The number of rules generated by the hybrid (Apriori + FP) with GA algorithm is approximately 60% less than the number generated by the Apriori algorithm with GA. Thus the proposed hybrid (Apriori + FP) with GA algorithm gives the better results. Table 1 lists the number of rules generated by Apriori with GA and by Apriori + FP with GA for particular support values. The number of rules generated is seen to decrease with the hybrid algorithm.

Table 1: Result of rule generation
Attributes   Min support   Apriori with GA   Apriori + FP with GA
A            2             76                42
B            3             77                43
C            5             79                44
D            9             83                48
E            9             84                49
F            10            84                49
G            10            84                50
H            10            84                50

Fig 4: Comparison of rules in Apriori with GA and Apriori + FP with GA with respect to various attributes
Fig 4 shows the number of rules generated by the Apriori with GA algorithm and by the hybrid (Apriori + FP) with GA algorithm for various attributes. The number of rules generated by the hybrid (Apriori + FP) with GA algorithm is less than the number generated by the Apriori with GA algorithm for every attribute. Thus the proposed hybrid (Apriori + FP) with GA algorithm gives the better results.

5. CONCLUSION AND FUTURE WORK
In this paper, we presented an implementation of a hybrid of Apriori and FP growth with GA for association rule mining on the Abalone dataset. A data mining system can generate thousands of rules or more, but not all of them are interesting; only a small fraction would be of interest to the user. Generating only the interesting rules is an optimization problem in data mining. In this study, we implemented the Apriori algorithm with the FP growth algorithm to generate the association rules. We used a genetic algorithm to optimize the generated rules: a fitness function drives the optimization, and the optimum solutions found are the interesting rules. We compared the two implementations on the generations and fitness they required. To improve the efficiency and accuracy of the optimization, more measures need to be incorporated into the simple genetic algorithm for such data mining problems.