Optimization of Association Rule Mining through Genetic Algorithm

Similar documents
The k-means Algorithm and Genetic Algorithm

Optimization of Association Rule Mining Using Genetic Algorithm

[Premalatha, 4(5): May, 2015] ISSN: (I2OR), Publication Impact Factor: (ISRA), Journal Impact Factor: 2.114

Genetic Algorithms Variations and Implementation Issues

DETERMINING MAXIMUM/MINIMUM VALUES FOR TWO- DIMENTIONAL MATHMATICLE FUNCTIONS USING RANDOM CREOSSOVER TECHNIQUES

Multi-objective Optimization

GENETIC ALGORITHM with Hands-On exercise

The Genetic Algorithm for finding the maxima of single-variable functions

Topological Machining Fixture Layout Synthesis Using Genetic Algorithms

Evolutionary Computation. Chao Lan

Genetic Algorithms for Vision and Pattern Recognition

A Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2

AN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE

ARTIFICIAL INTELLIGENCE (CSCU9YE ) LECTURE 5: EVOLUTIONARY ALGORITHMS

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Evolutionary Computation Algorithms for Cryptanalysis: A Study

4/22/2014. Genetic Algorithms. Diwakar Yagyasen Department of Computer Science BBDNITM. Introduction

A New Selection Operator - CSM in Genetic Algorithms for Solving the TSP

A Genetic Algorithm Approach for Clustering

Grid Scheduling Strategy using GA (GSSGA)

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Heuristic Optimisation

Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you?

Rajdeep Kaur Aulakh Department of Computer Science and Engineering

A Novel method for Frequent Pattern Mining

Introduction to Genetic Algorithms

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming

Association Rules Identification and Optimization using Genetic Algorithm

Introduction to Genetic Algorithms. Based on Chapter 10 of Marsland Chapter 9 of Mitchell

A Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan 1,Qinzhao Wang 2

Association Rules Extraction using Multi-objective Feature of Genetic Algorithm

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Genetic Algorithm for Finding Shortest Path in a Network

Meta- Heuristic based Optimization Algorithms: A Comparative Study of Genetic Algorithm and Particle Swarm Optimization

A NOVEL APPROACH FOR PRIORTIZATION OF OPTIMIZED TEST CASES

RESOLVING AMBIGUITIES IN PREPOSITION PHRASE USING GENETIC ALGORITHM

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

Genetic Algorithms. Kang Zheng Karl Schober

Generation of Ultra Side lobe levels in Circular Array Antennas using Evolutionary Algorithms

Evolving SQL Queries for Data Mining

AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENETIC ALGORITHMS

ATI Material Do Not Duplicate ATI Material. www. ATIcourses.com. www. ATIcourses.com

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Neural Network Weight Selection Using Genetic Algorithms

Adaptive Crossover in Genetic Algorithms Using Statistics Mechanism

Basic Data Mining Technique

A Genetic Algorithm-Based Approach for Energy- Efficient Clustering of Wireless Sensor Networks

Outline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Improved Frequent Pattern Mining Algorithm with Indexing

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem

Association Rule Mining. Entscheidungsunterstützungssysteme

Role of Genetic Algorithm in Routing for Large Network

Structural Optimizations of a 12/8 Switched Reluctance Motor using a Genetic Algorithm

Evolutionary Algorithms. CS Evolutionary Algorithms 1

A Modified Genetic Algorithm for Process Scheduling in Distributed System

Reducing Graphic Conflict In Scale Reduced Maps Using A Genetic Algorithm

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

Automated Test Data Generation and Optimization Scheme Using Genetic Algorithm

Comparison Study of Multiple Traveling Salesmen Problem using Genetic Algorithm

Information Fusion Dr. B. K. Panigrahi

Santa Fe Trail Problem Solution Using Grammatical Evolution

Discovering Interesting Association Rules: A Multi-objective Genetic Algorithm Approach

Time Complexity Analysis of the Genetic Algorithm Clustering Method

ANTICIPATORY VERSUS TRADITIONAL GENETIC ALGORITHM

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS

DERIVATIVE-FREE OPTIMIZATION

CHAPTER 6 REAL-VALUED GENETIC ALGORITHMS

Lecture 6: Genetic Algorithm. An Introduction to Meta-Heuristics, Produced by Qiangfu Zhao (Since 2012), All rights reserved

Khushboo Arora, Samiksha Agarwal, Rohit Tanwar

Association mining rules

A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS

Genetic Algorithm Performance with Different Selection Methods in Solving Multi-Objective Network Design Problem

Advanced Search Genetic algorithm

Hybrid of Genetic Algorithm and Continuous Ant Colony Optimization for Optimum Solution

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES

Escaping Local Optima: Genetic Algorithm

Classification of Concept-Drifting Data Streams using Optimized Genetic Algorithm

METAHEURISTICS Genetic Algorithm

The Binary Genetic Algorithm. Universidad de los Andes-CODENSA

Attribute Selection with a Multiobjective Genetic Algorithm

Data Mining Concepts

Distributed Optimization of Feature Mining Using Evolutionary Techniques

Constrained Functions of N Variables: Non-Gradient Based Methods

Optimization Technique for Maximization Problem in Evolutionary Programming of Genetic Algorithm in Data Mining

A Steady-State Genetic Algorithm for Traveling Salesman Problem with Pickup and Delivery

Thresholds Determination for Probabilistic Rough Sets with Genetic Algorithms

GT HEURISTIC FOR SOLVING MULTI OBJECTIVE JOB SHOP SCHEDULING PROBLEMS

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid

Scheme of Big-Data Supported Interactive Evolutionary Computation

Artificial Intelligence Application (Genetic Algorithm)

ISSN: [Keswani* et al., 7(1): January, 2018] Impact Factor: 4.116

Jasmeet Singh Gurm Department of CSE RIMT-IET Punjab Technical University, Jalandhar Punjab, India

A Genetic Algorithm for Multiprocessor Task Scheduling

Inducing Parameters of a Decision Tree for Expert System Shell McESE by Genetic Algorithm

Genetic Algorithm based Fractal Image Compression

Using Genetic Algorithm with Triple Crossover to Solve Travelling Salesman Problem

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

REAL-CODED GENETIC ALGORITHMS CONSTRAINED OPTIMIZATION. Nedim TUTKUN

Mutations for Permutations

Transcription:

Optimization of Association Rule Mining through Genetic Algorithm RUPALI HALDULAKAR School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal, Madhya Pradesh India Prof. JITENDRA AGRAWAL School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal, Madhya Pradesh India Abstract:- Strong rule generation is an important area of data mining. In this paper we design a novel method for generation of strong rule. In which a general Apriori algorithm is used to generate the rules after that we use the optimization techniques. Genetic algorithm is one of the best ways to optimize the rules.in this direction for the optimization of the rule set we design a new fitness function that uses the concept of supervised learning then the GA will be able to generate the stronger rule set. Keywords: Data Mining, Association Rule, genetic algorithm 1.0 Introduction Concepts of mining associations are given as follows [5]. Let I = {I1, I2,..., Im } be a set of items and D = {t1,t2,.tn} be a set of transactions, where ti is a set of items such that ti I. An association rule is an implication of the form X Y, where X, Y I and X Y =. The rule X Y holds in the set D with support and confidence, where support is the percentage of transactions in D that contain both X and Y and confidence is the percentage of transactions in D containing X that also contain Y. An association rule must satisfy a user-set minimum support (minsup) and minimum confidence (minconf). The rule X Y is called a strong association rule if support minsup and confidence minconf. Usually association analysis is not given decision attributes so that we can find association and dependence between attributes to the best of our abilities. But the aimless analysis may take much time and space. Decision attributes determined can reduce the amount of candidate sets and searching space, and then improve the efficiency of algorithms to some extent [12].In addition, users are not interested in all association rules, but they are just concerned about the associations among condition attributes and decision attributes. So, in this paper we just mine association rules with decision attributes, in that, we consider attributes which users are concerned about as decision attributes and other attributes as condition attributes. If mining association rules from continuous attributes data, the continuous attributes have to be discretized first. The essence of discretization is to use the selected cut-points to divide the values of the continuous attributes into intervals. The methods of dividing determine the quality of association rules [2].While the genetic algorithm has good optimization. In this paper, genetic algorithm is used to search the cut-points of the continuous attributes. This paper start with as in section 2 describes genetic algorithm and genetic operators. Section 3 we define the rule generation by apriori algorithm. In section 4 we explain the optimization through GA. Section 5 represent the flow chart of proposed work. The section 6 defines Experiment and result and the last one section 7 define conclusion and future trends. 2.0 Genetic algorithm A genetic algorithm ([7]) is a type of searching algorithm. It searches a solution space for an optimal solution to a problem. The algorithm creates a population of possible solutions to the problem and lets ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1252

them evolve over multiple generations to find better and better solution. Algorithm is started with a set of solutions (represented by chromosomes) called population. Solutions from one population are taken and used to form a new population. Cycle of the Algorithm: The algorithm operates through a simple cycle Creation of a population of strings. Evolution of each string. Selection of the best string. Genetic manipulation to create a new population of strings. Operators of Genetic Algorithm The Genetic operators determine the search capability and convergence of the algorithm. Genetic operators hold the selection crossover and mutation on the population and generate the new population. Selection Chromosomes are selected from the population to be parents to crossover. The problem is how to select these chromosomes. According to Darwin's evolution theory the best ones should survive and create new offspring. There are many methods how to select the best chromosomes, for example roulette wheel selection, Boltzmann selection, tournament selection, rank selection, steady state selection and some others. Encoding of a Chromosome The most used way of encoding is a binary string. The chromosome then could look like this: Chromosome 1 1101100100110110 Chromosome 2 1101111000011110 Crossover After we have decided what encoding we will use, we can make a step to crossover.crossover selects genes from parent chromosomes and creates a new offspring. There is different way like single point crossover other is multipoint crossover. Chromosome 1 11011 00100110110 Chromosome 2 11011 11000011110 Offspring 1 11011 11000011110 Offspring 2 11011 00100110110 Mutation After a crossover is performed, mutation takes place. This is to prevent falling all solutions in population into a local optimum of solved problem. Mutation changes randomly the new offspring. For binary encoding we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1. Mutation can then be following: ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1253

Original offspring 1 1101111000011110 Original offspring 2 1101100100110110 Mutated offspring 1 1100111000011110 Mutated offspring 2 1101101100110110 Outline of Basic Genetic Algorithm 1. [Start] Generate random population of n chromosomes (suitable solutions for the problem) 2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population 3. [New population] Create a new population by repeating following steps until the new population is complete Selection: Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected) Crossover: With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents. Mutation: With a mutation probability mutate new offspring at each locus (position in chromosome). Accepting: Place new offspring in a new population 4. [Replace] Use new generated population for a further run of algorithm 5. [Test] If the end condition is satisfied, stop, and return the best solution in current population 6. [Loop] Go to step 2. A typical flowchart of a genetic algorithm is shown in Figure. One iteration of the algorithm is referred to as a generation Initialize populatio Evaluate function Solution found Yes End No Reproduction Selection Generate new population Fig.1 flow chart of genetic algorithm ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1254

3.0 Rule generation by A priori algorithm The basic concept of Mining association are given as follows Let I = {I 1, I 2.I n } be a set of item D= {T 1, T 2...T n } be a set of transaction Where ti is a set of transaction t i I, An association rule is transaction of the form X=>Y where X, YCI and X^Y=Ø.The rule X=>Y holds in the set D with Support and Confidence. Input: a transaction database D, the minimum threshold of support minsupnum Output: the set of frequent item sets L 1) for all transaction t D do 2) Generate TV; 3) L 1 = (frequent 1-itemsets); 4) C 2 = L 1 L 1 ; 5) L 2 = {c C 2 sup(c) MinSupNum}; // sup(c) is the result of the formula (4) 6) for (k=3; L k -1 ø; k++ ) do begin 7) for ( j=k; j m; j++ ) do k-1 8) Generate CIV ij C k = candidate_gen (Lk-1); 9) L k = {c C k sup(c) MinSupNum}; 10) end 11) Return L = L k ; candidate_gen (frequent item sets L k -1) 1) for all (k-1)-item set l L k -1 do 2) for all ij L k -1 do 3) // S is the result of the formula (2) if for every r(1 r k) such that S[r] k-1 then 4) add l {ij} to C k ; 4.0 Rule Optimization by Genetic algorithm a.)data Encoding Here we have used Abolan Dataset basically these dataset used for the classification. In this Database 8 attributes are exited. It is provided by the MCI. Genetic Algorithm directly not work on the raw data then whole data we have encoded in the form of Binary representation technique (0 and 1). b.) Fitness function The most important part of Genetic Algorithm is a design of Fitness Function: ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1255

f(x) = Support(x) minsupport p(support(x)>minsupport) q(support(x)<minsupport) Support is the Support of New rule generated through genetic operation. Normal case the value of q(support(x)<minsupport)is rejected for the better performance of genetic algorithm. We have used classlearned classifier for the prediction for rejected those value near to the Maximum value. The value of q class is divided into two parts C 1 and C 2. q = {C 1, C2} C 1 = {those value or Data minsupport less than 0.5} C 2 = {those value or Data minsupport greater than 0.5} Now Support (C 2 ) α= (Support (C 2 ))>C 1 f (q) = = minsupport β= (Support(C 2 ))=C 1 c.)selection Strategy The selection strategy based on the basis of individual fitness and concentration pi is the probably of selection of individual whose fitness value is greater than one and f(α) is a those value whose fitness is less than one but near to the value of 1. Now f (x i ) Pi = -αf (α) e M 1 f (x j ) Where α is an adjustment factor. d.) Genetic Operation The Genetic operators determine the search capability and convergence of the algorithm. Genetic operators hold the selection crossover and mutation on the population and generate the new population. Select operation: In this algorithm it restores each chromosome in the population to the corresponding rule, and then calculate selection probability pi for each rule based on above formula. Crossover operation: In which multi point crossover are used. It classifies the domain of each attribute into a group and classifies the cut point of each continuous attributes into one group.and the crossover carried out between the corresponding groups of two individuals by a certain rate. Mutation Operation: Any bit in the chromosomes is mutated by a certain rate, that is, changing 0 to 1, 1 to 0. ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1256

e.)terminating condition The algorithm finally extracts the rules that meet the confidence threshold given by users, so the final output is not one optimal rule, but rather a set of rules that meet the threshold. f.) Rules extraction The frequent rules are generated according to the fitness function and genetic operators. In order to mine the strong association rules finally, these rules must be extracted again. Extraction criteria are: output the rule which meets the minimum confidence given by users, otherwise abandon it. 5.0 Flow chart of proposed work Fig.2 Flow chart of proposed work 6.0 Presents the Experiments and Results The experiment uses Abalone dataset obtained from UCI machine learning repository. The data set has 4177 samples. It is composed of a discrete attribute and 8 continuous attributes. In this paper, we only mined such association rules X Y that Y was age. The setting of parameter: the size of evolutionary population N=100, crossover rate=0.006, mutation rate=0.001. The experiment was executed on Pentium IV CPU 2.58GHZ machine and software was MATLAB (2007). The below table represent the rule generation with genetic and without genetic algorithm for particular support and confidence ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1257

TABLE 1.Represent the result of rule generation Attribute A B C D E F Min support 2 4 6 8 9 7 Min confidence Apriori algorithm 0.1 0.2 0.3 0.1 0.2 0.5 121 137 50 98 84 28 Genetic algorithm 94 115 42 75 70 22 The below graph represent the result analysis. 160 140 120 100 80 60 40 20 0 A B C D E Min support Apriori With genetic Figure.3 support/confidence vs. rule generation graph 7.0 conclusion & Future trend In this direction we optimize association rule mining using new fitness function. In which fitness function divide into two classes c1 and c2 one class for discrete rule and another class for continuous rule. Through this direction we get a better result. To make genetic algorithm more effective and efficient it can be incorporated with other techniques so it can provide a best result. REFERENCES [1]. C.Y.Jia and X.J.Ni, Association rule mining: A survey, Computer Science, vol. 30,,pp.145-149,2003. [2]. F.Q.Shi, S.Q.Sun, and J.Xu, Association rule mining of Dansei knowledge based on rough set, Computer Integrated Manufacturing Systems, vol.14,pp.407-411,2008. [3]. F.Y.Li, L.P.Zhao, and H.Y.Wang, Improved mining method for association rules based on genetic algorithm, Computer Engineering and Applications, vol. 44, pp.155-158,2008. [4]. H.S.Nguyen and A.Skowron, Quantization of real value attributes: rough set and boolean reasoning approach, Proc of 2th Joint Annual Conf. on Information Science, IEEE Press,pp.34-37,1995. [5]. J. Han and M. Kamber, Data Mining:Concepts and Techniques, San Francisco: Morgan Kaufmann Publishers,2001. [6]. J.G.Zheng and X.Y.Wang, DNA-Immune-Genetic algorithm based on information entropy, Computer Simulation,vol.23,pp.163-165,2006. [7]. Kalyanmoy Deb, Introduction to Genetic Algorithms, Kanpur Genetic Laboratory (Kangal), Depart of Mechanical Engineering, IIIT Kanpur 2005. ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1258

[8]. W.S.Yao, L.Shang, and Z.Q.Chen, A quantization of real-value attributes based on evolution algorithm, Computer Applications and Software, vol. 22, pp.37-39, 2005. [9]. W.Ning and M.Q.Zhao, More improved greedy algorithm for discrimination of decision table, Computer Engineering and Applications, vol.43, pp.173-178., 2007 [10]. X.P.Wang and L.M.Cao,Genetic Algorithm-Theory, Application and Software Realization, Xi an: Xi an Jiaotong University Press, 2001. [11]. Y.Zhu,H.Zhang and L.D.Kong, Research and application of multidimensional association rules minng based on artificial immune system, Computer Science, vol. 36pp.239-242,2009. [12]. Z.Tong, K.Luo, Mining of association rules with decision attributes based on rough set, Computer Engineering and Applications, vol 42,pp.166-169,2006. ISSN : 0975-3397 Vol. 3 No. 3 Mar 2011 1259