Mining Class Contrast Functions by Gene Expression Programming 1


Lei Duan, Changjie Tang, Liang Tang, Tianqing Zhang and Jie Zuo
School of Computer Science, Sichuan University, Chengdu, China
{leiduan, cjtang}@scu.edu.cn

Abstract. Finding functions whose accuracies change significantly between two classes is an interesting task. In this paper, such functions are defined as class contrast functions. Since Gene Expression Programming (GEP) can discover essential relations in data and express them mathematically, it is natural to apply GEP to mining class contrast functions from data. The main contributions of this paper are: (1) proposing a new data mining task, class contrast function mining; (2) designing a GEP-based method to find class contrast functions; (3) presenting several strategies for finding multiple class contrast functions in data; (4) giving an extensive performance study on both synthetic and real-world datasets. The experimental results show that the proposed methods are effective: several class contrast functions are discovered from the real-world datasets. Some potential work on class contrast function mining is discussed based on the experimental results.

Keywords: Gene Expression Programming, Contrast Mining, Data Mining

1 Introduction

Discovering mathematical models that precisely describe the underlying relationships in observed data, and that are easy to understand, helps scientists better understand the unknown. Given a set containing samples of several classes, it is interesting to find models or relationships that hold in one class but not in the others. For example, the height of an ordinary person equals his/her arm span. However, for most basketball players and swimmers, the arm span is greater than the height. This kind of relationship differs from regression models, which do not take class information into consideration.
In this paper, we propose a new data mining task called class contrast function mining. A class contrast function has the following characteristics: each variable in the function is an attribute of the sample set, and the function has high accuracy in one class but not in the other classes.

1 This work was supported by the National Natural Science Foundation of China under grant No. , and the 11th Five-Year Key Programs for Sci. & Tech. Development of China under grant No. 2006BAI05A01.

Definition 1 (Contrast Ratio). Given a dataset D that contains samples of two classes (c1 and c2), let f be a function, and let Err(f, c1) and Err(f, c2) be the average errors of f in c1 and c2, respectively. If Err(f, c1) <= Err(f, c2), then the contrast ratio of f is

    CR(f) = Err(f, c1) / Err(f, c2)

where Err(f, c1) is called the tiny error and Err(f, c2) is called the coarse error.

Definition 2 (Class Contrast Function). Let CR(f) be the contrast ratio of f on dataset D. Given a user-defined parameter e, 0 < e < 1.0, if CR(f) < e, then f is a class contrast function on D.

Example 1. Given a sample set with two classes (c1 and c2) whose samples are listed in Table 1, and a function f(a1, a2, a3, a4): a1 + a2 + a3 - a4 = 0, the average absolute errors of f in c1 and c2 are and 4.0, respectively. Then f is a class contrast function of c1.

Table 1. A sample set with two classes (c1 and c2)

         a1    a2    a3    a4    a5    Class
    s1                                 c1
    s2                                 c1
    s3                                 c1
    s4                                 c2
    s5                                 c2
    s6                                 c2

From Definitions 1 and 2, we can see that a good class contrast function should have a small contrast ratio; in other words, the tiny error should be small and the coarse error should be large. However, a small contrast ratio does not necessarily mean a small tiny error. So, in our work, we define a contrast threshold t: the tiny error of a discovered class contrast function must be less than t. Otherwise, the function may be meaningless (big error in each class).

Intuitively, the concept of class contrast function is similar to that of emerging patterns (EPs), which were proposed by Dong and Li [1]. Due to the wide applications of EPs, many high-performance algorithms for discovering EPs have been proposed [2-8]. However, a class contrast function is not the same as an EP, since EPs are sets of items whose support changes significantly between the two classes.
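To make Definitions 1 and 2 concrete, the contrast ratio check can be sketched in a few lines (an illustrative Python sketch; the paper's implementation is in Java, and the sample values below are hypothetical, not those of Table 1):

```python
def avg_abs_error(f, samples):
    """Average absolute error of f over a list of attribute tuples."""
    return sum(abs(f(*s)) for s in samples) / len(samples)

def contrast_ratio(f, class1, class2):
    """CR(f) per Definition 1: tiny error divided by coarse error."""
    e1, e2 = avg_abs_error(f, class1), avg_abs_error(f, class2)
    return min(e1, e2) / max(e1, e2)

def is_class_contrast_function(f, class1, class2, eps):
    """Definition 2: f is a class contrast function on D if CR(f) < eps."""
    return contrast_ratio(f, class1, class2) < eps

# The residual of the function in Example 1: a1 + a2 + a3 - a4 = 0.
f = lambda a1, a2, a3, a4: a1 + a2 + a3 - a4

c1 = [(1, 2, 3, 6), (2, 2, 2, 6)]    # f holds exactly in c1 (errors 0, 0)
c2 = [(1, 1, 1, 7), (2, 2, 2, 11)]   # f misses by 4 and 5 in c2
```

With these toy samples, the tiny error is 0 and the coarse error is 4.5, so CR(f) = 0 and f is a class contrast function of c1 for any eps in (0, 1).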
Algorithms for emerging pattern discovery cannot be applied directly to datasets with numeric attributes unless some discretization method is adopted. However, information may be lost in converting numeric values to items. To the best of our knowledge, no previous work has been done on finding functional relationships whose accuracies differ between classes. Generally, finding class contrast functions poses the following challenges. How to determine the form of the class contrast functions to be discovered? In real-world applications, there is no a priori knowledge of the function form,

variables, and parameters of the function. We are not even sure whether a class contrast function exists in the data. How to find class contrast functions in a high-dimensional dataset? How to find all class contrast functions that exist in the given dataset?

Traditional regression methods can be used to discover functions from a dataset. However, such methods require the user to define a hypothesis. Gene Expression Programming (GEP) [9, 10], the newest development of Genetic Algorithms (GA) and Genetic Programming (GP), has strong computational power due to its special individual structure. GEP can evolve functions with little a priori knowledge: it is unnecessary to define the function form, and GEP selects function variables automatically from all given attributes. GEP has been widely used in data mining [11-16].

The main contributions of this work are: (1) proposing a new data mining task, class contrast function mining; (2) designing a GEP-based method to find class contrast functions; (3) presenting several strategies for finding multiple class contrast functions in datasets; (4) giving an extensive performance study of the proposed methods and discussing potential work on class contrast functions.

The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 presents the main ideas used by our algorithms and the algorithms themselves. Section 4 reports an experimental evaluation of the algorithms. Section 5 discusses future work and gives concluding remarks.

2 Related Work

2.1 Emerging Patterns

Emerging Patterns (EPs) are contrast itemsets whose support changes significantly between two classes of data [1]. In particular, a pattern that occurs only in samples of one class is called a jumping EP [1].
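As a toy illustration of the support contrast behind EPs (the transactions below are hypothetical, not from any dataset in this paper):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Hypothetical transaction databases for two classes:
c1 = [{'a', 'b'}, {'a', 'b', 'c'}, {'b', 'c'}]
c2 = [{'a', 'c'}, {'c'}]

# {'a', 'b'} has support 2/3 in c1 but 0 in c2: it occurs only in
# samples of class c1, so it is a jumping emerging pattern.
```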
Since the first EP mining algorithm was proposed in [1], several methods have been designed, including a constraint-based approach [3], tree-based approaches [4, 5], a projection-based algorithm [6], a ZBDD-based method [7], and an equivalence-classes-based method [8]. The complexity of finding emerging patterns is MAX SNP-hard [17]. EPs can be found in many real-world datasets. Since EPs have high discriminating power [17], many EP-based classification methods have been proposed, such as CAEP [18], DeEPs [19, 20], and a jumping-EP-based method [21]. These classifiers take into account the discriminating power of low-support EPs, together with high-support ones and multi-feature conditions. Research results show that EP-based classification methods often outperform state-of-the-art classifiers, including C4.5 and SVM [17].

2.2 Basic Concepts and Terminology of GEP

The basic steps of using GEP to seek an optimal solution are the same as those of GA and GP [9]. The main players in GEP are the chromosome and the expression tree. The chromosome is a linear, symbolic string of fixed length that carries the genetic information, while the expression tree is the expression of that information. A chromosome consists of one or more genes. Each gene is divided into a head and a tail: the head contains symbols that represent functions or terminals, whereas the tail contains only terminals. In GEP, the length of a gene and the number of genes in a chromosome are fixed; however, each gene can code for an expression tree of different size and shape. The valid part of a GEP gene is obtained by parsing the expression tree from left to right and from top to bottom. Since the structural organization of GEP genes is flexible, any modification made to the chromosome still generates a valid expression tree, so all programs evolved by GEP are syntactically correct. Based on the natural selection principle, GEP iteratively evolves a population of chromosomes encoding candidate solutions, through genetic operators such as selection, crossover, and mutation, to find an optimal solution. The details of the GEP implementation can be found in [9, 10]. Besides C. Ferreira's research [9-12], GEP has been widely used in data mining fields such as symbolic regression [13], classification [14, 15], and time series analysis [16].

3 Class Contrast Function Mining

3.1 Fitness Function Design in GEP

The GEP algorithm begins with the random generation of a set of chromosomes, called the initial population. The fitness of each individual is then evaluated according to the fitness function, and individuals are selected according to fitness to reproduce with modification, leaving progeny with new characteristics.
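Before fitness can be computed, the valid part of each gene must be expressed as a tree and evaluated. A minimal sketch of decoding one K-expression breadth-first (restricted, for illustration, to a two-argument function set; this is not the paper's Java implementation):

```python
from operator import add, sub, mul, truediv

# Function set with arities; terminals are single letters looked up in env.
FUNCS = {'+': (add, 2), '-': (sub, 2), '*': (mul, 2), '/': (truediv, 2)}

def evaluate_gene(gene, env):
    """Decode the valid part of a gene (head + tail string) level by level,
    in left-to-right, top-to-bottom order, then evaluate the tree."""
    nodes, children = [gene[0]], {0: []}
    pos, frontier = 1, [0]
    while frontier:
        nxt = []
        for i in frontier:
            arity = FUNCS[nodes[i]][1] if nodes[i] in FUNCS else 0
            for _ in range(arity):          # each function consumes `arity`
                nodes.append(gene[pos])     # symbols from the linear string
                pos += 1
                j = len(nodes) - 1
                children[j] = []
                children[i].append(j)
                nxt.append(j)
        frontier = nxt
    def value(i):
        if nodes[i] in FUNCS:
            f, _ = FUNCS[nodes[i]]
            return f(*(value(c) for c in children[i]))
        return env[nodes[i]]                # terminal: look up its value
    return value(0)

# The gene "*+-abcdba" expresses (a + b) * (c - d); the trailing "ba"
# lies beyond the valid part and is simply ignored, as in GEP.
```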
The individuals of the new generation are subjected to the same evolutionary process: expression of the genomes, confrontation with the selection environment, and reproduction with modification. This procedure is repeated until a satisfactory solution is found or a predetermined number of generations is reached; evolution then stops and the best-so-far solution is returned [9, 10].

The fitness function in GEP determines the evolution direction of candidate solutions. As stated before, we prefer to find class contrast functions that have not only a small contrast ratio but also a small tiny error. Given a GEP individual g, the fitness of g is calculated as follows:

    fitness(g) = (t / serr) * 0.5,                 if serr > t         (1)
    fitness(g) = (1 - serr / cerr) * 0.5 + 0.5,    if serr <= t

where serr is the tiny error, cerr is the coarse error, and t is the contrast threshold. There are two phases in evaluating the fitness of a GEP individual g:

If the tiny error is greater than the predefined value t, the fitness is evaluated by the tiny error alone; the fitness range is (0, 0.5]. If the tiny error is not greater than t, the fitness is evaluated by both the tiny error and the coarse error; the fitness range is [0.5, 1.0]. Based on Equation (1), a GEP individual whose tiny error is small and whose contrast ratio is small gets a high fitness value. Since individuals with higher fitness values get larger opportunities to survive and evolve, using Equation (1) in GEP generates the desired results. In our work, we are interested in finding functions that have high accuracy in one class but not in the other classes; this is the main difference between our work and other function discovery work. Algorithm 1 gives the pseudo code for evaluating GEP individuals when evolving class contrast functions.

Algorithm 1: GEP_Evaluate(Pop, D1, D2, t)
Input: (1) a set of evolving GEP individuals: Pop; (2) a set of samples that belong to class c: D1; (3) a set of samples that do not belong to class c: D2; (4) a user-defined threshold: t.
Output: the individual with the highest fitness: bestIndividual.
begin
1.  bestFit <- 0
2.  bestIndividual <- NULL
3.  for each individual ind in Pop
4.      Err1 <- Min(getAvgErr(ind, D1), getAvgErr(ind, D2))
5.      Err2 <- Max(getAvgErr(ind, D1), getAvgErr(ind, D2))
6.      if Err1 > t
7.          Fit <- t / Err1 * 0.5
8.      else
9.          Fit <- (1 - Err1 / Err2) * 0.5 + 0.5
10.     if bestFit < Fit
11.         bestFit <- Fit
12.         bestIndividual <- ind
13. return bestIndividual
end.

In Steps 4 and 5, function getAvgErr() returns the average error of the current individual ind in datasets D1 and D2, respectively. In Steps 6 to 9, the fitness is evaluated according to Equation (1). The time complexity of Algorithm 1 is O(m*n), where m is the number of GEP individuals in the population and n is the number of samples in D1 and D2.
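Equation (1) and Algorithm 1 can be sketched in runnable form as follows (an illustrative Python version of the paper's Java implementation; individuals are represented, for simplicity, as residual functions over attribute tuples):

```python
def avg_err(ind, samples):
    """getAvgErr: average absolute residual of an individual on a dataset."""
    return sum(abs(ind(*s)) for s in samples) / len(samples)

def fitness(ind, d1, d2, t):
    """Equation (1): tiny error alone when serr > t, both errors otherwise."""
    e1, e2 = avg_err(ind, d1), avg_err(ind, d2)
    serr, cerr = min(e1, e2), max(e1, e2)   # tiny and coarse errors
    if serr > t:
        return t / serr * 0.5               # range (0, 0.5]
    if cerr == 0:                           # degenerate guard, not in Eq. (1)
        return 0.5
    return (1 - serr / cerr) * 0.5 + 0.5    # range [0.5, 1.0)

def gep_evaluate(pop, d1, d2, t):
    """Algorithm 1: the individual with the highest fitness in Pop."""
    return max(pop, key=lambda ind: fitness(ind, d1, d2, t))
```

For instance, with t = 1 an individual whose average errors are 2.0 in both classes falls into the first phase and gets fitness (1/2) * 0.5 = 0.25, while an individual with tiny error 0 gets fitness 1.0.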
There are two stop conditions when GEP evolves a class contrast function: a function whose contrast ratio is less than the predefined value is found, or the number of generations reaches the predetermined value. If no class contrast function is found by GEP, we have two choices: increase the predefined generation number, or restart GEP once more. If no satisfactory function is found after several independent GEP runs, we stop the search and conclude that there may be no class contrast function in the given dataset.
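The run-level control described above can be sketched as follows (`run_gep_once` is a hypothetical hook standing in for one full GEP run; it returns a satisfactory function, or None when the generation budget is exhausted):

```python
def search_contrast_function(run_gep_once, max_restarts):
    """Restart GEP a few times; if every independent run fails, conclude
    that there may be no class contrast function in the dataset."""
    for _ in range(max_restarts):
        result = run_gep_once()
        if result is not None:       # stop condition 1: a function with a
            return result            # satisfactory contrast ratio was found
    return None                      # give up after several fruitless runs

# Example with a stub: the first two runs fail, the third succeeds.
attempts = iter([None, None, 'f1'])
found = search_contrast_function(lambda: next(attempts), 5)
```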

3.2 Finding Class Contrast Functions in a High-Dimensional Dataset

Subsection 3.1 describes the basic steps of applying GEP to finding class contrast functions. A naïve approach takes all attributes of the samples as GEP terminals. However, this may be inefficient when the dataset is high-dimensional: in GEP, when the number of terminals is large, the evolution efficiency is low. As a result, it is undesirable to apply GEP directly to evolving class contrast functions in a high-dimensional dataset. As a more challenging problem in this work, we investigate how to select a subset of the original attributes as GEP terminals. Naturally, the selected attributes should contribute to generating good class contrast functions. Intuitively, attributes whose values differ greatly between classes are preferable. Since class distribution information is available for each attribute in class contrast function mining, we adopt an information-gain-based method to select GEP terminals, and additionally take the correlation among selected attributes into consideration. The details of calculating information gain are presented in [22, 23]. We calculate the information gain of each attribute in the dataset and sort all attributes in descending order of information gain. Suppose the size of the GEP terminal set is k. A simple way is to fetch the attributes with the top k information gains. However, correlations may exist among attributes, so we select attributes that have high information gains and low correlations among them. Let v1 and v2 be two attribute vectors. The correlation between them, denoted Cor(v1, v2), is calculated as

    Cor(v1, v2) = v1 . v2 / (|v1| |v2|)        (2)

Suppose V is the set of selected attribute vectors. For each unselected attribute vector v, we calculate its correlation with each selected attribute and take the maximum as the correlation between v and V.
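The resulting greedy selection, combining information gain with this correlation measure, might be sketched as follows (attribute names and gain values are illustrative, and the information gains are assumed to be precomputed):

```python
import math

def cor(v1, v2):
    """Cosine correlation between two attribute vectors."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(v1, v2)) / (norm(v1) * norm(v2))

def select_terminals(attrs, gain, k):
    """Pick the top-gain attribute first, then repeatedly add the attribute
    maximizing Gain(v) * (1 - max correlation with the selected set)."""
    chosen = [max(attrs, key=lambda a: gain[a])]
    while len(chosen) < k:
        def score(a):
            return gain[a] * (1 - max(cor(attrs[a], attrs[c]) for c in chosen))
        candidates = [a for a in attrs if a not in chosen]
        chosen.append(max(candidates, key=score))
    return chosen

# Hypothetical example: a2 duplicates a1, so a3 is preferred as the
# second terminal despite its lower information gain.
attrs = {'a1': (1.0, 0.0), 'a2': (1.0, 0.0), 'a3': (0.0, 1.0)}
gain = {'a1': 0.9, 'a2': 0.8, 'a3': 0.5}
```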
The correlation score between v and V, denoted CorScore(v, V), is defined as

    CorScore(v, V) = 1 - Max{Cor(v, v') | v' in V}        (3)

In Equation (3), we take the maximal correlation, since we want to select the attribute that has the minimal correlation with the selected attributes. Let Gain(v) be the information gain of attribute v. We define the contrast score, denoted ConScore(v), as

    ConScore(v) = Gain(v) * CorScore(v, V)        (4)

Given a high-dimensional dataset from which we want to select k attributes as GEP terminals, we first select the attribute with the highest information gain, and then select another k - 1 attributes one by one based on Equation (4).

3.3 Strategies for Searching Multiple Class Contrast Functions

The next problem is how to discover the multiple class contrast functions that may exist in a dataset. This is challenging, since we have no idea about the form of the next class contrast function; we are not even sure whether another class contrast function exists. Moreover, if we simply apply GEP to the dataset again, the same class

contrast function which has already been discovered may be found again, and this situation should be avoided. We design three strategies for searching for multiple class contrast functions, for three different cases.

1. Keep the dataset unchanged and record each discovered class contrast function. If the current GEP individual is a class contrast function discovered already, its fitness is assigned the minimum value.
2. For each discovered class contrast function, the attributes occurring in it are removed from the original dataset before searching for the next class contrast function.
3. For each discovered class contrast function, the attribute that occurs in the function and has the highest information gain is removed from the original dataset before searching for the next class contrast function.

The first strategy takes all attributes as GEP terminals each time, so it suits datasets with a small number of dimensions. The second strategy takes different attributes as GEP terminals each time, so it suits datasets with a large number of dimensions. The third strategy suits datasets whose number of dimensions is neither large nor small.

4 Performance Evaluation

To evaluate the performance of our method for mining class contrast functions, we implemented all proposed algorithms, as well as the GEP algorithm, in Java. The experiments were performed on an Intel Pentium Dual 1.80 GHz (2 cores) PC with 2 GB memory running Windows XP. Table 2 lists the GEP parameters used in our experiments. Moreover, to improve the function discovery ability of GEP, the constant set is {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 2, 4, 5, 6, 7, 8, 9}. Please refer to [10] for detailed usage of these parameters.

Table 2.
Default parameters for the GEP algorithm

    parameter                 value          parameter                      value
    Population size           500            One-point recombination rate   0.4
    Number of generations     200            Two-point recombination rate   0.2
    Linking function          addition       Gene recombination rate        0.1
    Function set              {+, -, *, /}   IS transposition rate          0.1
    Number of genes           3              IS elements length             1, 2, 3
    Gene head size            7              RIS transposition rate         0.1
    Selection operator        tournament     RIS elements length            1, 2, 3
    Mutation rate             0.3            Gene transposition rate

4.1 Experiments on Synthetic Datasets

First, we generate synthetic datasets that contain a linear or non-linear relation to validate the effectiveness of the algorithms for mining class contrast functions. Each synthetic dataset has 100 samples belonging to class c1 and 100

samples belonging to class c2. All values in each synthetic dataset are generated randomly under a uniform distribution in the range [-200, 200].

Let D1 = {X1, X2, X3, X4, X5} be a synthetic dataset in which all samples of c1 satisfy the linear relation f1: 2X1 + X2 - X3 = 0; that is, f1 is a class contrast function of c1 in D1. Let D2 = {X1, X2, X3, X4, X5} be a synthetic dataset in which all samples of c1 satisfy the non-linear relation f2: X1 * X2 + X3 - X5 = 0; that is, f2 is a class contrast function of c1 in D2. We apply our method to discovering f1 and f2 from D1 and D2, respectively. Three different contrast thresholds (5, 50 and 100) are set in the experiment. For each contrast threshold, we run the method 20 times independently and record the GEP generation number when the predefined function is discovered. The success ratio is the percentage of runs in which the predefined function is found.

Fig. 1. The success ratios and average generation numbers of finding f1 in D1

Fig. 2. The success ratios and average generation numbers of finding f2 in D2

Figures 1 and 2 illustrate the success ratios and average generation numbers of finding f1 in D1 and f2 in D2 under different contrast thresholds. We can see that when the contrast threshold is 50, our method finds the predefined class contrast functions (f1 and f2) efficiently: the success ratios are 100% in both D1 and D2, and the average generation numbers are 4.3 and 46.2, respectively. Moreover, when the contrast threshold is 5, the average generation number is larger than when the contrast threshold is 50. The reason is that if the contrast threshold is small, the first phase of fitness evaluation (serr > t) is tough on GEP individuals, and most potentially good individuals may be eliminated, so the evolution process is slowed down.
When the contrast threshold is 100, the success ratio is smaller than when

the contrast threshold is 50. The reason is that if the contrast threshold is large, the first phase of fitness evaluation (serr > t) cannot filter out some bad GEP individuals: individuals with big errors in both classes but a small contrast ratio may be selected as the best result. As a result, the success ratio decreases.

4.2 Experiments on Real-World Datasets

Next, we apply our method to several microarray datasets downloaded from the Kent Ridge Bio-medical Dataset repository. The characteristics of microarray datasets include high dimensionality, numeric attributes, etc. Table 3 lists the characteristics of the microarray datasets tested in our experiments.

Table 3. Data characteristics of 3 microarray datasets

    Dataset            # samples in class 1    # samples in class 2    # attributes
    Breast Cancer      44 (non-relapse)        34 (relapse)
    Central Nervous    21 (survivors)          39 (failures)           7129
    Colon Cancer       22 (normal)             40 (cancer)             2000

As stated previously, the evolution efficiency of GEP is low when the terminal set is large. For each microarray dataset, we calculate the contrast score of each attribute using the method introduced in Subsection 3.2, fetch the 10 attributes with the highest contrast scores, and find class contrast functions from them. Specifically, for the Breast Cancer dataset, the index set of selected attributes is {376, 7813, 8781, 13620, 6326, 21943, 18424, 726, 7508, 19967} 2. For the Central Nervous dataset, it is {7015, 5527, 2473, 2141, 843, 3419, 3774, 4605, 2088, 10}. And for the Colon Cancer dataset, it is {1670, 248, 1041, 1292, 142, 1410, 1327, 1324, 1771, 896}.

Subsection 3.3 introduced three strategies for finding multiple class contrast functions. In this experiment, we adopt the first strategy. For each dataset, we apply our proposed method 20 times independently to discover class contrast functions.
A run is marked as a success if a class contrast function whose fitness is greater than 0.5 is found; in this case, the tiny error of the discovered function is not greater than the contrast threshold. The success ratio is the percentage of successful runs out of the total number of runs. The number of generations is set to 500. Three different contrast thresholds (t) are set in the experiments; we choose these values so that different success ratios are obtained, which helps us analyze the experimental results. Tables 4 to 6 list the experimental results on the three data subsets.

From Table 4, we can see that on the Breast Cancer data subset we failed to find a class contrast function whose tiny error is less than 0.08. The reasons may include: first, no function satisfies this constraint; second, a larger generation number should be set for GEP; third, more functions should be added to GEP's function set. The success ratio can be increased by setting a larger contrast threshold, but the tiny error of the best individual may then increase. It is worth noting that the difference

2 The index of the first attribute in the original dataset is 0. Here, we list the indexes of the selected attributes in descending order of contrast score.

between the average tiny error and the best tiny error is small. Since the fitness of each individual depends on its contrast ratio, some individuals are chosen due to a large coarse error. So, determining a suitable contrast threshold is necessary and important. Similar conclusions can be drawn from Table 5 and Table 6. In Table 6, when the contrast threshold is 180 or 190, we get the same class contrast function several times, so the average values equal those of the best one.

Table 4. The experimental results on the Breast Cancer data subset

                                    t = 0.08    t =                  t = 0.09
    Success ratio                   0%          80%                  100%
    Avg. contrast ratio             \
    Best contrast ratio             \
    Avg. tiny error                 \
    Best tiny error                 \
    Avg. coarse error               \
    Best coarse error               \
    Best class contrast function    \           x *(x )-x =0         (x )*( )-x *x 376 * 0.7*x 376 -x =0

Table 5. The experimental results on the Central Nervous data subset

                                    t = 85                              t = 90          t = 95
    Success ratio                   40%                                 70%             100%
    Avg. contrast ratio
    Best contrast ratio
    Avg. tiny error
    Best tiny error
    Avg. coarse error
    Best coarse error
    Best class contrast function    (0.3+x x x 4605 )*0.3- x x x 10 = 0    x *x 10 = 0    x * x 10 = 0

Table 6. The experimental results on the Colon Cancer data subset

                                    t = 180                               t = 190                        t = 200
    Success ratio                   10%                                   25%                            100%
    Avg. contrast ratio
    Best contrast ratio
    Avg. tiny error
    Best tiny error
    Avg. coarse error
    Best coarse error
    Best class contrast function    x 1292 *(x )/(x +x 1041 )-x 896 =0    (x 142 +x 248 )*0.2- x 896 =0    x 248 /((0.3+x 1771 /5.0)* 0.1)-x 896

Furthermore, we can adopt the second or third strategy described in Subsection 3.3 to find other class contrast functions in these datasets; the basic process is similar to that of the first strategy.

5 Discussions and Conclusions

Finding the differences between classes is an important data mining task, and several concepts in contrast mining have been proposed. For example, emerging patterns are itemsets whose support changes significantly between two classes; however, methods for finding EPs cannot be applied to numeric datasets directly. In this paper, we propose a new data mining task called class contrast function mining. In short, class contrast functions are functions whose accuracies change significantly between two classes. Class contrast function mining is challenging work, related to regression, contrast mining, classification, etc. As a first step, we design a GEP-based method to discover class contrast functions and apply it to both synthetic and real-world datasets. The experimental results show that the proposed methods are effective: several class contrast functions are discovered from the real-world datasets, and we draw some conclusions by analyzing the experimental results.

Much work deserves deeper analysis in the future. For example, our experiments demonstrate that the contrast threshold is important; designing a method that adjusts the contrast threshold self-adaptively is desirable future work. Moreover, we will consider how to find all class contrast functions.

References

1. Dong, G., Li, J.: Efficient Mining of Emerging Patterns: Discovering Trends and Differences. In: Proc. of KDD 1999 (1999).
2. Dong, G., Li, J.: Mining Border Descriptions of Emerging Patterns from Dataset Pairs. Knowl. Inf. Syst. 8(2) (2005).
3. Zhang, X., Dong, G., Ramamohanarao, K.: Exploring Constraints to Efficiently Mine Emerging Patterns from Large High-dimensional Datasets. In: Proc. of KDD 2000 (2000).
4. Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast Algorithms for Mining Emerging Patterns. In: Proc. of PKDD 2002 (2002).
5.
Fan, H., Ramamohanarao, K.: An Efficient Single-Scan Algorithm for Mining Essential Jumping Emerging Patterns for Classification. In: Proc. of PAKDD 2002 (2002).
6. Bailey, J., Manoukian, T., Ramamohanarao, K.: A Fast Algorithm for Computing Hypergraph Transversals and its Application in Mining Emerging Patterns. In: Proc. of ICDM 2003 (2003).
7. Loekito, E., Bailey, J.: Fast Mining of High Dimensional Expressive Contrast Patterns Using Zero-suppressed Binary Decision Diagrams. In: Proc. of KDD 2006 (2006).
8. Li, J., Liu, G., Wong, L.: Mining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns. In: Proc. of KDD 2007 (2007).
9. Ferreira, C.: Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Complex Systems, 13(2) (2001).
10. Ferreira, C.: Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. Angra do Heroismo, Portugal (2002).
11. Ferreira, C.: Discovery of the Boolean Functions to the Best Density-Classification Rules Using Gene Expression Programming. In: Proc. of the 4th EuroGP (2002).

12. Ferreira, C.: Mutation, Transposition, and Recombination: An Analysis of the Evolutionary Dynamics. In: Proc. of the 4th Int'l Workshop on Frontiers in Evolutionary Algorithms, Research Triangle Park, North Carolina, USA (2002).
13. Lopes, H.S., Weinert, W.R.: EGIPSYS: An Enhanced Gene Expression Programming Approach for Symbolic Regression Problems. Int'l Journal of Applied Mathematics and Computer Science, 14(3) (2004).
14. Zhou, C., Xiao, W., Tirpak, T.M., Nelson, P.C.: Evolving Accurate and Compact Classification Rules with Gene Expression Programming. IEEE Transactions on Evolutionary Computation, 7(6) (2003).
15. Duan, L., Tang, C., Zhang, T., et al.: Distance Guided Classification with Gene Expression Programming. In: Proc. of ADMA 2006 (2006).
16. Zuo, J., Tang, C., Li, C., et al.: Time Series Prediction based on Gene Expression Programming. In: Proc. of WAIM 2004 (2004).
17. Bailey, J., Dong, G.: Contrast Data Mining: Methods and Applications. Tutorial at 2007 IEEE ICDM (2007).
18. Dong, G., Zhang, X., Wong, L., Li, J.: CAEP: Classification by Aggregating Emerging Patterns. In: Discovery Science (1999).
19. Li, J., Dong, G., Ramamohanarao, K.: Instance-Based Classification by Emerging Patterns. In: Proc. of PKDD 2000 (2000).
20. Li, J., Dong, G., Ramamohanarao, K., Wong, L.: DeEPs: A New Instance-Based Lazy Discovery and Classification System. Machine Learning 54(2) (2004).
21. Li, J., Dong, G., Ramamohanarao, K.: Making Use of the Most Expressive Jumping Emerging Patterns for Classification. In: Proc. of PAKDD 2000 (2000).
22. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features. In: Proc. of the 12th ICML (1995).
23. Fayyad, U., Irani, K.: Multi-interval Discretization of Continuous-valued Attributes for Classification Learning. In: Proc. of the 13th IJCAI (1993).


More information

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming R. Karthick 1, Dr. Malathi.A 2 Research Scholar, Department of Computer

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Optimization of Association Rule Mining through Genetic Algorithm

Optimization of Association Rule Mining through Genetic Algorithm Optimization of Association Rule Mining through Genetic Algorithm RUPALI HALDULAKAR School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal, Madhya Pradesh India Prof. JITENDRA

More information

Evolutionary Computation. Chao Lan

Evolutionary Computation. Chao Lan Evolutionary Computation Chao Lan Outline Introduction Genetic Algorithm Evolutionary Strategy Genetic Programming Introduction Evolutionary strategy can jointly optimize multiple variables. - e.g., max

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Evolving SQL Queries for Data Mining

Evolving SQL Queries for Data Mining Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper

More information

4/22/2014. Genetic Algorithms. Diwakar Yagyasen Department of Computer Science BBDNITM. Introduction

4/22/2014. Genetic Algorithms. Diwakar Yagyasen Department of Computer Science BBDNITM. Introduction 4/22/24 s Diwakar Yagyasen Department of Computer Science BBDNITM Visit dylycknow.weebly.com for detail 2 The basic purpose of a genetic algorithm () is to mimic Nature s evolutionary approach The algorithm

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Lecture 8: Genetic Algorithms

Lecture 8: Genetic Algorithms Lecture 8: Genetic Algorithms Cognitive Systems - Machine Learning Part II: Special Aspects of Concept Learning Genetic Algorithms, Genetic Programming, Models of Evolution last change December 1, 2010

More information

Robust Gene Expression Programming

Robust Gene Expression Programming Available online at www.sciencedirect.com Procedia Computer Science 6 (2011) 165 170 Complex Adaptive Systems, Volume 1 Cihan H. Dagli, Editor in Chief Conference Organized by Missouri University of Science

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

DETERMINING MAXIMUM/MINIMUM VALUES FOR TWO- DIMENTIONAL MATHMATICLE FUNCTIONS USING RANDOM CREOSSOVER TECHNIQUES

DETERMINING MAXIMUM/MINIMUM VALUES FOR TWO- DIMENTIONAL MATHMATICLE FUNCTIONS USING RANDOM CREOSSOVER TECHNIQUES DETERMINING MAXIMUM/MINIMUM VALUES FOR TWO- DIMENTIONAL MATHMATICLE FUNCTIONS USING RANDOM CREOSSOVER TECHNIQUES SHIHADEH ALQRAINY. Department of Software Engineering, Albalqa Applied University. E-mail:

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

A Survey of Classification on Emerging Pattern

A Survey of Classification on Emerging Pattern IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 7 December 2014 ISSN (online): 2349-6010 A Survey of Classification on Emerging Pattern Harsha Parmar PG Student

More information

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland Genetic Programming Charles Chilaka Department of Computational Science Memorial University of Newfoundland Class Project for Bio 4241 March 27, 2014 Charles Chilaka (MUN) Genetic algorithms and programming

More information

A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

More information

Heuristic Optimisation

Heuristic Optimisation Heuristic Optimisation Part 10: Genetic Algorithm Basics Sándor Zoltán Németh http://web.mat.bham.ac.uk/s.z.nemeth s.nemeth@bham.ac.uk University of Birmingham S Z Németh (s.nemeth@bham.ac.uk) Heuristic

More information

Solving Sudoku Puzzles with Node Based Coincidence Algorithm

Solving Sudoku Puzzles with Node Based Coincidence Algorithm Solving Sudoku Puzzles with Node Based Coincidence Algorithm Kiatsopon Waiyapara Department of Compute Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand kiatsopon.w@gmail.com

More information

Limsoon Wong (Joint work with Mengling Feng, Thanh-Son Ngo, Jinyan Li, Guimei Liu)

Limsoon Wong (Joint work with Mengling Feng, Thanh-Son Ngo, Jinyan Li, Guimei Liu) Theory, Practice, and an Application of Frequent Pattern Space Maintenance Limsoon Wong (Joint work with Mengling Feng, Thanh-Son Ngo, Jinyan Li, Guimei Liu) 2 What Data? Transactional data Items, transactions,

More information

A Classifier with the Function-based Decision Tree

A Classifier with the Function-based Decision Tree A Classifier with the Function-based Decision Tree Been-Chian Chien and Jung-Yi Lin Institute of Information Engineering I-Shou University, Kaohsiung 84008, Taiwan, R.O.C E-mail: cbc@isu.edu.tw, m893310m@isu.edu.tw

More information

Finding Contrast Patterns in Imbalanced Classification based on Sliding Window

Finding Contrast Patterns in Imbalanced Classification based on Sliding Window 4th International Conference on Mechanical Materials and Manufacturing Engineering (MMME 2016) Finding Contrast Patterns in Imbalanced Classification based on Sliding Window Xiangtao Chen & Zhouzhou Liu

More information

A Novel Algorithm for Associative Classification

A Novel Algorithm for Associative Classification A Novel Algorithm for Associative Classification Gourab Kundu 1, Sirajum Munir 1, Md. Faizul Bari 1, Md. Monirul Islam 1, and K. Murase 2 1 Department of Computer Science and Engineering Bangladesh University

More information

Performance Assessment of DMOEA-DD with CEC 2009 MOEA Competition Test Instances

Performance Assessment of DMOEA-DD with CEC 2009 MOEA Competition Test Instances Performance Assessment of DMOEA-DD with CEC 2009 MOEA Competition Test Instances Minzhong Liu, Xiufen Zou, Yu Chen, Zhijian Wu Abstract In this paper, the DMOEA-DD, which is an improvement of DMOEA[1,

More information

Outline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search

Outline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search Outline Genetic Algorithm Motivation Genetic algorithms An illustrative example Hypothesis space search Motivation Evolution is known to be a successful, robust method for adaptation within biological

More information

Classification and Optimization using RF and Genetic Algorithm

Classification and Optimization using RF and Genetic Algorithm International Journal of Management, IT & Engineering Vol. 8 Issue 4, April 2018, ISSN: 2249-0558 Impact Factor: 7.119 Journal Homepage: Double-Blind Peer Reviewed Refereed Open Access International Journal

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Scheme of Big-Data Supported Interactive Evolutionary Computation

Scheme of Big-Data Supported Interactive Evolutionary Computation 2017 2nd International Conference on Information Technology and Management Engineering (ITME 2017) ISBN: 978-1-60595-415-8 Scheme of Big-Data Supported Interactive Evolutionary Computation Guo-sheng HAO

More information

The Study of Genetic Algorithm-based Task Scheduling for Cloud Computing

The Study of Genetic Algorithm-based Task Scheduling for Cloud Computing The Study of Genetic Algorithm-based Task Scheduling for Cloud Computing Sung Ho Jang, Tae Young Kim, Jae Kwon Kim and Jong Sik Lee School of Information Engineering Inha University #253, YongHyun-Dong,

More information

A High Growth-Rate Emerging Pattern for Data Classification in Microarray Databases

A High Growth-Rate Emerging Pattern for Data Classification in Microarray Databases A High Growth-Rate Emerging Pattern for Data Classification in Microarray Databases Ye-In Chang, Zih-Siang Chen, and Tsung-Bin Yang Dept. of Computer Science and Engineering, National Sun Yat-Sen University

More information

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING Daniela Joiţa Titu Maiorescu University, Bucharest, Romania danielajoita@utmro Abstract Discretization of real-valued data is often used as a pre-processing

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS

A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS A GENETIC ALGORITHM FOR CLUSTERING ON VERY LARGE DATA SETS Jim Gasvoda and Qin Ding Department of Computer Science, Pennsylvania State University at Harrisburg, Middletown, PA 17057, USA {jmg289, qding}@psu.edu

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

Using a Probable Time Window for Efficient Pattern Mining in a Receptor Database

Using a Probable Time Window for Efficient Pattern Mining in a Receptor Database Using a Probable Time Window for Efficient Pattern Mining in a Receptor Database Edgar H. de Graaf and Walter A. Kosters Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands

More information

Introduction to Artificial Intelligence

Introduction to Artificial Intelligence Introduction to Artificial Intelligence COMP307 Evolutionary Computing 3: Genetic Programming for Regression and Classification Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline Statistical parameter regression Symbolic

More information

The Genetic Algorithm for finding the maxima of single-variable functions

The Genetic Algorithm for finding the maxima of single-variable functions Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 46-54 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.com The Genetic Algorithm for finding

More information

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?)

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) SKIP - May 2004 Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) S. G. Hohmann, Electronic Vision(s), Kirchhoff Institut für Physik, Universität Heidelberg Hardware Neuronale Netzwerke

More information

IJMIE Volume 2, Issue 9 ISSN:

IJMIE Volume 2, Issue 9 ISSN: Dimensionality Using Optimization Algorithm for High Dimensional Data Clustering Saranya.S* Dr.Punithavalli.M** Abstract: This paper present an efficient approach to a feature selection problem based on

More information

PAPER Application of an Artificial Fish Swarm Algorithm in Symbolic Regression

PAPER Application of an Artificial Fish Swarm Algorithm in Symbolic Regression 872 IEICE TRANS. INF. & SYST., VOL.E96 D, NO.4 APRIL 2013 PAPER Application of an Artificial Fish Swarm Algorithm in Symbolic Regression Qing LIU a), Tomohiro ODAKA, Jousuke KUROIWA, and Hisakazu OGURA,

More information

Keywords Fuzzy, Set Theory, KDD, Data Base, Transformed Database.

Keywords Fuzzy, Set Theory, KDD, Data Base, Transformed Database. Volume 6, Issue 5, May 016 ISSN: 77 18X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Fuzzy Logic in Online

More information

Genetic Algorithms. Kang Zheng Karl Schober

Genetic Algorithms. Kang Zheng Karl Schober Genetic Algorithms Kang Zheng Karl Schober Genetic algorithm What is Genetic algorithm? A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization

More information

Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you?

Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you? Gurjit Randhawa Suppose you have a problem You don t know how to solve it What can you do? Can you use a computer to somehow find a solution for you? This would be nice! Can it be done? A blind generate

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem An Evolutionary Algorithm for the Multi-objective Shortest Path Problem Fangguo He Huan Qi Qiong Fan Institute of Systems Engineering, Huazhong University of Science & Technology, Wuhan 430074, P. R. China

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

A Clustering Method with Efficient Number of Clusters Selected Automatically Based on Shortest Path

A Clustering Method with Efficient Number of Clusters Selected Automatically Based on Shortest Path A Clustering Method with Efficient Number of Clusters Selected Automatically Based on Shortest Path Makki Akasha, Ibrahim Musa Ishag, Dong Gyu Lee, Keun Ho Ryu Database/Bioinformatics Laboratory Chungbuk

More information

A THREAD BUILDING BLOCKS BASED PARALLEL GENETIC ALGORITHM

A THREAD BUILDING BLOCKS BASED PARALLEL GENETIC ALGORITHM www.arpapress.com/volumes/vol31issue1/ijrras_31_1_01.pdf A THREAD BUILDING BLOCKS BASED PARALLEL GENETIC ALGORITHM Erkan Bostanci *, Yilmaz Ar & Sevgi Yigit-Sert SAAT Laboratory, Computer Engineering Department,

More information

On Demand Phenotype Ranking through Subspace Clustering

On Demand Phenotype Ranking through Subspace Clustering On Demand Phenotype Ranking through Subspace Clustering Xiang Zhang, Wei Wang Department of Computer Science University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA {xiang, weiwang}@cs.unc.edu

More information

Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization

Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization J.Venkatesh 1, B.Chiranjeevulu 2 1 PG Student, Dept. of ECE, Viswanadha Institute of Technology And Management,

More information

Genetic Fuzzy Discretization with Adaptive Intervals for Classification Problems

Genetic Fuzzy Discretization with Adaptive Intervals for Classification Problems Genetic Fuzzy Discretization with Adaptive Intervals for Classification Problems Yoon-Seok Choi School of Computer Science & Engineering, Seoul National University Shillim-dong, Gwanak-gu, Seoul, 151-742,

More information

Unsupervised Feature Selection Using Multi-Objective Genetic Algorithms for Handwritten Word Recognition

Unsupervised Feature Selection Using Multi-Objective Genetic Algorithms for Handwritten Word Recognition Unsupervised Feature Selection Using Multi-Objective Genetic Algorithms for Handwritten Word Recognition M. Morita,2, R. Sabourin 3, F. Bortolozzi 3 and C. Y. Suen 2 École de Technologie Supérieure, Montreal,

More information

Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Basic Data Mining Technique

Basic Data Mining Technique Basic Data Mining Technique What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm

More information

Approach Using Genetic Algorithm for Intrusion Detection System

Approach Using Genetic Algorithm for Intrusion Detection System Approach Using Genetic Algorithm for Intrusion Detection System 544 Abhijeet Karve Government College of Engineering, Aurangabad, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra-

More information

Mutation in Compressed Encoding in Estimation of Distribution Algorithm

Mutation in Compressed Encoding in Estimation of Distribution Algorithm Mutation in Compressed Encoding in Estimation of Distribution Algorithm Orawan Watchanupaporn, Worasait Suwannik Department of Computer Science asetsart University Bangkok, Thailand orawan.liu@gmail.com,

More information

Santa Fe Trail Problem Solution Using Grammatical Evolution

Santa Fe Trail Problem Solution Using Grammatical Evolution 2012 International Conference on Industrial and Intelligent Information (ICIII 2012) IPCSIT vol.31 (2012) (2012) IACSIT Press, Singapore Santa Fe Trail Problem Solution Using Grammatical Evolution Hideyuki

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

Multi Expression Programming. Mihai Oltean

Multi Expression Programming. Mihai Oltean Multi Expression Programming Mihai Oltean Department of Computer Science, Faculty of Mathematics and Computer Science, Babeş-Bolyai University, Kogălniceanu 1, Cluj-Napoca, 3400, Romania. email: mihai.oltean@gmail.com

More information

Noise-based Feature Perturbation as a Selection Method for Microarray Data

Noise-based Feature Perturbation as a Selection Method for Microarray Data Noise-based Feature Perturbation as a Selection Method for Microarray Data Li Chen 1, Dmitry B. Goldgof 1, Lawrence O. Hall 1, and Steven A. Eschrich 2 1 Department of Computer Science and Engineering

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS J.I. Serrano M.D. Del Castillo Instituto de Automática Industrial CSIC. Ctra. Campo Real km.0 200. La Poveda. Arganda del Rey. 28500

More information

Attribute Selection with a Multiobjective Genetic Algorithm

Attribute Selection with a Multiobjective Genetic Algorithm Attribute Selection with a Multiobjective Genetic Algorithm Gisele L. Pappa, Alex A. Freitas, Celso A.A. Kaestner Pontifícia Universidade Catolica do Parana (PUCPR), Postgraduated Program in Applied Computer

More information

JHPCSN: Volume 4, Number 1, 2012, pp. 1-7

JHPCSN: Volume 4, Number 1, 2012, pp. 1-7 JHPCSN: Volume 4, Number 1, 2012, pp. 1-7 QUERY OPTIMIZATION BY GENETIC ALGORITHM P. K. Butey 1, Shweta Meshram 2 & R. L. Sonolikar 3 1 Kamala Nehru Mahavidhyalay, Nagpur. 2 Prof. Priyadarshini Institute

More information

Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Search & Optimization Search and Optimization method deals with

More information

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India.

Argha Roy* Dept. of CSE Netaji Subhash Engg. College West Bengal, India. Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Training Artificial

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM 1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu

More information

CHAPTER 5 ENERGY MANAGEMENT USING FUZZY GENETIC APPROACH IN WSN

CHAPTER 5 ENERGY MANAGEMENT USING FUZZY GENETIC APPROACH IN WSN 97 CHAPTER 5 ENERGY MANAGEMENT USING FUZZY GENETIC APPROACH IN WSN 5.1 INTRODUCTION Fuzzy systems have been applied to the area of routing in ad hoc networks, aiming to obtain more adaptive and flexible

More information

Genetic Algorithms for Classification and Feature Extraction

Genetic Algorithms for Classification and Feature Extraction Genetic Algorithms for Classification and Feature Extraction Min Pei, Erik D. Goodman, William F. Punch III and Ying Ding, (1995), Genetic Algorithms For Classification and Feature Extraction, Michigan

More information

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters

Two Algorithms of Image Segmentation and Measurement Method of Particle s Parameters Appl. Math. Inf. Sci. 6 No. 1S pp. 105S-109S (2012) Applied Mathematics & Information Sciences An International Journal @ 2012 NSP Natural Sciences Publishing Cor. Two Algorithms of Image Segmentation

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

A Conflict-Based Confidence Measure for Associative Classification

A Conflict-Based Confidence Measure for Associative Classification A Conflict-Based Confidence Measure for Associative Classification Peerapon Vateekul and Mei-Ling Shyu Department of Electrical and Computer Engineering University of Miami Coral Gables, FL 33124, USA

More information

AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS
