Mining Class Contrast Functions by Gene Expression Programming 1
Lei Duan, Changjie Tang, Liang Tang, Tianqing Zhang and Jie Zuo

School of Computer Science, Sichuan University, Chengdu, China
{leiduan, cjtang}@scu.edu.cn

Abstract. Finding functions whose accuracies change significantly between two classes is an interesting task. In this paper, such functions are defined as class contrast functions. As Gene Expression Programming (GEP) can discover essential relations from data and express them mathematically, it is natural to apply GEP to mining class contrast functions from data. The main contributions of this paper include: (1) proposing a new data mining task, class contrast function mining; (2) designing a GEP based method to find class contrast functions; (3) presenting several strategies for finding multiple class contrast functions in data; (4) giving an extensive performance study on both synthetic and real world datasets. The experimental results show that the proposed methods are effective: several class contrast functions are discovered from the real world datasets. Some potential work on class contrast function mining is discussed based on the experimental results.

Keywords: Gene Expression Programming, Contrast Mining, Data Mining

1 Introduction

Discovering mathematical models that precisely describe the underlying relationships in observed data, and that can be easily understood, helps scientists better understand the unknown. Given a set containing samples of several classes, it is interesting to find models or relationships that hold in one class but not in the others. For example, the height of an ordinary person equals his/her arm span. However, for most basketball players and swimmers, the arm span is greater than the height. This kind of relationship differs from regression models, which do not take class information into consideration.
In this paper, we propose a new data mining task called class contrast function mining. A class contrast function has the following characteristics: each variable in the function is an attribute of the sample set, and the function has high accuracy in one class but not in the other classes.

Footnote 1: This work was supported by the National Natural Science Foundation of China under grant No. , and the 11th Five-Year Key Programs for Sci. & Tech. Development of China under grant No. 2006BAI05A01.
Definition 1 (Contrast Ratio). Given a dataset D that contains samples of two classes (c1 and c2), let f be a function, and let Err(f, c1) and Err(f, c2) be the average errors of f on c1 and c2, respectively. If Err(f, c1) <= Err(f, c2), then the contrast ratio of f is

    CR(f) = Err(f, c1) / Err(f, c2)

and Err(f, c1) is called the tiny error, while Err(f, c2) is called the coarse error.

Definition 2 (Class Contrast Function). Let CR(f) be the contrast ratio of f on dataset D. Given a user defined parameter ε, 0 < ε < 1.0, if CR(f) < ε, then f is a class contrast function on D.

Example 1. Given a sample set with two classes (c1 and c2), with the samples listed in Table 1, and a function f(a1, a2, a3, a4): a1 + a2 + a3 - a4 = 0, the average absolute errors of f in c1 and c2 are and 4.0, respectively. The contrast ratio CR(f) is the ratio of these two errors, so f is a class contrast function of c1.

Table 1. A sample set with two classes (c1 and c2)

         a1    a2    a3    a4    a5    Class
    s1                                 c1
    s2                                 c1
    s3                                 c1
    s4                                 c2
    s5                                 c2
    s6                                 c2

From Definitions 1 and 2, we can see that a good class contrast function should have a small contrast ratio; in other words, the tiny error should be small and the coarse error should be large. However, a small contrast ratio does not necessarily mean a small tiny error. Therefore, in our work we define a contrast threshold t: the tiny error of any discovered class contrast function must be less than t. In other words, if the tiny error of a discovered function is greater than the contrast threshold, the function may be meaningless (big error in each class).

Intuitively, the concept of a class contrast function is similar to that of emerging patterns (EPs), proposed by Dong and Li [1]. Due to the wide applications of EPs, many high performance algorithms for discovering EPs have been proposed [2-8]. However, a class contrast function is not the same as an EP, since EPs are sets of items whose support changes significantly between the two classes.
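Definitions 1 and 2 can be checked mechanically. The following Python sketch uses made-up samples (not those of Table 1) in which class c1 satisfies a1 + a2 + a3 - a4 = 0 exactly, so the tiny error is zero:

```python
# Illustrative sketch of Definitions 1 and 2; data and names are made up.

def avg_abs_error(f, samples):
    """Average absolute error of f over a list of attribute tuples."""
    return sum(abs(f(*s)) for s in samples) / len(samples)

def contrast_ratio(f, class1, class2):
    """CR(f) = tiny error / coarse error (tiny is the smaller of the two)."""
    e1, e2 = avg_abs_error(f, class1), avg_abs_error(f, class2)
    return min(e1, e2) / max(e1, e2)

# f written so that f(...) = 0 means the sample fits the relation exactly
f = lambda a1, a2, a3, a4: a1 + a2 + a3 - a4

c1 = [(1, 2, 3, 6), (2, 2, 2, 6), (0, 1, 4, 5)]   # satisfy the relation
c2 = [(1, 2, 3, 10), (2, 2, 2, 2), (0, 1, 4, 9)]  # do not

print(contrast_ratio(f, c1, c2))  # 0.0: c1 fits exactly, so CR(f) < ε
```

Any f with CR(f) below the user defined ε (and tiny error below the contrast threshold t) would qualify as a class contrast function of the fitting class.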
Algorithms for emerging pattern discovery cannot be applied to datasets with numeric attributes directly, unless some discretization method is adopted; however, information may be lost when converting numeric values to items. To the best of our knowledge, no previous work has been done on finding function relationships whose accuracies differ between classes. Generally, finding class contrast functions poses the following challenges. How do we determine the form of the class contrast functions to be discovered? In real world applications, there is no a priori knowledge of the function form,
variables, or parameters of the function. We are not even sure whether a class contrast function exists in the data at all. How do we find class contrast functions in a high dimensional dataset? How do we find all the class contrast functions that exist in the given dataset?

Traditional regression methods can be used to discover functions from a dataset, but they require the user to specify a hypothesis in advance. Gene Expression Programming (GEP) [9, 10], the newest development in the family of Genetic Algorithms (GA) and Genetic Programming (GP), has strong search power due to its special individual structure. GEP can evolve functions with little a priori knowledge: it is unnecessary to define the function form in advance, and GEP selects function variables automatically from all given attributes. GEP has been widely used in data mining [11-16]. Section 2 introduces the preliminaries of GEP.

The main contributions of this work include: (1) proposing a new data mining task, class contrast function mining; (2) designing a GEP based method to find class contrast functions; (3) presenting several strategies for finding multiple class contrast functions in datasets; (4) giving an extensive performance study of the proposed methods and discussing some potential work on class contrast functions.

The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 presents the main ideas used by our algorithms and the algorithms themselves. Section 4 reports an experimental evaluation of the algorithms. Section 5 discusses future work and concludes the paper.

2 Related Work

2.1 Emerging Patterns

Emerging Patterns (EPs) are contrast itemsets whose support changes significantly between two classes of data [1]. In particular, a pattern that occurs only in samples of one class is called a jumping EP [1].
Since the first EP mining algorithm was proposed in [1], several methods have been designed, including a constraint-based approach [3], tree-based approaches [4, 5], a projection based algorithm [6], a ZBDD based method [7], and an equivalence classes based method [8]. The complexity of finding emerging patterns is MAX SNP-hard [17]. EPs can be found in many real world datasets. Since EPs have high discriminating power [17], many EP based classification methods have been proposed, such as CAEP [18], DeEPs [19, 20], and a jumping EP based method [21]. With EPs, the discriminating power of low support patterns, together with that of high support ones and multi-feature conditions, is taken into consideration when building a classifier. Research results show that EP based classification methods often outperform state-of-the-art classifiers, including C4.5 and SVM [17].
2.2 Basic Concepts and Terminology of GEP

The basic steps of using GEP to seek an optimal solution are the same as those of GA and GP [9]. The main players in GEP are the chromosome and the expression tree. The chromosome is a linear, symbolic string of fixed length that carries the genetic information; the expression tree is the expression of that genetic information. A chromosome consists of one or more genes, and each gene is divided into a head and a tail. The head contains symbols that represent functions or terminals, whereas the tail contains only terminals. In GEP, the length of a gene and the number of genes in a chromosome are fixed; however, each gene can code for an expression tree of a different size and shape. The valid part of a GEP gene is obtained by parsing the expression tree from left to right and from top to bottom. Since the structural organization of GEP genes is flexible, any modification made to the chromosome still generates a valid expression tree, so all programs evolved by GEP are syntactically correct. Based on the natural selection principle, GEP iteratively evolves a population of chromosomes encoding candidate solutions, through genetic operators such as selection, crossover, and mutation, to find an optimal solution. Details of the GEP implementation can be found in [9, 10]. Beyond C. Ferreira's own research [9-12], GEP has been widely used in data mining fields such as symbolic regression [13], classification [14, 15], and time series analysis [16].

3 Class Contrast Function Mining

3.1 Fitness Function Design in GEP

The GEP algorithm begins with the random generation of a set of chromosomes, called the initial population. The fitness of each individual is then evaluated according to the fitness function, and individuals are selected according to fitness to reproduce with modification, leaving progeny with new characteristics.
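To make the head/tail gene encoding concrete, here is an illustrative Python sketch (not Ferreira's reference implementation) that decodes a single gene in GEP's linear notation, often called Karva notation, breadth-first into an expression tree and then evaluates it. The gene '+*-abcd' below is an assumed example with a head of length 3 and a tail of length 4:

```python
# Hypothetical sketch of GEP gene decoding; function set and gene are assumed.

ARITY = {'+': 2, '-': 2, '*': 2, '/': 2}   # terminals have arity 0

def decode(gene):
    """Breadth-first decoding: only the valid prefix of the gene (its open
    reading frame) becomes part of the expression tree."""
    root = [gene[0], []]
    frontier, i = [root], 1
    while frontier:
        nxt = []
        for node in frontier:
            for _ in range(ARITY.get(node[0], 0)):
                child = [gene[i], []]
                i += 1
                node[1].append(child)
                nxt.append(child)
        frontier = nxt
    return root

def evaluate(tree, env):
    """Recursively evaluate the decoded tree against terminal values."""
    sym, kids = tree
    if sym in ARITY:
        a, b = evaluate(kids[0], env), evaluate(kids[1], env)
        return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[sym]
    return env[sym]

# '+*-abcd' decodes to (a*b) + (c-d)
tree = decode('+*-abcd')
print(evaluate(tree, {'a': 2, 'b': 3, 'c': 7, 'd': 4}))  # 9
```

Because the tail contains only terminals, any mutation of this string still decodes to a syntactically valid tree, which is the property the text describes.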
The individuals of the new generation are subjected to the same evolution process: expression of the genomes, confrontation with the selection environment, and reproduction with modification. This procedure is repeated until a satisfactory solution is found or a predetermined number of generations is reached; evolution then stops and the best-so-far solution is returned [9, 10].

The fitness function in GEP determines the evolution direction of candidate solutions. As stated before, we prefer class contrast functions that have not only a small contrast ratio but also a small tiny error. Given a GEP individual g, the fitness of g is calculated as follows:

    fitness(g) = (1/serr) * t * 0.5            if serr > t
    fitness(g) = (1 - serr/cerr) * 0.5 + 0.5   if serr <= t        (1)

where serr is the tiny error, cerr is the coarse error, and t is the contrast threshold. There are two phases in evaluating the fitness of a GEP individual g:
- The tiny error is greater than the predefined value. In this phase, the fitness is evaluated by the tiny error alone; the fitness range is (0, 0.5].
- The tiny error is not greater than the predefined value. In this phase, the fitness is evaluated by both the tiny error and the coarse error; the fitness range is [0.5, 1.0].

Based on Equation (1), a GEP individual whose tiny error is small and whose contrast ratio is small gets a high fitness value. Since individuals with higher fitness values get larger opportunities to survive and evolve, using Equation (1) in GEP generates the desired results. In our work, we are interested in finding functions that have high accuracy in one class but not in the other classes; this is the main difference between our work and other function discovery work. Algorithm 1 gives the pseudo code for evaluating GEP individuals when evolving class contrast functions.

Algorithm 1: GEP_Evaluate(Pop, D1, D2, t)
Input: (1) a set of evolving GEP individuals: Pop; (2) a set of samples that belong to class c: D1; (3) a set of samples that do not belong to class c: D2; (4) a user defined threshold: t.
Output: the individual with the highest fitness: bestindividual.
begin
1.  bestfit ← 0
2.  bestindividual ← NULL
3.  for each individual ind in Pop
4.      Err1 ← Min(getAvgErr(ind, D1), getAvgErr(ind, D2))
5.      Err2 ← Max(getAvgErr(ind, D1), getAvgErr(ind, D2))
6.      if Err1 > t
7.          Fit ← 1/Err1 * t * 0.5
8.      else
9.          Fit ← (1 - Err1/Err2) * 0.5 + 0.5
10.     if bestfit < Fit
11.         bestfit ← Fit
12.         bestindividual ← ind
13. return bestindividual
end.

Algorithm 1 states the evaluation process in GEP. In Steps 4 and 5, the function getAvgErr() returns the average error of the current individual ind on datasets D1 and D2, respectively. In Steps 6 to 9, the fitness is evaluated according to Equation (1). The time complexity of Algorithm 1 is O(m*n), where m is the number of GEP individuals in the population and n is the number of samples in D1 and D2.
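A minimal Python sketch of the two-phase fitness of Equation (1) and the loop of Algorithm 1 follows; here getAvgErr is replaced by precomputed lookup tables, and all names are illustrative:

```python
# Sketch of Equation (1) and Algorithm 1, assuming precomputed average errors.

def contrast_fitness(serr, cerr, t):
    """Phase one keys on the tiny error alone (range (0, 0.5]); phase two
    rewards a small contrast ratio serr/cerr (range [0.5, 1.0])."""
    if serr > t:
        return (1.0 / serr) * t * 0.5
    return (1.0 - serr / cerr) * 0.5 + 0.5

def gep_evaluate(pop, err_d1, err_d2, t):
    """Algorithm 1: return the individual with the highest fitness.
    err_d1/err_d2 map each individual to its average error on D1/D2."""
    best, best_fit = None, 0.0
    for ind in pop:
        tiny = min(err_d1[ind], err_d2[ind])
        coarse = max(err_d1[ind], err_d2[ind])
        fit = contrast_fitness(tiny, coarse, t)
        if fit > best_fit:
            best_fit, best = fit, ind
    return best

# 'f' fits one class well (tiny error 0.5 <= t), 'g' does not
print(gep_evaluate(['f', 'g'],
                   {'f': 0.5, 'g': 2.0},
                   {'f': 10.0, 'g': 10.0}, t=1.0))  # f
```

Note how the two branches meet at fitness 0.5 when serr equals t, which gives a continuous pressure first toward a small tiny error and then toward a small contrast ratio.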
There are two stop conditions for GEP when evolving class contrast functions: a function whose contrast ratio is smaller than the predefined value is found, or the number of generations reaches the predetermined value. If no class contrast function is found by GEP, we have two choices: increase the predefined generation number, or restart GEP once more. If no satisfactory function is found after several independent GEP runs, we stop searching and conclude that there may be no class contrast function in the given dataset.
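The stop-and-restart policy above can be wrapped in a small driver loop. In this sketch, run_gep is a stand-in for one full GEP evolution that returns a satisfactory function or None; the names are illustrative:

```python
# Hypothetical driver for the restart policy described in the text.

def mine_with_restarts(run_gep, max_runs=5):
    """Rerun GEP up to max_runs times; if every independent run fails,
    conclude that the dataset probably contains no class contrast function."""
    for _ in range(max_runs):
        candidate = run_gep()      # one full GEP evolution
        if candidate is not None:  # stop condition met within this run
            return candidate
    return None                    # give up after max_runs tries

# a stub GEP that "succeeds" on its third run
attempts = iter([None, None, "f1", "f2"])
print(mine_with_restarts(lambda: next(attempts)))  # f1
```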
3.2 Finding Class Contrast Functions in a High Dimensional Dataset

Subsection 3.1 describes the basic steps of applying GEP to finding class contrast functions. In a naive way, we take all attribute values of the samples as GEP terminals. However, this naive way may be inefficient when the dataset is high dimensional: in GEP, when the number of terminals is large, the evolution efficiency is low. As a result, it is undesirable to apply GEP directly to evolving class contrast functions in a high dimensional dataset. As a more challenging problem in this work, we investigate how to select a subset of the original attributes of the dataset as GEP terminals. Naturally, the selected attributes should contribute to generating good class contrast functions; intuitively, attributes whose values differ greatly between classes are preferable. Since class distribution information is available for each attribute in class contrast function mining, we adopt an information gain based method to select GEP terminals from the dataset, and in addition take the correlation among the selected attributes into consideration. The details of calculating information gain are presented in [22, 23]. For each attribute in the dataset, we calculate its information gain, and we sort all attributes in descending order of information gain. Suppose the size of the GEP terminal set is k. The simple way is to fetch the attributes with the top k information gains; however, correlations may exist among attributes, so we select attributes that have high information gains and low correlations among them. Let v1 and v2 be two attribute vectors. The correlation between them, denoted Cor(v1, v2), is calculated as follows:

    Cor(v1, v2) = v1 · v2 / (|v1| |v2|)        (2)

Suppose V is the set of selected attribute vectors. For each unselected attribute vector v, we calculate its correlation with each selected attribute and take the maximum as the correlation between v and V.
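Equation (2) is cosine similarity, and the correlation between v and the selected set V is the maximum over the members of V. A small sketch, with vectors as plain tuples and all names illustrative:

```python
# Illustrative sketch of Equation (2) and the max-over-V correlation.
import math

def cosine(v1, v2):
    """Cor(v1, v2) = v1 . v2 / (|v1| |v2|), as in Equation (2)."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

def correlation_with_set(v, selected):
    """Correlation between v and the set V: the maximum Cor over V."""
    return max(cosine(v, u) for u in selected)

# v is identical in direction to one selected vector, so the correlation is 1
print(correlation_with_set((1.0, 0.0), [(0.0, 1.0), (1.0, 0.0)]))  # 1.0
```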
The correlation score between v and V, denoted CorScore(v, V), is defined as follows:

    CorScore(v, V) = 1 - Max{Cor(v, v') | v' ∈ V}        (3)

In Equation (3), we take the maximal value because we want to select the attribute that has the minimal correlation with the already selected attributes. Let Gain(v) be the information gain of attribute v. We define the contrast score, denoted ConScore(v), as follows:

    ConScore(v) = Gain(v) * CorScore(v, V)        (4)

Given a high dimensional dataset, suppose we want to select k attributes as GEP terminals. First, we select the attribute with the highest information gain; then we select the other k - 1 attributes one by one based on Equation (4).

3.3 Strategies for Searching Multiple Class Contrast Functions

The next problem is how to discover the multiple class contrast functions that may exist in a dataset. This is challenging, since we have no idea about the form of the next class contrast function; we are not even sure whether another one exists. Moreover, if we apply GEP to the dataset again, the same class
contrast function that has been discovered may be found again, and we should avoid this situation. We design three strategies for searching for multiple class contrast functions, for three different cases:

- Keep the dataset unchanged, and record each discovered class contrast function. If the current GEP individual is a class contrast function that has already been discovered, its fitness is assigned the minimum value.
- For each discovered class contrast function, the attributes in it are removed from the original dataset before finding the next class contrast function.
- For each discovered class contrast function, the attribute in the function with the highest information gain is removed from the original dataset before finding the next class contrast function.

The first strategy takes all attributes as GEP terminals each time, so it suits the case where the number of dimensions of the dataset is small. The second strategy takes different attributes as GEP terminals each time, so it suits the case where the number of dimensions is large. The third strategy suits the case where the number of dimensions is neither large nor small.

4 Performance Evaluation

To evaluate the performance of our method for mining class contrast functions from data, we implement all proposed algorithms and the GEP algorithm in Java. The experiments are performed on an Intel Pentium Dual 1.80 GHz (2 cores) PC with 2 GB of memory running the Windows XP operating system. Table 2 lists the GEP parameters used in our experiments. Moreover, to improve the function discovery ability of GEP, the constant set is {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 2, 4, 5, 6, 7, 8, 9} in our experiments. Please refer to [10] for the detailed usage of these parameters. Table 2.
Default parameters for the GEP algorithm

    parameter                   value           parameter                      value
    Population size             500             One-point recombination rate   0.4
    Number of generations       200             Two-point recombination rate   0.2
    Linking function            addition        Gene recombination rate        0.1
    Function set                {+, -, *, /}    IS transposition rate          0.1
    Number of genes             3               IS elements length             1, 2, 3
    Gene head size              7               RIS transposition rate         0.1
    Selection operator          tournament      RIS elements length            1, 2, 3
    Mutation rate               0.3             Gene transposition rate

4.1 Experiments on Synthetic Datasets

First, we generate synthetic datasets that contain a linear relation or a non-linear relation to validate the effectiveness of the algorithms for mining class contrast functions. In each synthetic dataset, there are 100 samples belonging to class c1 and 100
samples belonging to class c2. All values in each synthetic dataset are generated randomly under a uniform distribution, and the value range is [-200, 200]. Let D1 = {X1, X2, X3, X4, X5} be a synthetic dataset, and suppose all samples of c1 in D1 satisfy the linear relation f1: 2X1 + X2 - X3 = 0; that is, f1 is a class contrast function of c1 in D1. Let D2 = {X1, X2, X3, X4, X5} be a synthetic dataset, and suppose all samples of c1 in D2 satisfy the non-linear relation f2: X1 * X2 + X3 - X5 = 0; that is, f2 is a class contrast function of c1 in D2. We apply our method to discover f1 and f2 from D1 and D2, respectively. Three different contrast thresholds (5, 50 and 100) are set in the experiment. For each contrast threshold, we run the method 20 times independently and record the GEP generation number at which the predefined function is discovered. The success ratio is the percentage of runs in which the predefined function is found out of the total number of runs. Figures 1 and 2 illustrate the success ratios and average generation numbers of finding f1 in D1 and f2 in D2 under the different contrast thresholds.

Fig. 1. The success ratios and average generation numbers of finding f1 in D1

Fig. 2. The success ratios and average generation numbers of finding f2 in D2

From Figures 1 and 2, we can see that when the contrast threshold is 50, our method finds the predefined class contrast functions (f1 and f2) efficiently: the success ratios are 100% in both D1 and D2, and the average generation numbers are 4.3 and 46.2, respectively. Moreover, when the contrast threshold is 5, the average generation number is larger than when the contrast threshold equals 50. The reason is that if the contrast threshold is small, the first phase of fitness evaluation (serr > t) is tough for GEP individuals: most potentially good individuals may be eliminated, so the evolution process is slowed.
When the contrast threshold is 100, the success ratio is lower than
when the contrast threshold equals 50. The reason is that if the contrast threshold is large, the first phase of fitness evaluation (serr > t) cannot filter out some bad GEP individuals. Individuals that have big errors in both classes but a small contrast ratio may be selected as the best result, so the success ratio is decreased.

4.2 Experiments on Real World Datasets

Next, we apply our method to several microarray datasets downloaded from the Kent Ridge Bio-medical Dataset repository. The characteristics of microarray datasets include high dimensionality, numeric attributes, etc. Table 3 lists the characteristics of the microarray datasets tested in our experiments.

Table 3. Data characteristics of 3 microarray datasets

    Dataset           #samples in class 1   #samples in class 2   #attributes
    Breast Cancer     44 (non-relapse)      34 (relapse)
    Central Nervous   21 (survivors)        39 (failures)         7129
    Colon Cancer      22 (normal)           40 (cancer)           2000

As stated previously, the evolution efficiency of GEP is low when the terminal set is large. For each microarray dataset, we calculate the contrast score of each attribute using the method introduced in Subsection 3.2, fetch the 10 attributes with the highest contrast scores, and mine class contrast functions from them. Specifically, for the Breast Cancer dataset, the index set of the selected attributes is {376, 7813, 8781, 13620, 6326, 21943, 18424, 726, 7508, 19967} (see footnote 2). For the Central Nervous dataset, it is {7015, 5527, 2473, 2141, 843, 3419, 3774, 4605, 2088, 10}. And for the Colon Cancer dataset, it is {1670, 248, 1041, 1292, 142, 1410, 1327, 1324, 1771, 896}.

Subsection 3.3 introduces three strategies for finding multiple class contrast functions. In this experiment, we adopt the first strategy. For each dataset, we apply the proposed method 20 times independently to discover class contrast functions.
If a class contrast function whose fitness is greater than 0.5 is found, the run is marked as a success; in this case, the tiny error of the discovered function is not greater than the contrast threshold. The success ratio is the percentage of successful runs out of the total number of runs. The number of generations is set to 500. Three different contrast thresholds (t) are used in the experiments; we choose these values so that different success ratios are obtained, which helps us analyze the experimental results. Table 4 to Table 6 list the experimental results on these three data subsets. From Table 4, we can see that on the Breast Cancer data subset no class contrast function with a tiny error below the smallest contrast threshold is found. The reasons may include: first, no function satisfies this constraint; second, a larger generation number should be set for GEP; third, more functions should be added to GEP's function set. The success ratio can be increased by setting a larger contrast threshold, but the tiny error of the best individual may also increase.

Footnote 2: The index of the first attribute in the original dataset is 0. Here, we list the indexes of the selected attributes in contrast score descending order.

It is worth noting that the difference
between the average tiny error and the best tiny error is small. Since the fitness of each individual depends on its contrast ratio, some individuals are chosen because of their large coarse error. So, determining a suitable contrast threshold is necessary and important. Similar conclusions can be drawn from Table 5 and Table 6. In Table 6, when the contrast threshold is 180 or 190, we obtain the same class contrast function several times, so the average values equal those of the best one.

Table 4. The experimental results on Breast Cancer data subset

                                    t = 0.08    t =         t = 0.09
    Success ratio                   0%          80%         100%
    Avg. contrast ratio             \
    Best contrast ratio             \
    Avg. tiny error                 \
    Best tiny error                 \
    Avg. coarse error               \
    Best coarse error               \
    Best class contrast function    \           x*(x )-x = 0    (x )*( )-x*x376*0.7*x376-x = 0

Table 5. The experimental results on Central Nervous data subset

                                    t = 85      t = 90      t = 95
    Success ratio                   40%         70%         100%
    Avg. contrast ratio
    Best contrast ratio
    Avg. tiny error
    Best tiny error
    Avg. coarse error
    Best coarse error
    Best class contrast function    (0.3+x x4605)*0.3-x x10 = 0    x*x10 = 0    x*x10 = 0

Table 6. The experimental results on Colon Cancer data subset

                                    t = 180     t = 190     t = 200
    Success ratio                   10%         25%         100%
    Avg. contrast ratio
    Best contrast ratio
    Avg. tiny error
    Best tiny error
    Avg. coarse error
    Best coarse error
    Best class contrast function    x1292*(x )/(x +x1041)-x896 = 0    (x142+x248)*0.2-x896 = 0    x248/((0.3+x1771/5.0)*0.1)-x896 = 0

Furthermore, we can adopt the second or the third strategy described in Subsection 3.3 to find other class contrast functions in these datasets. The basic process is similar to that of the first strategy.
5 Discussions and Conclusions

Finding the differences between classes is an important data mining task, and several concepts for contrast mining have been proposed. For example, Emerging Patterns are itemsets whose support changes significantly between two classes; however, methods for finding EPs cannot be applied to numeric datasets directly. In this paper, we propose a new data mining task called class contrast function mining. In short, class contrast functions are functions whose accuracies change significantly between two classes. Class contrast function mining is a challenging task; it is related to regression, contrast mining, classification, etc. As a first step, we design a GEP based method to discover class contrast functions and apply it to both synthetic and real world datasets. The experimental results show that the proposed methods are effective, and several class contrast functions are discovered from the real world datasets. We also draw some conclusions by analyzing the experimental results. Much work deserves deeper analysis in the future. For example, our experiments demonstrate that the contrast threshold is important; designing a method that adjusts the contrast threshold self-adaptively is desirable future work. Moreover, we will consider how to find all class contrast functions.

References

1. Dong, G., Li, J.: Efficient Mining of Emerging Patterns: Discovering Trends and Differences. In: Proc. of KDD 1999 (1999).
2. Dong, G., Li, J.: Mining Border Descriptions of Emerging Patterns from Dataset Pairs. Knowl. Inf. Syst. 8(2) (2005).
3. Zhang, X., Dong, G., Ramamohanarao, K.: Exploring Constraints to Efficiently Mine Emerging Patterns from Large High-dimensional Datasets. In: Proc. of KDD 2000 (2000).
4. Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast Algorithms for Mining Emerging Patterns. In: Proc. of PKDD 2002 (2002).
5.
Fan, H., Ramamohanarao, K.: An Efficient Single-Scan Algorithm for Mining Essential Jumping Emerging Patterns for Classification. In: Proc. of PAKDD 2002 (2002).
6. Bailey, J., Manoukian, T., Ramamohanarao, K.: A Fast Algorithm for Computing Hypergraph Transversals and its Application in Mining Emerging Patterns. In: Proc. of ICDM 2003 (2003).
7. Loekito, E., Bailey, J.: Fast Mining of High Dimensional Expressive Contrast Patterns Using Zero-suppressed Binary Decision Diagrams. In: Proc. of KDD 2006 (2006).
8. Li, J., Liu, G., Wong, L.: Mining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns. In: Proc. of KDD 2007 (2007).
9. Ferreira, C.: Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Complex Systems 13(2) (2001).
10. Ferreira, C.: Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. Angra do Heroismo, Portugal (2002).
11. Ferreira, C.: Discovery of the Boolean Functions to the Best Density-Classification Rules Using Gene Expression Programming. In: Proc. of the 4th EuroGP (2002).
12. Ferreira, C.: Mutation, Transposition, and Recombination: An Analysis of the Evolutionary Dynamics. In: Proc. of the 4th Int'l Workshop on Frontiers in Evolutionary Algorithms, Research Triangle Park, North Carolina, USA (2002).
13. Lopes, H.S., Weinert, W.R.: EGIPSYS: An Enhanced Gene Expression Programming Approach for Symbolic Regression Problems. Int'l Journal of Applied Mathematics and Computer Science 14(3) (2004).
14. Zhou, C., Xiao, W., Tirpak, T.M., Nelson, P.C.: Evolving Accurate and Compact Classification Rules with Gene Expression Programming. IEEE Transactions on Evolutionary Computation 7(6) (2003).
15. Duan, L., Tang, C., Zhang, T., et al.: Distance Guided Classification with Gene Expression Programming. In: Proc. of ADMA 2006 (2006).
16. Zuo, J., Tang, C., Li, C., et al.: Time Series Prediction based on Gene Expression Programming. In: Proc. of WAIM 2004 (2004).
17. Bailey, J., Dong, G.: Contrast Data Mining: Methods and Applications. Tutorial at IEEE ICDM 2007 (2007).
18. Dong, G., Zhang, X., Wong, L., Li, J.: CAEP: Classification by Aggregating Emerging Patterns. In: Proc. of Discovery Science 1999 (1999).
19. Li, J., Dong, G., Ramamohanarao, K.: Instance-Based Classification by Emerging Patterns. In: Proc. of PKDD 2000 (2000).
20. Li, J., Dong, G., Ramamohanarao, K., Wong, L.: DeEPs: A New Instance-Based Lazy Discovery and Classification System. Machine Learning 54(2) (2004).
21. Li, J., Dong, G., Ramamohanarao, K.: Making Use of the Most Expressive Jumping Emerging Patterns for Classification. In: Proc. of PAKDD 2000 (2000).
22. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features. In: Proc. of the 12th ICML (1995).
23. Fayyad, U., Irani, K.: Multi-interval Discretization of Continuous-valued Attributes for Classification Learning. In: Proc. of the 13th IJCAI (1993).
More information