Evolving SQL Queries for Data Mining


Majid Salim and Xin Yao
School of Computer Science, The University of Birmingham, Edgbaston, Birmingham B15 2TT, UK

Abstract. This paper presents a methodology for applying the principles of evolutionary computation to knowledge discovery in databases by evolving SQL queries that describe datasets. In our system, the fittest queries are rewarded by giving their attributes a higher probability of surviving in subsequent queries. The advantages of using SQL queries include their readability for non-experts and ease of integration with existing databases. The evolutionary algorithm (EA) used in our system is very different from existing EAs, but appears to be effective and efficient in the experiments to date with three different test data sets.

1 Introduction

Data mining studies the identification and extraction of useful knowledge from large amounts of data [5]. There are a number of different fields of inquiry within data mining, of which classification is particularly popular. Machine learning algorithms that can learn to classify data correctly can be applied to a wide variety of problem domains, including credit card fraud detection and medical diagnostics [1,2,3]. An important aspect of such algorithms is ensuring that they are easy to comprehend, so that machine-discovered knowledge can be transferred to people easily [4].

This paper presents a framework for discovering classification knowledge hidden in a database through evolutionary computation techniques, as applied to SQL queries. The task is related to, but different from, the conventional classification problem. Instead of trying to learn a classifier for predicting an unseen example, we are most interested in discovering the underlying knowledge and concept that best describes a given set of data from a large database.

SQL is a standardised data manipulation language that is widely supported by database vendors.
Constructing a data mining framework using SQL is therefore very useful, as it would inherit SQL's portability and readability.

Ryu and Eick [7] proposed a genetic programming (GP) based approach to deriving queries from examples. However, there are two major differences between the work presented here and theirs. First, the query languages used are different and, as a result, the chromosome representations are different. Our use of SQL has made the whole system much simpler and more portable. Second, the evolutionary algorithms used are different. While Ryu and Eick [7] used GP, we have developed a much simpler algorithm which does not use any conventional crossover and mutation operators. Instead, the idea of self-adaptation at the gene level is exploited. Our initial experimental studies have shown that such a simple scheme is very easy to implement, yet very effective and efficient.

The rest of this paper is structured as follows. Section 2 describes the architecture of the proposed framework, justifying design decisions made and explaining the benefits and drawbacks that were perceived in the process. Section 3 presents initial results obtained with the framework, and Section 4 concludes the paper with a brief discussion of future work that is planned.

H. Yin et al. (Eds.): IDEAL 2002, LNCS 2412, © Springer-Verlag Berlin Heidelberg 2002

2 Evolving SQL Queries

It was necessary to find a way of representing SQL queries genotypically, to allow for the application of evolutionary search operators. Another issue was the design of a fitness function to apply evolutionary pressure to the queries, to guide them towards the correct classification rules.

Genotypes were required to encode the list of conditional constraints that specify the criteria by which records should be selected. Each conditional constraint in SQL follows the structure [attribute name] [logical operator] [value]. This sequence was chosen as the basic unit of information, or gene, from which genotypes would be constructed. Genotypic representations varied randomly in length.

2.1 Evolutionary Search

The algorithm that was implemented is described in this section. 100 genotypes were constructed by randomly selecting attribute names, logical operators and values. Each attribute in the dataset began with a 0.5 probability of being included in any given genotype. Genotypes were then translated into SQL by initialising a String with the value "SELECT * FROM [tablename] WHERE", and then appending each gene in the genotype to the end of the String.
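The construction and translation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the schema, the operator set, the rule that an attribute appears at most once per genotype, and the fallback for an empty genotype are all hypothetical choices made so that the example runs.

```python
import random

# Hypothetical schema (attribute -> candidate values); not taken from the paper.
SCHEMA = {
    "LEGS": [0, 2, 4, 6, 8],
    "PREDATOR": [True, False],
    "FEATHERS": [True, False],
    "VENOMOUS": [True, False],
}
OPERATORS = ["=", "<>"]  # assumed operator set


def random_genotype(schema, probs, rng):
    """Return a list of (attribute, operator, value) genes.

    Each attribute is included with its current selection probability.
    Assumption: an attribute appears at most once, and an empty genotype
    is replaced by one random gene so the WHERE clause is never empty.
    """
    genes = [
        (attr, rng.choice(OPERATORS), rng.choice(values))
        for attr, values in schema.items()
        if rng.random() < probs[attr]
    ]
    if not genes:
        attr = rng.choice(list(schema))
        genes.append((attr, rng.choice(OPERATORS), rng.choice(schema[attr])))
    return genes


def sql_literal(value):
    """Render a Python value as an SQL literal (booleans as true/false)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    return str(value)


def to_sql(genotype, table, rng):
    """Append each gene to the query string, joined by a random AND/OR."""
    query = f"SELECT * FROM {table} WHERE"
    for i, (attr, op, value) in enumerate(genotype):
        if i > 0:
            query += " " + rng.choice(["AND", "OR"])
        query += f" {attr} {op} {sql_literal(value)}"
    return query


if __name__ == "__main__":
    rng = random.Random(0)
    probs = {attr: 0.5 for attr in SCHEMA}  # initial probability from the text
    print(to_sql(random_genotype(SCHEMA, probs, rng), "Animals", rng))
```

Because the connectives are appended without parentheses, the database evaluates the query with the usual SQL precedence, in which AND binds more tightly than OR.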
For example, a genotype such as

(LEGS = 4) (PREDATOR = TRUE) (FEATHERS = FALSE) (VENOMOUS = FALSE)

would be translated into the following SQL query, through the random addition of AND and OR conditionals:

SELECT * FROM Animals WHERE LEGS = 4 AND PREDATOR = true AND FEATHERS = false OR VENOMOUS = false

Such SQL queries, once constructed, were sent to the database, and the results analysed. Each genotype was assigned a fitness value according to the extent to which its results corresponded with a target result set T. The fitness function used was

fitness = 100 - falsepositives - (2 * falsenegatives),

where 100 was an arbitrarily chosen constant. This fitness function was adapted from a paper by Ryu and Eick [7], dealing with deriving queries from object oriented databases. falsepositives is the number of records that were incorrectly identified as belonging to T, and falsenegatives is the number of records that should have been included in T, but were not.

The fitness function punishes false negatives more than it punishes false positives. If a query returns no false negatives, but several false positives, it can be seen to be correctly identifying the target result set, but generalising too much, whereas a query that returns false negatives is simply incorrect. By punishing false negatives more, it was hoped to apply evolutionary pressure that would favour queries that better classified the training data.

After assigning fitness values for the 100 queries, the best and worst three were selected. If a perfect classifier was found (with a fitness of 100), the evolution would terminate; otherwise the attributes would have their probabilities re-weighted. Every attribute that appeared in the top three fittest genotypes had its selection probability incremented by 1%. Every attribute in the worst three genotypes had its probability decremented by 1%. The old genotypes were then discarded, and a new set of 100 genotypes was randomly created using the self-adapted probabilities. Over a period of generations, attributes that contributed to higher fitness values came to dominate in the genotype set, whereas attributes that contributed little to a genotype featured less and less.

2.2 Discussions

Our algorithm departs from the metaphor commonly used in evolutionary algorithms; however, it does offer a mechanism through which the genotypes iteratively converge on the region of the search space that offers the greatest classification utility.
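The fitness function and the re-weighting step of Section 2.1 can be sketched as below. Several details are assumptions rather than facts from the paper: "1%" is read as an absolute step of 0.01, an attribute is adjusted once for each selected genotype it appears in, probabilities are clamped to [0, 1], and records are identified by hypothetical integer ids.

```python
def fitness(returned_ids, target_ids, base=100):
    """fitness = base - falsepositives - 2 * falsenegatives."""
    returned, target = set(returned_ids), set(target_ids)
    false_positives = len(returned - target)  # returned but not in T
    false_negatives = len(target - returned)  # in T but not returned
    return base - false_positives - 2 * false_negatives


def reweight(probs, scored, step=0.01, k=3):
    """Adjust attribute selection probabilities after one generation.

    probs:  dict mapping attribute name -> selection probability
    scored: list of (fitness, genotype) pairs, where a genotype is a
            list of (attribute, operator, value) genes
    Attributes in the k fittest genotypes gain `step`; attributes in the
    k least fit genotypes lose `step` (assumed absolute, clamped to [0, 1]).
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    new_probs = dict(probs)
    for _, genotype in ranked[:k]:
        for attr in {gene[0] for gene in genotype}:
            new_probs[attr] = min(1.0, new_probs[attr] + step)
    for _, genotype in ranked[-k:]:
        for attr in {gene[0] for gene in genotype}:
            new_probs[attr] = max(0.0, new_probs[attr] - step)
    return new_probs


if __name__ == "__main__":
    # One false positive (9) and one false negative (4): 100 - 1 - 2 = 97
    print(fitness([1, 2, 3, 9], [1, 2, 3, 4]))
```

A full generation would then discard the old genotypes and sample 100 new ones from the updated probabilities, stopping early if any query reaches the base fitness of 100.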
Although the genetic information of parents is not inherited directly by offspring, the genetic information in the whole population is inherited by the next population. Such inheritance is probabilistically biased toward more useful genetic material. Hence, more useful genetic material will occur more frequently in a population. It is hoped that classification rules may be discovered as a consequence of this.

3 Experimental Studies

Several experiments have been carried out to evaluate the effectiveness and efficiency of the proposed framework. All datasets were downloaded from the UCI Machine Learning Repository (see footnote 1). Each dataset was tested with 20 independent runs. If a perfect classifier was not found after 100 generations, the best classifier found to date was returned. The results were averaged over the 20 runs, and are presented below.

Footnote 1: mlearn/mlrepository.html

3.1 The Zoo Dataset

The Zoo dataset contains data items that describe animals. In total 14 attributes are provided, of which 13 are boolean and one has a predefined integer range. The animals are classified into 7 different types. Table 1 describes the results from the Zoo dataset. ANG refers to the average number of generations that it took for our algorithm to find a perfect classifier.

Table 1. Results for the Zoo dataset, showing performance of the evolved classifying queries for each animal type. The results were averaged over 20 runs.

Type  False Positives  False Negatives  ANG  Accuracy
% % n/a 83.3% % % % n/a 83.3%

It can be seen that our algorithm performed well on most of the classification tasks. The two instances in which it failed to find perfect classifiers are the most difficult tasks within the dataset, as both tasks involve a very small set of animals. In both cases, however, the best queries did not include false negatives.

3.2 Monk's Problems

The Monk's Problems dataset involves data items with six attributes, all of which are predefined integers between 1 and 4. The first Monk's problem is the identification of data patterns where (B = C) or (E = 1). The second problem is the identification of all data patterns that feature exactly two of (B = 1, C = 1, D = 1, E = 1, F = 1 or G = 1). The third Monk's problem is the identification of data patterns where (F = 3 and E = 1) or (F != 4 and C != 3), and features 5% noise added to the training set. The results averaged over 20 runs are summarised in Table 2.

Our algorithm performed perfectly on the first problem, and very well on the third, but performed poorly on the second problem. Part of the reason lies in SQL's inherent difficulty in expressing the desired conditions. The second Monk's problem requires a solution that compares relative attribute values, whereas SQL is usually used to select records according to a set of disjunctive attribute constraints.
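One way to see the difficulty with the second problem is to write the "exactly two of" condition using only constraints of the form [attribute] [operator] [value]. The sketch below enumerates it as a disjunction of conjunctions; it is an illustration of the size of such a rule, not a query produced by the authors' system, and the function name and parenthesised layout are assumptions.

```python
from itertools import combinations

ATTRS = ["B", "C", "D", "E", "F", "G"]  # attribute names as used in the text


def exactly_two_clause(attrs=ATTRS, value=1):
    """Build a WHERE clause that holds when exactly two attributes equal `value`.

    For every pair of attributes, one conjunction fixes that pair to `value`
    and requires every other attribute to differ from it.
    """
    disjuncts = []
    for chosen in combinations(attrs, 2):
        terms = [
            f"{a} = {value}" if a in chosen else f"{a} <> {value}"
            for a in attrs
        ]
        disjuncts.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(disjuncts)


clause = exactly_two_clause()
print(clause.count(" OR ") + 1)                    # 15 conjunctions
print(clause.count(" = ") + clause.count(" <> "))  # 90 atomic constraints
```

A correct rule of this shape needs 15 conjunctions of 6 constraints each, with every attribute repeated in every conjunction. The parentheses are for readability only, since AND binds more tightly than OR in SQL, but a search that mainly learns which attributes to include has little chance of assembling 90 constraints in exactly the right arrangement.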

Table 2. Results for the Monk's Problems datasets, showing performance of the best queries for each problem. ANG refers to the average number of generations that it took for our algorithm to find a perfect classifier.

Type  False Positives  False Negatives  ANG  Accuracy
Problem  %
Problem  n/a  16.9%
Problem  n/a  94.7%

3.3 Credit Card Approval

The credit card approval dataset contains anonymised information on credit card application approvals and rejections. The dataset contains a variety of attribute types, with some attributes having predefined values and others having continuous values. The dataset also features 5% noise. Our algorithm succeeded in correctly identifying, on average, 82.9% of the rejections. However, this relative success is countered by the fact that this classifier also included a large number of false positives on average, accounting for nearly 20% of the dataset size.

3.4 Discussion of the Results

The results for the Zoo and Monk's Problems datasets are encouraging. Our algorithm demonstrates the poorest performance on the second Monk's problem, which may be because the problem is not structurally conducive to an SQL based classification rule, although future refinements of our algorithm will hopefully improve upon these results.

The results with the credit card approval dataset also show room for improvement. This may be due to its inclusion of continuous variables. Our algorithm performs poorly with continuous valued attributes because, although it can identify attributes that are valuable in making a classification, it cannot make the same distinction for logical operators or values. The algorithm needs to find the logical operators and values, as well as the attributes, that are necessary for good classification. It is proposed that logical operators will be given initial selection probabilities as well, which will be decremented or incremented according to the effect they have upon the fitness value of their genotype.
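The proposed extension might take roughly the following shape. This is a hypothetical sketch of future work that the paper only outlines: the operator set, the single global weight per operator, the 0.01 step and the positive floor are all assumptions introduced here for illustration.

```python
import random


def sample_operator(op_weights, rng):
    """Draw one operator with probability proportional to its weight."""
    ops = list(op_weights)
    return rng.choices(ops, weights=[op_weights[o] for o in ops], k=1)[0]


def update_operator_weights(op_weights, best, worst, step=0.01, floor=0.01):
    """Reward operators used in the best genotypes, penalise those in the worst.

    best, worst: lists of genotypes, each a list of (attribute, operator, value).
    Weights are relative, so they need not sum to 1; the floor (an assumption)
    keeps every operator available for sampling.
    """
    new_weights = dict(op_weights)
    for genotype in best:
        for _, op, _ in genotype:
            new_weights[op] += step
    for genotype in worst:
        for _, op, _ in genotype:
            new_weights[op] = max(floor, new_weights[op] - step)
    return new_weights


if __name__ == "__main__":
    rng = random.Random(0)
    weights = {"=": 0.25, "<>": 0.25, "<": 0.25, ">": 0.25}  # assumed operators
    # "A1" and "A2" are placeholder attribute names.
    weights = update_operator_weights(
        weights, best=[[("A1", "<", 3.5)]], worst=[[("A2", "=", 1)]]
    )
    print(sample_operator(weights, rng))
```

For continuous attributes, a per-attribute weight table, or a similar scheme over value ranges, may be more useful than one global table; which variant works best is not something the source reports.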
4 Conclusions

By using evolutionary computation techniques to evolve SQL queries, it is possible to create a data mining framework that both produces easily readable results and can be applied to any SQL compliant database system. The problem considered here is somewhat different from the conventional classification problem. The key question we are addressing here is: given a subset of data in a large database, how can we gain a better understanding of them? Our solution is to evolve human comprehensible SQL queries that describe the data.

The algorithm proposed in this paper differs from many traditional evolutionary algorithms, in that it does not use the metaphor of selection, whereby the fittest individuals have their traits inherited by the new generation of individuals, through operations such as crossover or mutation. Rather, it rewards the attributes that make individuals successful, and then iterates the initial step of creation. In other words, rather than survival of the fittest, this work operates upon the principle of survival of the qualities that make the fittest fit.

Although many genetic algorithms feature mutation, it is usually scaled down so that it does not destroy any useful structures that evolution may have already constructed. This approach differs in that it divorces the importance of the attribute from the values that the attribute happens to have in a given gene. As such it effects an evolutionary liquidity that in turn results in an appealingly diverse population, more likely to distribute itself over an entire search space than to converge on a local optimum.

Although our preliminary experimental results are promising, they also offer room for improvement. It is hoped that future improvements with regard to dealing with continuous variables will improve performance.

References

1. X. Yao and Y. Liu, A new evolutionary system for evolving artificial neural networks, IEEE Transactions on Neural Networks, 8(3), May.
2. X. Yao and Y. Liu, Making use of population information in evolutionary artificial neural networks, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 28(3), June.
3. Y. Liu, X. Yao and T. Higuchi, Evolutionary ensembles with negative correlation learning, IEEE Transactions on Evolutionary Computation, 4(4), November.
4. J. Bobbin and X. Yao, Evolving rules for nonlinear control, In New Frontier in Computational Intelligence and its Applications, M. Mohammadian (ed.), IOS Press, Amsterdam, 2000.
5. A. A. Freitas, A genetic programming framework for two data mining tasks: classification and knowledge discovery, Genetic Programming 1997: Proc. 2nd Annual Conference, Stanford University.
6. A. A. Freitas, A survey of evolutionary algorithms for data mining and knowledge discovery, In: A. Ghosh, S. Tsutsui (eds.), Advances in Evolutionary Computation, Springer-Verlag.
7. T. W. Ryu, C. F. Eick, Deriving queries from results using genetic programming, Proc. 2nd International Conference, Knowledge Discovery and Data Mining, AAAI Press, 1996.

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

A Parallel Evolutionary Algorithm for Discovery of Decision Rules A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl

More information

The k-means Algorithm and Genetic Algorithm

The k-means Algorithm and Genetic Algorithm The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Genetic Programming. and its use for learning Concepts in Description Logics

Genetic Programming. and its use for learning Concepts in Description Logics Concepts in Description Artificial Intelligence Institute Computer Science Department Dresden Technical University May 29, 2006 Outline Outline: brief introduction to explanation of the workings of a algorithm

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

The Genetic Algorithm for finding the maxima of single-variable functions

The Genetic Algorithm for finding the maxima of single-variable functions Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 46-54 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.com The Genetic Algorithm for finding

More information

Classification of Concept-Drifting Data Streams using Optimized Genetic Algorithm

Classification of Concept-Drifting Data Streams using Optimized Genetic Algorithm Classification of Concept-Drifting Data Streams using Optimized Genetic Algorithm E. Padmalatha Asst.prof CBIT C.R.K. Reddy, PhD Professor CBIT B. Padmaja Rani, PhD Professor JNTUH ABSTRACT Data Stream

More information

Genetic Algorithms. Kang Zheng Karl Schober

Genetic Algorithms. Kang Zheng Karl Schober Genetic Algorithms Kang Zheng Karl Schober Genetic algorithm What is Genetic algorithm? A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization

More information

Constructing X-of-N Attributes with a Genetic Algorithm

Constructing X-of-N Attributes with a Genetic Algorithm Constructing X-of-N Attributes with a Genetic Algorithm Otavio Larsen 1 Alex Freitas 2 Julio C. Nievola 1 1 Postgraduate Program in Applied Computer Science 2 Computing Laboratory Pontificia Universidade

More information

Approach Using Genetic Algorithm for Intrusion Detection System

Approach Using Genetic Algorithm for Intrusion Detection System Approach Using Genetic Algorithm for Intrusion Detection System 544 Abhijeet Karve Government College of Engineering, Aurangabad, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra-

More information

CS5401 FS2015 Exam 1 Key

CS5401 FS2015 Exam 1 Key CS5401 FS2015 Exam 1 Key This is a closed-book, closed-notes exam. The only items you are allowed to use are writing implements. Mark each sheet of paper you use with your name and the string cs5401fs2015

More information

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS NABEEL AL-MILLI Financial and Business Administration and Computer Science Department Zarqa University College Al-Balqa'

More information

A Web Page Recommendation system using GA based biclustering of web usage data

A Web Page Recommendation system using GA based biclustering of web usage data A Web Page Recommendation system using GA based biclustering of web usage data Raval Pratiksha M. 1, Mehul Barot 2 1 Computer Engineering, LDRP-ITR,Gandhinagar,cepratiksha.2011@gmail.com 2 Computer Engineering,

More information

Evolutionary Art with Cartesian Genetic Programming

Evolutionary Art with Cartesian Genetic Programming Evolutionary Art with Cartesian Genetic Programming Laurence Ashmore 1, and Julian Francis Miller 2 1 Department of Informatics, University of Sussex, Falmer, BN1 9QH, UK emoai@hotmail.com http://www.gaga.demon.co.uk/

More information

Time Complexity Analysis of the Genetic Algorithm Clustering Method

Time Complexity Analysis of the Genetic Algorithm Clustering Method Time Complexity Analysis of the Genetic Algorithm Clustering Method Z. M. NOPIAH, M. I. KHAIRIR, S. ABDULLAH, M. N. BAHARIN, and A. ARIFIN Department of Mechanical and Materials Engineering Universiti

More information

CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM

CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM 20 CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM 2.1 CLASSIFICATION OF CONVENTIONAL TECHNIQUES Classical optimization methods can be classified into two distinct groups:

More information

Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design

Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design Loughborough University Institutional Repository Evolutionary form design: the application of genetic algorithmic techniques to computer-aided product design This item was submitted to Loughborough University's

More information

Hybridization EVOLUTIONARY COMPUTING. Reasons for Hybridization - 1. Naming. Reasons for Hybridization - 3. Reasons for Hybridization - 2

Hybridization EVOLUTIONARY COMPUTING. Reasons for Hybridization - 1. Naming. Reasons for Hybridization - 3. Reasons for Hybridization - 2 Hybridization EVOLUTIONARY COMPUTING Hybrid Evolutionary Algorithms hybridization of an EA with local search techniques (commonly called memetic algorithms) EA+LS=MA constructive heuristics exact methods

More information

Introduction to Evolutionary Computation

Introduction to Evolutionary Computation Introduction to Evolutionary Computation The Brought to you by (insert your name) The EvoNet Training Committee Some of the Slides for this lecture were taken from the Found at: www.cs.uh.edu/~ceick/ai/ec.ppt

More information

Escaping Local Optima: Genetic Algorithm

Escaping Local Optima: Genetic Algorithm Artificial Intelligence Escaping Local Optima: Genetic Algorithm Dae-Won Kim School of Computer Science & Engineering Chung-Ang University We re trying to escape local optima To achieve this, we have learned

More information

A Classifier with the Function-based Decision Tree

A Classifier with the Function-based Decision Tree A Classifier with the Function-based Decision Tree Been-Chian Chien and Jung-Yi Lin Institute of Information Engineering I-Shou University, Kaohsiung 84008, Taiwan, R.O.C E-mail: cbc@isu.edu.tw, m893310m@isu.edu.tw

More information

Genetic Programming for Data Classification: Partitioning the Search Space

Genetic Programming for Data Classification: Partitioning the Search Space Genetic Programming for Data Classification: Partitioning the Search Space Jeroen Eggermont jeggermo@liacs.nl Joost N. Kok joost@liacs.nl Walter A. Kosters kosters@liacs.nl ABSTRACT When Genetic Programming

More information

Heuristic Optimisation

Heuristic Optimisation Heuristic Optimisation Part 10: Genetic Algorithm Basics Sándor Zoltán Németh http://web.mat.bham.ac.uk/s.z.nemeth s.nemeth@bham.ac.uk University of Birmingham S Z Németh (s.nemeth@bham.ac.uk) Heuristic

More information

A Framework for adaptive focused web crawling and information retrieval using genetic algorithms

A Framework for adaptive focused web crawling and information retrieval using genetic algorithms A Framework for adaptive focused web crawling and information retrieval using genetic algorithms Kevin Sebastian Dept of Computer Science, BITS Pilani kevseb1993@gmail.com 1 Abstract The web is undeniably

More information

Solving Sudoku Puzzles with Node Based Coincidence Algorithm

Solving Sudoku Puzzles with Node Based Coincidence Algorithm Solving Sudoku Puzzles with Node Based Coincidence Algorithm Kiatsopon Waiyapara Department of Compute Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand kiatsopon.w@gmail.com

More information

4/22/2014. Genetic Algorithms. Diwakar Yagyasen Department of Computer Science BBDNITM. Introduction

4/22/2014. Genetic Algorithms. Diwakar Yagyasen Department of Computer Science BBDNITM. Introduction 4/22/24 s Diwakar Yagyasen Department of Computer Science BBDNITM Visit dylycknow.weebly.com for detail 2 The basic purpose of a genetic algorithm () is to mimic Nature s evolutionary approach The algorithm

More information

Attribute Selection with a Multiobjective Genetic Algorithm

Attribute Selection with a Multiobjective Genetic Algorithm Attribute Selection with a Multiobjective Genetic Algorithm Gisele L. Pappa, Alex A. Freitas, Celso A.A. Kaestner Pontifícia Universidade Catolica do Parana (PUCPR), Postgraduated Program in Applied Computer

More information

Offspring Generation Method using Delaunay Triangulation for Real-Coded Genetic Algorithms

Offspring Generation Method using Delaunay Triangulation for Real-Coded Genetic Algorithms Offspring Generation Method using Delaunay Triangulation for Real-Coded Genetic Algorithms Hisashi Shimosaka 1, Tomoyuki Hiroyasu 2, and Mitsunori Miki 2 1 Graduate School of Engineering, Doshisha University,

More information

Evolutionary Algorithms. CS Evolutionary Algorithms 1

Evolutionary Algorithms. CS Evolutionary Algorithms 1 Evolutionary Algorithms CS 478 - Evolutionary Algorithms 1 Evolutionary Computation/Algorithms Genetic Algorithms l Simulate natural evolution of structures via selection and reproduction, based on performance

More information

Introduction to Genetic Algorithms

Introduction to Genetic Algorithms Advanced Topics in Image Analysis and Machine Learning Introduction to Genetic Algorithms Week 3 Faculty of Information Science and Engineering Ritsumeikan University Today s class outline Genetic Algorithms

More information

Information Fusion Dr. B. K. Panigrahi

Information Fusion Dr. B. K. Panigrahi Information Fusion By Dr. B. K. Panigrahi Asst. Professor Department of Electrical Engineering IIT Delhi, New Delhi-110016 01/12/2007 1 Introduction Classification OUTLINE K-fold cross Validation Feature

More information

A Genetic Algorithm Approach for Clustering

A Genetic Algorithm Approach for Clustering www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 6 June, 2014 Page No. 6442-6447 A Genetic Algorithm Approach for Clustering Mamta Mor 1, Poonam Gupta

More information

Using Genetic Algorithms to optimize ACS-TSP

Using Genetic Algorithms to optimize ACS-TSP Using Genetic Algorithms to optimize ACS-TSP Marcin L. Pilat and Tony White School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, ON, K1S 5B6, Canada {mpilat,arpwhite}@scs.carleton.ca

More information

Inducing Parameters of a Decision Tree for Expert System Shell McESE by Genetic Algorithm

Inducing Parameters of a Decision Tree for Expert System Shell McESE by Genetic Algorithm Inducing Parameters of a Decision Tree for Expert System Shell McESE by Genetic Algorithm I. Bruha and F. Franek Dept of Computing & Software, McMaster University Hamilton, Ont., Canada, L8S4K1 Email:

More information

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?)

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) SKIP - May 2004 Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) S. G. Hohmann, Electronic Vision(s), Kirchhoff Institut für Physik, Universität Heidelberg Hardware Neuronale Netzwerke

More information

Evolutionary Decision Trees and Software Metrics for Module Defects Identification

Evolutionary Decision Trees and Software Metrics for Module Defects Identification World Academy of Science, Engineering and Technology 38 008 Evolutionary Decision Trees and Software Metrics for Module Defects Identification Monica Chiş Abstract Software metric is a measure of some

More information

A Memetic Genetic Program for Knowledge Discovery

A Memetic Genetic Program for Knowledge Discovery A Memetic Genetic Program for Knowledge Discovery by Gert Nel Submitted in partial fulfilment of the requirements for the degree Master of Science in the Faculty of Engineering, Built Environment and Information

More information

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming

Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming Clustering Analysis of Simple K Means Algorithm for Various Data Sets in Function Optimization Problem (Fop) of Evolutionary Programming R. Karthick 1, Dr. Malathi.A 2 Research Scholar, Department of Computer

More information

Basic Data Mining Technique

Basic Data Mining Technique Basic Data Mining Technique What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm

More information

Optimization of Association Rule Mining through Genetic Algorithm

Optimization of Association Rule Mining through Genetic Algorithm Optimization of Association Rule Mining through Genetic Algorithm RUPALI HALDULAKAR School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya Bhopal, Madhya Pradesh India Prof. JITENDRA

More information

A Memetic Heuristic for the Co-clustering Problem
