Data Mining using Ant Colony Optimization


Thanks to: Johannes Singler, Bryan Atkinson

Presentation Outline
- Introduction to Data Mining
- Rule Induction for Classification
- AntMiner
  - Overview: Input/Output
  - Rule Construction
  - Quality Measurement
  - Pheromone: Initial/Updating
  - Experiments/Results
  - Performance/Complexity
- Swarm-based Genetic Programming
  - Introduction to GP, Symbolic Regression
  - Crossover problems
  - Ant Colony Crossover
  - Experiments and Results

Introduction
Data mining tries to find hidden knowledge, unexpected patterns, and new rules in large databases.
- Discovery of useful summaries of data
- A key element of a much more elaborate process: Knowledge Discovery in Databases (KDD)

Goals of Rule Induction
Stage of data mining: rule induction
- Find rules that describe the data in some way.
- Rules should be not only accurate but also comprehensible to a human user, to support decision making.

Focus in this Talk
Rule induction for classification using ACO
- Given: a training set (instances/cases to classify)
- Goal: to come up with (preferably simple) rules to classify the data
- Algorithm by Parpinelli, Lopes and Freitas: AntMiner
ACO + Genetic Programming
- Symbolic regression

Rule Induction
Possible outputs for rule induction:
- decision trees
- (ordered) decision lists [used here]
    if <attribute1>=<value1> and <attribute2>=<value2> and ... then <class>=<class1>
    else if ...

AntMiner Input
- Training set / test set
- Attribute/value pairs
- Given classes / classification

AntMiner Output
Ordered decision list: an ordered list of IF-THEN rules of the form
    IF <condition> THEN <class>
    <condition> = <term1> AND <term2> AND ...
    <term> = <attribute> = <value>
plus a default rule (majority value).
- The first rule that matches fires.
- Only discrete attributes are supported so far; continuous values must be discretized beforehand.
- This is a quite limited version of a decision list.

Prerequisites for an ACO (Review)
- Problem-dependent heuristic function (η) for measuring the quality of items that could be added to the partial solution so far
- Pheromone updating rule (τ)
- Probabilistic transition rule based on η and τ
Difference from most ACO algorithms mentioned in class: AntMiner does not use a graph representation of the problem.
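The output format described above (ordered IF-THEN rules, first match fires, default rule otherwise) can be sketched as follows. This is a minimal illustration, not AntMiner's own code; the function name `classify`, the rule representation, and the two example rules are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Case = Dict[str, str]
# A rule is (antecedent terms, predicted class); this representation is an assumption.
Rule = Tuple[Dict[str, str], str]

def classify(case: Case, rules: List[Rule], default_class: str) -> str:
    """Return the class of the first rule whose terms all match the case."""
    for antecedent, predicted in rules:
        if all(case.get(attr) == value for attr, value in antecedent.items()):
            return predicted  # first matching rule fires
    return default_class      # no rule covers the case: default rule

# Hypothetical ordered rule list for the "play outside" example.
rules = [
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
]

print(classify({"outlook": "overcast", "windy": "true"}, rules, "no"))  # yes
print(classify({"outlook": "rainy", "windy": "true"}, rules, "no"))     # no
```

Because the list is ordered, a later rule is only consulted when every earlier rule fails to match, which is why rule order matters when the rules are applied.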

AntMiner Algorithm: Top-Level
Pseudo-code for finding one rule set:
    trainingset = {all training cases}
    discoveredrulelist = [ ]
    WHILE (trainingset still too big)
        Initialize pheromone (equally distributed)
        Ants try to find a good classification rule by the ACO heuristic
        Add best rule found to discoveredrulelist
        Remove correctly covered examples from trainingset

AntMiner Algorithm: Mid-Level
Pseudo-code for finding one rule:
    REPEAT
        Start new ant with empty rule (antecedent)
        Construct rule by adding one term at a time, then choose the rule consequent
        Prune rule
        Increase pheromone on the trail the ant used, according to the quality of the rule
    UNTIL (maximum number z of ants exceeded) OR (no improvement during the last k iterations)
In effect, the population consists of only one ant working at a time.

AntMiner Algorithm: Bottom-Level
Repeat as long as possible:
- Add one condition to the rule, chosen probabilistically from pheromone concentration and heuristic.
- Do not use an attribute twice.
- The resulting rule must cover at least a minimum number of cases.
After the antecedent is finished, calculate the resulting class.

Rule Construction
Probability of adding the term <A_i>=<V_ij>:
    $$ P_{ij}(t) = \eta_{ij} \, \tau_{ij}(t) $$   [normalized over all candidate terms]
where
- A_i is the i-th attribute
- V_ij is the j-th possible value of the i-th attribute
- η is the heuristic function and τ the pheromone trail

Heuristic Function (η)
- Analogous to the proximity function in TSP or the colouring matrix in the graph colouring problem.
- Uses information theory (entropy).
- Split the instances using the rule. Quality corresponds to the entropy of the remaining buckets; the lower, the better.
    $$ H(W \mid A_i = V_{ij}) = -\sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \, \log_2 P(w \mid A_i = V_{ij}) $$
    $$ \eta_{ij} \propto \log_2 k - H(W \mid A_i = V_{ij}) $$   [normalized]
where k is the number of classes.

Information Heuristic Example
For temperature T: high = T > 80, mild = 70 < T ≤ 80, cold = 0 < T ≤ 70 (for later).
P(play, outlook=sunny) = 2/14 = 0.143, P(don't play, outlook=sunny) = 3/14 = 0.214
H(W, outlook=sunny) = -0.143 log_2(0.143) - 0.214 log_2(0.214) = 0.877
η = log_2 k - H(W, outlook=sunny) = 1 - 0.877 = 0.123
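The heuristic and the transition probability can be checked numerically with a short sketch. The worked example divides the class counts by all 14 cases (2/14 and 3/14), whereas the definition of H uses conditional probabilities (2/5 and 3/5 for the five sunny cases); the code below computes both so the difference is visible. The function names, the `denominator` parameter, and the equal pheromone values are assumptions made for illustration.

```python
import math

def entropy(probabilities):
    """Return -sum(p * log2(p)), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def eta(class_counts, denominator):
    """Unnormalized heuristic log2(k) - H for one term.

    class_counts: cases per class among those covered by the term.
    denominator: what the counts are divided by (made explicit on purpose).
    """
    k = len(class_counts)
    return math.log2(k) - entropy(c / denominator for c in class_counts)

def term_probabilities(eta_values, tau_values):
    """Normalize eta * tau over all candidate terms."""
    weights = {t: eta_values[t] * tau_values[t] for t in eta_values}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

sunny = [2, 3]     # play / don't play among the sunny cases
overcast = [4, 0]  # play / don't play among the overcast cases

# Counts divided by all 14 cases, as in the worked example.
eta_slide = {"sunny": eta(sunny, 14), "overcast": eta(overcast, 14)}
# Counts divided by the covered cases, as in the definition of H.
eta_conditional = {"sunny": eta(sunny, sum(sunny)),
                   "overcast": eta(overcast, sum(overcast))}
tau = {"sunny": 0.5, "overcast": 0.5}  # hypothetical equal pheromone levels

print({t: round(v, 3) for t, v in eta_slide.items()})        # {'sunny': 0.123, 'overcast': 0.484}
print({t: round(v, 3) for t, v in eta_conditional.items()})  # {'sunny': 0.029, 'overcast': 1.0}
print(term_probabilities(eta_slide, tau))
```

Either way, outlook=overcast scores higher than outlook=sunny, so with equal pheromone an ant is more likely to pick it.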

Information Heuristic Example
For humidity H: high = H > 85, normal = 0 < H ≤ 85 (for later).
P(play, outlook=overcast) = 4/14 = 0.286, P(don't play, outlook=overcast) = 0/14 = 0
H(W, outlook=overcast) = -0.286 log_2(0.286) = 0.516
η = log_2 k - H(W, outlook=overcast) = 1 - 0.516 = 0.484

Quality Function
Measures the classification quality of a rule or of several rules.
For one rule, quality is sensitivity times specificity:
    $$ Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN} $$
where T = true, F = false, P = positive, N = negative. The larger the value of Q, the better.
Measuring the simplicity of a rule set:
- number of rules
- average number of terms per rule
The fewer, the simpler, and thus the better.

Rule Pruning
- Iteratively remove one term at a time from the rule for as long as this improves the classification accuracy of the rule.
- The majority class might change.
- If several terms are candidates, remove the term whose removal improves the accuracy the most.
- Simplicity improves in any case.
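As a small illustration of the quality function, the sketch below evaluates Q for a rule "outlook=overcast then play=yes" on a 14-case weather sample with 9 "play" and 5 "don't play" cases, of which the rule covers 4 "play" cases. The function name and the zero-division guards are assumptions, not part of the original definition.

```python
def rule_quality(tp: int, fp: int, fn: int, tn: int) -> float:
    """Q = sensitivity * specificity for a single rule."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # guard is an assumption
    specificity = tn / (fp + tn) if (fp + tn) else 0.0  # guard is an assumption
    return sensitivity * specificity

# Rule covers 4 "play" cases (TP) and no "don't play" cases (FP);
# the other 5 "play" cases are missed (FN), all 5 "don't play" cases are TN.
q_overcast = rule_quality(tp=4, fp=0, fn=5, tn=5)
print(round(q_overcast, 3))  # 0.444
```

The rule is perfectly specific but covers fewer than half of the positive cases, so its quality is limited by its sensitivity of 4/9.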

Pheromone
Initial pheromone value:
    $$ \tau_{ij}(t = 0) = \frac{1}{\sum_{i=1}^{a} b_i} $$   [normalized]
where a is the total number of attributes and b_i is the number of possible values of A_i.

Pheromone Updating (τ)
- Values before the update (1).
- First, increase the pheromone of the terms used in the rule according to the rule quality (2):
    $$ \tau_{ij}(t + 1) = \tau_{ij}(t) \cdot (1 + Q) $$
- Then normalize the pheromone level of all terms; this acts as pheromone evaporation (3).

Using the Discovered Rules
- Apply the rules in the order in which they were discovered.
- The first rule that covers the case is applied.
- If no rule covers the case, apply the default result (majority value).
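The initialization and the two-step update (reinforce used terms, then normalize) can be sketched as below. The dictionary layout keyed by (attribute, value) pairs and the function names are assumptions chosen for the example.

```python
def init_pheromone(values_per_attribute):
    """Give every term 1 / (total number of attribute values)."""
    total_values = sum(len(values) for values in values_per_attribute.values())
    return {(attr, value): 1.0 / total_values
            for attr, values in values_per_attribute.items()
            for value in values}

def update_pheromone(tau, used_terms, quality):
    """Multiply used terms by (1 + Q), then normalize all terms to sum to 1."""
    raised = {term: level * (1.0 + quality) if term in used_terms else level
              for term, level in tau.items()}
    norm = sum(raised.values())
    # Normalization lowers every unused term, which plays the role of evaporation.
    return {term: level / norm for term, level in raised.items()}

# Hypothetical two-attribute problem with 3 + 2 = 5 terms.
tau0 = init_pheromone({"outlook": ["sunny", "overcast", "rainy"],
                       "windy": ["true", "false"]})
tau1 = update_pheromone(tau0, used_terms={("outlook", "overcast")}, quality=0.5)
print(tau0[("outlook", "overcast")], tau1[("outlook", "overcast")])
```

After the update the used term holds a larger share of the total pheromone, while every unused term drops below its initial value even though nothing was subtracted from it explicitly.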

Possible Discretization of Continuous Attributes
Use C4.5-Disc. Quick overview:
- Extract a reduced data set that contains only the attribute to discretize and the desired classification.
- From that, build a decision tree using the C4.5 algorithm (another rule induction algorithm).
- Result: a decision tree with binary decisions (x ≤ a: go left; x > a: go right).
- Each path corresponds to the definition of a categorical interval.

AntMiner's Parameters
- Number of ants (3000 used in experiments). Also limits the maximum number of rules found for a classification. It is not necessarily exhausted, because the algorithm might converge earlier.
- Minimum number of cases per rule (10). Each rule must cover at least this many cases. Avoids overfitting.
- Maximum number of uncovered cases in the training set (10). The algorithm stops when fewer instances than this are left.
- Number of rules used to test for convergence of the ants (10). The algorithm waits this long for an improvement.

Sample Run Start
Deciding whether to play outside
Attributes: outlook, temperature, humidity, windy, play
Classes: play (yes), do not play (no)
    sunny, hot, high, false, no (1)
    sunny, hot, high, true, no (2)
    overcast, hot, normal, false, yes (3)
    rainy, mild, high, false, yes (4)
    rainy, cool, normal, false, yes (5)
    rainy, cool, normal, true, no (6)
    overcast, cool, normal, true, yes (7)
    sunny, mild, high, false, no (8)
    sunny, cool, normal, false, yes (9)
    rainy, mild, normal, false, yes (10)
    sunny, mild, normal, true, yes (11)
    overcast, mild, high, true, yes (12)
    overcast, hot, normal, false, yes (13)
    rainy, mild, high, true, no (14)

Sample run for finding one rule set.
Start: I = {all}, R = {}
- Ant 1: Probabilistically chooses outlook=overcast (then play=yes).
- Ant 1: Chooses values for the other attributes.
- Ant 1: Finishes because all attributes are used.
- Ant 1: The last three conditions are pruned away.
I = {1,2,4,5,6,8,9,10,11,14}, R = {outlook=overcast → yes}
- Ant 2: Chooses outlook=rainy (then play=yes). The rule is not good enough (3:2).
- Ant 2: Chooses windy=true (then play=no).
- Ant 2: Finishes because otherwise the covered set would be too small. No pruning is possible either.

Sample Run Result
Possible result (not the simplest):
    outlook=overcast → play=yes
    outlook=rainy, windy=false → play=yes
    outlook=sunny, humidity=normal → play=yes
    otherwise → play=no

Comparison to CN2 Algorithm
- Uses beam search (limited breadth-first search with beam width b).
- Adds all possible terms to the current partial rules, evaluates them, and retains only the b best ones.
- No feedback for constructing new rules.
- Output format is the same (ordered rule list).
- Uses an entropy heuristic as well.

Experiment Setup
Dimensions, roughly: [...] cases, 9-34 attributes, 2-6 classes
Tests are run using a 10-fold cross-validation procedure:
- Divide the data into 10 partitions.
- For each partition, treat it as the test data and use the other 90% as the training data. Measure the performance.
- Take the average value.
This helps to achieve significant results.
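The cross-validation procedure can be sketched in a few lines. This is a generic illustration under stated assumptions: the helper names, the fixed shuffle seed, and the `evaluate` callback (which would train on one index list and score on the other) are not taken from the experiments themselves.

```python
import random

def k_fold_indices(n_cases, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k partitions."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal partitions
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_cases, evaluate, k=10):
    """Average the score returned by evaluate(train, test) over all folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_cases, k)]
    return sum(scores) / len(scores)

splits = list(k_fold_indices(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10

# Hypothetical evaluate callback: here it only reports the test fraction.
print(cross_validate(100, lambda train, test: len(test) / (len(train) + len(test))))
```

Every case is used for testing exactly once, so the averaged score is less dependent on one particular train/test split.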

DataSets
[Table: data sets used in the experiments]

Performance Results
[Table: performance results]
- No particular parameter optimization for either algorithm.
- Same computation time.

Extensions to the Algorithm
By Galea [3].
- Deterministic rule with probability q, as in ACS-TSP.
- Choose probabilistically (considering pheromone trail and heuristic function) with probability q.
- Otherwise, deterministically choose the term with maximum probability.
- Improves results slightly.
An extension for fuzzy rules is also possible.
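The q-based selection described above can be sketched as follows, taking the description literally: with probability q a term is sampled from the pheromone/heuristic distribution, otherwise the most probable term is taken. The function name and the example probabilities are assumptions; this is not Galea's implementation.

```python
import random

def choose_term(probabilities, q, rng):
    """Pick a term: sample with probability q, else take the most probable one."""
    if rng.random() < q:
        terms = list(probabilities)
        weights = [probabilities[t] for t in terms]
        return rng.choices(terms, weights=weights, k=1)[0]
    return max(probabilities, key=probabilities.get)

# Hypothetical normalized term probabilities.
probabilities = {"outlook=overcast": 0.6, "outlook=sunny": 0.3, "windy=true": 0.1}
rng = random.Random(42)

print(choose_term(probabilities, q=0.0, rng=rng))  # outlook=overcast
print(choose_term(probabilities, q=1.0, rng=rng))
```

With q = 0 the choice is fully greedy; with q = 1 it reduces to the purely probabilistic rule of the basic algorithm.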

[Figures: Comparative Results (side-by-side comparison); Effects of Rule Pruning]

[Figures: Generated Rules; Terms per Rule]

Algorithm Complexity
Variables used:
- n: number of cases
- a: number of attributes
- v: number of values per attribute; considered small, O(1)
- k: number of conditions per inspected rule while evaluating and pruning
- z: number of ants
- r: number of discovered rules

Complexity Comparison
- Ant-Miner, average case: O(r·z·[k·a + n·k^3] + a·n)
- Ant-Miner, worst case k = O(a): O(r·z·a^3·n)
- CN2: O(a(n + log(a)))

Further Experiments
Further experiments by the authors of AntMiner show that ACO really helps:
- Use of pheromone trails improves the average solution.
- Use of rule pruning improves the simplicity without harming the quality.

References
[1] Data Mining with an Ant Colony Optimization Algorithm. Parpinelli, Lopes, Freitas.
[2] An Ant Colony Based System for Data Mining: Applications to Medical Data. Parpinelli, Lopes, Freitas. 2001.
[3] Applying Swarm Intelligence to Rule Induction. Michelle Galea.
[4] The CN2 Induction Algorithm. Clark, Niblett.
[5] Data Mining. Adriaans, Zantinge. Addison-Wesley.
[6] Learning Fuzzy Rules Using Ant Colony Optimization Algorithms. Casillas, Cordón, Herrera.
[7] Bryan Atkinson Honours Project Report: n-atkinson-winter-2006.pdf

Ant-based Programming
Genetic Programming has been successful at inducing program descriptions.
Problems with scaling:
- Diversity
- Retaining useful fragments: avoiding disruption of higher-order functions
Can ACO help? Maybe: learn useful associations, avoid disruption.

Genetic Programming
- Programs are represented in a tree structure.
- Learning through population-based, evolutionary search.
- Genetic operators: crossover, mutation.
- Requires specification of:
  - Functions (F): internal nodes
  - Terminals (T): leaf nodes
- Symbolic regression: F = {+, -, /, *, sin, cos, exp}, T = {integers in range (-5, 5), ...}

Symbolic Regression
Find the function that best fits a number of sample points.
A good fit is determined by hits: the candidate function lies within a threshold distance of the sample.
    $$ f(k) = h(k) - \frac{1}{\max(h(k), 1)} \sum_{i=1}^{size(d)} e(k, i) $$
    $$ e(k, i) = \left| v(k, x(i)) - y(i) \right| $$
    $$ h(k) = \sum_{i=1}^{size(d)} hits(k, i) $$
where v(k, x) is the value of the k-th program for x, and hits(k, i) is 0 if e(k, i) exceeds the threshold ε and 1 otherwise.
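A hits-based fitness of this kind can be sketched as follows. The sketch assumes the reading f(k) = h(k) - (1 / max(h(k), 1)) · Σ e(k, i), counts a hit when the error is at most ε, and uses a hypothetical target 3x + sin(x) sampled at the integers from -5 to 5; all of these are assumptions for illustration rather than the exact experimental setup.

```python
import math

def fitness(program, samples, epsilon):
    """Hits minus the summed absolute error scaled by 1 / max(hits, 1)."""
    errors = [abs(program(x) - y) for x, y in samples]
    hits = sum(1 for e in errors if e <= epsilon)  # hit: within threshold
    return hits - sum(errors) / max(hits, 1)

def target(x):
    """Hypothetical target function used to generate the sample points."""
    return 3 * x + math.sin(x)

samples = [(x, target(x)) for x in range(-5, 6)]

perfect = fitness(target, samples, epsilon=0.01)
linear_only = fitness(lambda x: 3 * x, samples, epsilon=0.01)
print(perfect, linear_only)
```

A program that reproduces every sample scores one point per sample, while a program that misses the sin(x) term hits only the sample at x = 0 and is penalized by its accumulated error.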

Symbolic Regression Example
[Figure: example GP program tree and the expression it represents, 3x + sin(x) + ...]

Crossover
[Figure: crossover between two program trees built from +, *, sin and cos nodes]
Problem: crossover can easily disrupt useful couplings.

Adapting Crossover with ACO
- Use context-aware crossover.
- Basic crossover chooses a node randomly, so it is context-unaware.
- Adapt crossover to remember useful function couplings.
- Not automatically defined functions (ADFs).

Function Coupling Matrix (C)
[Table: coupling values between the functions +, *, sin and cos]
Important couplings have high values, e.g. sin-x.

Swarm-based GP (SB-GP)
Three modifications to GP:
1. Initialization of the coupling matrix, C.
2. Crossover using the coupling matrix.
3. Pheromone update based upon program fitness.

Pheromone Initialization
For all function and terminal couplings (i, j): initialize the pheromone τ_{i,j} to the initial value τ_0.
τ_0 is a system parameter.

Ant Colony (AC) Crossover
- Choose a random branch B from the root to a leaf in program tree P_n.
- For every edge (i, j) in B, the probability of choosing node i as the root of subtree S_n, where i is the parent and j is a child node, is given by:
    $$ p(i, n) = \frac{\tau_{\max}(n) - \tau_{\min}(n) + \tau_{i,j}(n)}{T(n)} $$
- Choose a random branch B from the root to a leaf in program tree P_m.
- For every edge (i, j) in B, the probability of choosing node i as the root of subtree S_m is:
    $$ p(i, m) = \frac{\tau_{\max}(m) - \tau_{\min}(m) + \tau_{i,j}(m)}{T(m)} $$

AC Crossover Continued
T(k) is given by:
    $$ T(k) = \sum_{(i,j) \in E(k)} \left( \tau_{\max}(k) - \tau_{\min}(k) + \tau_{i,j}(k) \right) $$
with
    $$ \tau_{i,j}(k) = C(V(k, i), V(k, j)) $$
    $$ \tau_{\max}(k) = \max_{(i,j) \in E(k)} \tau_{i,j}(k) $$
    $$ \tau_{\min}(k) = \min_{(i,j) \in E(k)} \tau_{i,j}(k) $$
and E(k) = {edges in the k-th program subtree}.

AC Crossover Example
[Figure: AC crossover example]
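The selection probabilities can be illustrated with a small sketch that evaluates the formula over all edges of one program tree, so that the values sum to 1. The edge list, the coupling values, the function name, and the uniform fallback for an all-zero matrix are assumptions made for the example.

```python
def selection_probabilities(edges, coupling):
    """Return p = (tau_max - tau_min + tau_ij) / T for each (parent, child) edge."""
    tau = [coupling[edge] for edge in edges]
    tau_max, tau_min = max(tau), min(tau)
    weights = [tau_max - tau_min + t for t in tau]
    total = sum(weights)  # this is T(k)
    if total == 0:        # all pheromone zero: fall back to uniform (assumption)
        return [1.0 / len(edges)] * len(edges)
    return [w / total for w in weights]

# Hypothetical tree for sin(x) + 3 * x, written as (parent symbol, child symbol).
edges = [("+", "sin"), ("+", "*"), ("sin", "x"), ("*", "3"), ("*", "x")]
# Hypothetical coupling matrix C: sin-x has been reinforced, the rest sit at 1.0.
coupling = {edge: 1.0 for edge in edges}
coupling[("sin", "x")] = 3.0

probs = selection_probabilities(edges, coupling)
for edge, p in zip(edges, probs):
    print(edge, round(p, 3))
```

In this example the strongly coupled sin-x edge receives the largest probability (5/17), so the subtree rooted at sin is the most likely to be exchanged as a unit, which keeps the coupling intact.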

Experimental Parameters
[Table: parameter/value pairs; the column alignment is lost]
Parameters listed: Initial Pheromone, Evaporation rate p, Best k programs used for evaluation, Max Program Depth, Min Program Depth, Tournament Size, Crossover probability, Mutation Probability, Number of Generations (default).
Values listed, in extracted order: 0, 10-5, 0.9, 30, 15, 1.

Functions and Results
F1: cos( 2 )+sin( 2 )+ 2
F2: cos( 2 )+sin( 2 )+ 2 +cos()+sin()
F3: sin()* 4 +sin()* 3 +sin()* 2 + sin()*
[Table: Test, GP Mean, GP STD, SB-GP Mean, SB-GP STD, P Value, Population Size; rows for F1 to F3]

[Figure: F3: Function Couplings]

Conclusions
- Statistically significant improvement in performance
- Useful couplings learnt
- Number of successful trials increased
- Couplings can saturate: use an ACS-style q mechanism to choose randomly some of the time


More information

A Constrained-Syntax Genetic Programming System for Discovering Classification Rules: Application to Medical Data Sets

A Constrained-Syntax Genetic Programming System for Discovering Classification Rules: Application to Medical Data Sets A Constrained-Syntax Genetic Programming System for Discovering Classification Rules: Application to Medical Data Sets Celia C. Bojarczuk 1 ; Heitor S. Lopes 2 ; Alex A. Freitas 3 ; Edson L. Michalkiewicz

More information

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control. What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem

More information

A Classifier with the Function-based Decision Tree

A Classifier with the Function-based Decision Tree A Classifier with the Function-based Decision Tree Been-Chian Chien and Jung-Yi Lin Institute of Information Engineering I-Shou University, Kaohsiung 84008, Taiwan, R.O.C E-mail: cbc@isu.edu.tw, m893310m@isu.edu.tw

More information

Crew Scheduling Problem: A Column Generation Approach Improved by a Genetic Algorithm. Santos and Mateus (2007)

Crew Scheduling Problem: A Column Generation Approach Improved by a Genetic Algorithm. Santos and Mateus (2007) In the name of God Crew Scheduling Problem: A Column Generation Approach Improved by a Genetic Algorithm Spring 2009 Instructor: Dr. Masoud Yaghini Outlines Problem Definition Modeling As A Set Partitioning

More information

GA is the most popular population based heuristic algorithm since it was developed by Holland in 1975 [1]. This algorithm runs faster and requires les

GA is the most popular population based heuristic algorithm since it was developed by Holland in 1975 [1]. This algorithm runs faster and requires les Chaotic Crossover Operator on Genetic Algorithm Hüseyin Demirci Computer Engineering, Sakarya University, Sakarya, 54187, Turkey Ahmet Turan Özcerit Computer Engineering, Sakarya University, Sakarya, 54187,

More information

Advanced learning algorithms

Advanced learning algorithms Advanced learning algorithms Extending decision trees; Extraction of good classification rules; Support vector machines; Weighted instance-based learning; Design of Model Tree Clustering Association Mining

More information

Genetic Programming. and its use for learning Concepts in Description Logics

Genetic Programming. and its use for learning Concepts in Description Logics Concepts in Description Artificial Intelligence Institute Computer Science Department Dresden Technical University May 29, 2006 Outline Outline: brief introduction to explanation of the workings of a algorithm

More information

Automatic Programming with Ant Colony Optimization

Automatic Programming with Ant Colony Optimization Automatic Programming with Ant Colony Optimization Jennifer Green University of Kent jg9@kent.ac.uk Jacqueline L. Whalley University of Kent J.L.Whalley@kent.ac.uk Colin G. Johnson University of Kent C.G.Johnson@kent.ac.uk

More information

Data Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification

Data Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms

More information

A Hybrid PSO/ACO Algorithm for Classification

A Hybrid PSO/ACO Algorithm for Classification A Hybrid PSO/ACO Algorithm for Classification Nicholas Holden University of Kent Computing Laboratory Canterbury, CT2 7NF, UK +44 (0)1227 823192 nickpholden@gmail.com Alex A. Frietas University of Kent

More information

International Journal of Current Trends in Engineering & Technology Volume: 02, Issue: 01 (JAN-FAB 2016)

International Journal of Current Trends in Engineering & Technology Volume: 02, Issue: 01 (JAN-FAB 2016) Survey on Ant Colony Optimization Shweta Teckchandani, Prof. Kailash Patidar, Prof. Gajendra Singh Sri Satya Sai Institute of Science & Technology, Sehore Madhya Pradesh, India Abstract Although ant is

More information

ABC Optimization: A Co-Operative Learning Approach to Complex Routing Problems

ABC Optimization: A Co-Operative Learning Approach to Complex Routing Problems Progress in Nonlinear Dynamics and Chaos Vol. 1, 2013, 39-46 ISSN: 2321 9238 (online) Published on 3 June 2013 www.researchmathsci.org Progress in ABC Optimization: A Co-Operative Learning Approach to

More information

Ant Colony Optimization

Ant Colony Optimization DM841 DISCRETE OPTIMIZATION Part 2 Heuristics Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. earch 2. Context Inspiration from Nature 3. 4. 5.

More information

SIMULATION APPROACH OF CUTTING TOOL MOVEMENT USING ARTIFICIAL INTELLIGENCE METHOD

SIMULATION APPROACH OF CUTTING TOOL MOVEMENT USING ARTIFICIAL INTELLIGENCE METHOD Journal of Engineering Science and Technology Special Issue on 4th International Technical Conference 2014, June (2015) 35-44 School of Engineering, Taylor s University SIMULATION APPROACH OF CUTTING TOOL

More information

Data Analytics and Boolean Algebras

Data Analytics and Boolean Algebras Data Analytics and Boolean Algebras Hans van Thiel November 28, 2012 c Muitovar 2012 KvK Amsterdam 34350608 Passeerdersstraat 76 1016 XZ Amsterdam The Netherlands T: + 31 20 6247137 E: hthiel@muitovar.com

More information

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús

More information

Decision Trees: Discussion

Decision Trees: Discussion Decision Trees: Discussion Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Learning Decision Trees 1. Representation: What are decision trees? 2. Algorithm: Learning

More information

Derivation of Relational Fuzzy Classification Rules Using Evolutionary Computation

Derivation of Relational Fuzzy Classification Rules Using Evolutionary Computation Derivation of Relational Fuzzy Classification Rules Using Evolutionary Computation Vahab Akbarzadeh Alireza Sadeghian Marcus V. dos Santos Abstract An evolutionary system for derivation of fuzzy classification

More information

Novel Approach for Image Edge Detection

Novel Approach for Image Edge Detection Novel Approach for Image Edge Detection Pankaj Valand 1, Mayurdhvajsinh Gohil 2, Pragnesh Patel 3 Assistant Professor, Electrical Engg. Dept., DJMIT, Mogar, Anand, Gujarat, India 1 Assistant Professor,

More information

CHAPTER 4 GENETIC ALGORITHM

CHAPTER 4 GENETIC ALGORITHM 69 CHAPTER 4 GENETIC ALGORITHM 4.1 INTRODUCTION Genetic Algorithms (GAs) were first proposed by John Holland (Holland 1975) whose ideas were applied and expanded on by Goldberg (Goldberg 1989). GAs is

More information

Improving Tree-Based Classification Rules Using a Particle Swarm Optimization

Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Improving Tree-Based Classification Rules Using a Particle Swarm Optimization Chi-Hyuck Jun *, Yun-Ju Cho, and Hyeseon Lee Department of Industrial and Management Engineering Pohang University of Science

More information

7. Decision or classification trees

7. Decision or classification trees 7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

The k-means Algorithm and Genetic Algorithm

The k-means Algorithm and Genetic Algorithm The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective

More information

Escaping Local Optima: Genetic Algorithm

Escaping Local Optima: Genetic Algorithm Artificial Intelligence Escaping Local Optima: Genetic Algorithm Dae-Won Kim School of Computer Science & Engineering Chung-Ang University We re trying to escape local optima To achieve this, we have learned

More information

A new improved ant colony algorithm with levy mutation 1

A new improved ant colony algorithm with levy mutation 1 Acta Technica 62, No. 3B/2017, 27 34 c 2017 Institute of Thermomechanics CAS, v.v.i. A new improved ant colony algorithm with levy mutation 1 Zhang Zhixin 2, Hu Deji 2, Jiang Shuhao 2, 3, Gao Linhua 2,

More information

Machine Learning in Telecommunications

Machine Learning in Telecommunications Machine Learning in Telecommunications Paulos Charonyktakis & Maria Plakia Department of Computer Science, University of Crete Institute of Computer Science, FORTH Roadmap Motivation Supervised Learning

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

Simplicial Global Optimization

Simplicial Global Optimization Simplicial Global Optimization Julius Žilinskas Vilnius University, Lithuania September, 7 http://web.vu.lt/mii/j.zilinskas Global optimization Find f = min x A f (x) and x A, f (x ) = f, where A R n.

More information

Genetic Algorithms and Genetic Programming. Lecture 9: (23/10/09)

Genetic Algorithms and Genetic Programming. Lecture 9: (23/10/09) Genetic Algorithms and Genetic Programming Lecture 9: (23/10/09) Genetic programming II Michael Herrmann michael.herrmann@ed.ac.uk, phone: 0131 6 517177, Informatics Forum 1.42 Overview 1. Introduction:

More information

An Ant Approach to the Flow Shop Problem

An Ant Approach to the Flow Shop Problem An Ant Approach to the Flow Shop Problem Thomas Stützle TU Darmstadt, Computer Science Department Alexanderstr. 10, 64283 Darmstadt Phone: +49-6151-166651, Fax +49-6151-165326 email: stuetzle@informatik.tu-darmstadt.de

More information

9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form)

9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form) Comp 135 Introduction to Machine Learning and Data Mining Our first learning algorithm How would you classify the next example? Fall 2014 Professor: Roni Khardon Computer Science Tufts University o o o

More information

Association Rule Mining and Clustering

Association Rule Mining and Clustering Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:

More information

Gradient Descent. 1) S! initial state 2) Repeat: Similar to: - hill climbing with h - gradient descent over continuous space

Gradient Descent. 1) S! initial state 2) Repeat: Similar to: - hill climbing with h - gradient descent over continuous space Local Search 1 Local Search Light-memory search method No search tree; only the current state is represented! Only applicable to problems where the path is irrelevant (e.g., 8-queen), unless the path is

More information

Data Mining. Covering algorithms. Covering approach At each stage you identify a rule that covers some of instances. Fig. 4.

Data Mining. Covering algorithms. Covering approach At each stage you identify a rule that covers some of instances. Fig. 4. Data Mining Chapter 4. Algorithms: The Basic Methods (Covering algorithm, Association rule, Linear models, Instance-based learning, Clustering) 1 Covering approach At each stage you identify a rule that

More information

Automatic Design of Ant Algorithms with Grammatical Evolution

Automatic Design of Ant Algorithms with Grammatical Evolution Automatic Design of Ant Algorithms with Grammatical Evolution Jorge Tavares 1 and Francisco B. Pereira 1,2 CISUC, Department of Informatics Engineering, University of Coimbra Polo II - Pinhal de Marrocos,

More information

Machine Learning. Decision Trees. Manfred Huber

Machine Learning. Decision Trees. Manfred Huber Machine Learning Decision Trees Manfred Huber 2015 1 Decision Trees Classifiers covered so far have been Non-parametric (KNN) Probabilistic with independence (Naïve Bayes) Linear in features (Logistic

More information

An Evolutionary Algorithm for Minimizing Multimodal Functions

An Evolutionary Algorithm for Minimizing Multimodal Functions An Evolutionary Algorithm for Minimizing Multimodal Functions D.G. Sotiropoulos, V.P. Plagianakos and M.N. Vrahatis University of Patras, Department of Mamatics, Division of Computational Mamatics & Informatics,

More information

Relationship between Genetic Algorithms and Ant Colony Optimization Algorithms

Relationship between Genetic Algorithms and Ant Colony Optimization Algorithms Relationship between Genetic Algorithms and Ant Colony Optimization Algorithms Osvaldo Gómez Universidad Nacional de Asunción Centro Nacional de Computación Asunción, Paraguay ogomez@cnc.una.py and Benjamín

More information

CS Machine Learning

CS Machine Learning CS 60050 Machine Learning Decision Tree Classifier Slides taken from course materials of Tan, Steinbach, Kumar 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K

More information