A Multi-Objective Evolutionary Approach to Pareto Optimal Model Trees. A preliminary study


Marcin Czajkowski and Marek Kretowski
TPNC 2016, 12-13.12.2016, Sendai, Japan
Faculty of Computer Science, Bialystok University of Technology
email: m.czajkowski@pb.edu.pl

Bialystok University of Technology
- University since 1950
- Over 15 000 students in 7 departments
- Our department: Faculty of Computer Science

The Blind Men and the Elephant, by John Godfrey Saxe

Greedy Induction of Decision Trees

Evolutionary Induction of Decision Trees
Applying an evolutionary algorithm (EA) allows a global induction of decision trees (DT). We can search at the same time for:
- the best tree structure
- tests in internal nodes
- models in the leaves
General framework of an evolutionary algorithm
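
The general EA framework named on this slide can be sketched as a loop of initialization, evaluation, selection and variation. The sketch below is only an illustration on a toy problem (minimizing a sum of squares over short real vectors); it does not evolve trees, and every name in it (`evolve`, `sphere`, `init_vector`, `blend`, `jitter`) is hypothetical rather than part of the GDT system.

```python
import random

def sphere(v):
    """Toy fitness to minimize (stands in for a tree's error)."""
    return sum(x * x for x in v)

def init_vector(rng):
    return [rng.uniform(-5.0, 5.0) for _ in range(3)]

def blend(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def jitter(v, rng):
    """Mutation: small Gaussian perturbation of every gene."""
    return [x + rng.gauss(0.0, 0.1) for x in v]

def evolve(init, fitness, crossover, mutate,
           pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best = min(population, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        offspring = [mutate(crossover(pick(), pick(), rng), rng)
                     for _ in range(pop_size - 1)]
        population = offspring + [best]      # elitism: keep the best so far
        best = min(population, key=fitness)
    return best
```

In a tree-inducing EA the individual would be a whole tree, and the operators would modify its structure, the tests in internal nodes and the models in the leaves.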

Evolutionary Induction of Decision Trees

Global Decision Tree System
The Global Decision Tree (GDT) system is a continuously developed framework for evolutionary induction of all kinds of decision trees, including:
- an EA framework with specialized genetic operators and memetic extensions
- trees with various representations: univariate, oblique, mixed
- classification, regression and model trees
- cost-sensitive decision trees for real-life problems
- trees for large-scale data
- parallelization of the EA with master-slave, cellular and island strategies
- MPI, OpenMP, GPGPU and hybrid parallelization approaches
Current goal: extend the multi-objective function of the GDT system
- the present fitness functions in the GDT system cover the weight formula and lexicographic analysis
- discussed solution: Pareto-based multi-objective optimization for GDT
- for model trees, denoted as Global Model Trees (GMT)

Global Model Trees (GMT): completed steps
- 2004-2013: Multiple papers on Global Classification Trees
- 2014: Czajkowski M., Kretowski M.: Evolutionary Induction of Global Model Trees with Specialized Operators and Memetic Extensions, Information Sciences, Elsevier, vol. 288: 153-173
- 2015: Czajkowski M., Czerwonka M., Kretowski M.: Cost-sensitive global model trees applied in loan charge-off forecasting, Decision Support Systems, Elsevier, vol. 74: 57-66
- 2016: Czajkowski M., Kretowski M.: The Role of Decision Tree Representation in Regression Problems - an Evolutionary Perspective, Applied Soft Computing, Elsevier, vol. 48: 458-475
- 2016: Jurczuk K., Czajkowski M., Kretowski M.: Evolutionary Induction of a Decision Tree for Large Scale Data. A GPU-based Approach, Soft Computing, Springer (in print)
- 2017+: study of multi-objective functions in globally induced decision trees; comparison of different approaches and techniques for GDT parallelization

GMT multi-objective optimization strategies
In the context of model trees, two objectives need to be considered:
- minimization of the prediction error calculated on the training set
- minimization of the tree size and the complexity of the nodes
Most popular multi-objective strategies:
- Weight formula: transforms the multi-objective problem into a single-objective one.
- Lexicographic analysis: each pair of individuals is evaluated by analyzing, in order of priority, one of three measures: the residual sum of squares (RSS), the number of nodes, and the number of attributes in the multiple linear models in the leaves.
- Pareto-dominance approach: searches not for one best solution but for a group of solutions such that selecting any one of them in place of another will always sacrifice quality for at least one objective while improving it for at least one other.
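
The three strategies differ only in how two candidate trees are compared, which a short sketch can make concrete. The code below assumes each tree is summarized by three values to minimize (RSS, number of nodes, number of attributes in the leaf models). The `Scores` type, the function names and the weights `w_nodes` and `w_attrs` are hypothetical, and the plain weighted sum is a stand-in, not the actual GMT weight formula.

```python
from typing import NamedTuple

class Scores(NamedTuple):
    rss: float        # residual sum of squares on the training set
    nodes: int        # number of tree nodes
    attributes: int   # attributes used by linear models in the leaves

def weighted_fitness(s, w_nodes=1.0, w_attrs=1.0):
    """Weight formula: collapse all objectives into one number (lower is better)."""
    return s.rss + w_nodes * s.nodes + w_attrs * s.attributes

def lexicographic_better(a, b, tol=0.0):
    """Lexicographic analysis: the first measure that differs decides."""
    for x, y in zip(a, b):
        if abs(x - y) > tol:
            return x < y
    return False  # equal on every measure

def dominates(a, b):
    """Pareto dominance: no worse on all objectives, strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

a = Scores(rss=10.0, nodes=5, attributes=8)   # more accurate, larger
b = Scores(rss=12.0, nodes=3, attributes=4)   # less accurate, smaller
```

With these two example trees, the lexicographic rule prefers `a` (lower RSS), the weighted sum with unit weights prefers `b`, and neither dominates the other, so a Pareto approach would keep both.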

GMT system with Pareto approach

Fitness calculation
Efficient non-dominated sorting strategy (ENS):
- an alternative to the sorting used in the Non-dominated Sorting Genetic Algorithm II (NSGA-II)
- a fast sorting algorithm for optimization problems with a small number of objectives
- proposed in 2015 in IEEE Transactions on Evolutionary Computation
- in contrast to most existing non-dominated sorting methods, ENS determines the front of each solution one by one instead of using all solutions as a whole
In contrast to the regular NSGA-II approach (which maintains a population-sized set of non-dominated solutions), we store all non-dominated solutions investigated so far during the search. Although the elitist set is quite large, we do not lose any possible non-dominated individual.
We use an updated NSGA-II crowding distance procedure, which involves a unique fitness calculation when two individuals share an identical value.
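
A minimal sketch of the ENS idea, assuming the sequential-search variant and minimization of every objective: solutions are first sorted lexicographically by their objective values, so each solution only has to be checked against solutions already placed in a front. The function names are illustrative and not taken from the GMT implementation.

```python
def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def ens_sort(points):
    """Return fronts as lists of indices into `points` (front 0 is the best)."""
    # After lexicographic sorting, a later solution can never dominate
    # an earlier one, so fronts only need to be searched, never revised.
    order = sorted(range(len(points)), key=lambda i: points[i])
    fronts = []
    for i in order:
        for front in fronts:
            # Check the most recently added members first.
            if not any(dominates(points[j], points[i])
                       for j in reversed(front)):
                front.append(i)
                break
        else:
            fronts.append([i])  # dominated in every existing front
    return fronts

points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5), (2, 3)]
fronts = ens_sort(points)
```

Here `fronts[0]` holds the indices of the non-dominated points, including both copies of `(2, 3)`, because identical solutions do not dominate each other.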

GMT system with Pareto approach

Selection mechanism
- Binary tournament selection is applied as the selection mechanism.
- NSGA-II merges the archive and the current population into a new one, using binary tournament as the selection method.
- Because the full list of non-dominated solutions is stored in the archive, we have:
  - reserved room for P elitist solutions in the next population (default: half of the population size)
  - a binary tournament performed for both sets (archive and current population): elitist solutions are scored with the crowding distance; solutions from the current population are scored as in the NSGA-II algorithm
  - both selected sets together constitute the new population
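
The selection scheme described above might look roughly like the following sketch. It assumes individuals are plain dictionaries that already carry a non-domination `rank` and a `crowding` value; that representation, the function names and the `elite_share` parameter are hypothetical and only mirror the description on the slide.

```python
import random

def binary_tournament(candidates, better, rng):
    """Draw two candidates at random and return the preferred one."""
    a, b = rng.choice(candidates), rng.choice(candidates)
    return a if better(a, b) else b

def by_crowding(a, b):
    # Archive members are all non-dominated, so only crowding decides.
    return a["crowding"] >= b["crowding"]

def by_rank_then_crowding(a, b):
    # NSGA-II style comparison: lower rank first, then larger crowding.
    return (a["rank"], -a["crowding"]) <= (b["rank"], -b["crowding"])

def next_population(archive, population, pop_size, rng, elite_share=0.5):
    """Fill reserved slots from the archive and the rest from the population."""
    n_elite = int(pop_size * elite_share) if archive else 0
    elite = [binary_tournament(archive, by_crowding, rng)
             for _ in range(n_elite)]
    rest = [binary_tournament(population, by_rank_then_crowding, rng)
            for _ in range(pop_size - n_elite)]
    return elite + rest
```

With `elite_share=0.5` the reserved room P equals half of the population size, which matches the default mentioned on the slide.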

Experiments: datasets & settings
- Validation performed on three real-life, publicly available datasets: Abalone (4177, 7, 1), Kinematics (8192, 8, 0) and Stock (950, 9, 0).
- Each dataset divided into a training (66.6%) and a testing (33.4%) set.
- Multi-objective optimization of Pareto GMT (denoted as pgmt):
  - 3 objectives: prediction error measured with Root Mean Squared Error (RMSE), the number of nodes, and the number of attributes in the regression models located in the leaves
  - 2 objectives that consider RMSE and the tree comprehensibility
- Comparison analysis between pgmt and:
  - GMT with weight (wgmt) and lexicographic (lgmt) fitness
  - popular greedy counterparts of GMT: REP Tree (RT) and the state-of-the-art model tree called M5
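
For readers who want to reproduce the setup, the split and the error measure can be sketched as below. The slide does not say how the instances were assigned to the two sets, so the random shuffle and the `seed` parameter are assumptions; the function names are illustrative.

```python
import math
import random

def train_test_split(rows, train_share=0.666, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def rmse(y_true, y_pred):
    """Root Mean Squared Error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# A dataset with as many instances as Stock (950).
train, test = train_test_split(list(range(950)))
```

For 950 instances this gives 633 training and 317 testing rows.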

Experimental results
Performance results for GMT with different fitness functions as well as popular greedy counterparts. Results for three solutions from the Pareto front (denoted as pgmt) are presented.
pgmt is capable of inducing significantly better trees (in terms of prediction error or tree size).

Pareto front for GMT with 2 objectives: a) Abalone, b) Kinematics, c) Stock

Pareto front for GMT with 3 objectives: a) Abalone, b) Kinematics, c) Stock

Open issues: limiting the elitist front
Visualization of the GMT Pareto front with 3 objectives:
- 3D visualization (as presented), restricted to show only those solutions that the EA found in at least 80 out of 100 runs
- 3 objectives transformed to 2 objectives
[Figure: two panels, "3 objectives" and "3 objectives transformed to 2 objectives". Labels: tree complexity (number of nodes + attributes); how many times the EA found the solution.]
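
Both ideas on this slide are simple post-processing steps, sketched below under stated assumptions: each run returns its front as a list of `(rmse, nodes, attributes)` tuples, and a solution counts as "found again" only if the tuples are exactly equal (in practice the error would probably need rounding first). The function names are hypothetical.

```python
from collections import Counter

def frequent_solutions(runs, min_runs=80):
    """Keep solutions that appear in the fronts of at least `min_runs` runs."""
    counts = Counter(sol for front in runs for sol in set(front))
    return {sol: n for sol, n in counts.items() if n >= min_runs}

def to_two_objectives(sol):
    """Merge nodes and attributes into one tree-complexity objective."""
    error, nodes, attributes = sol
    return (error, nodes + attributes)

# 100 hypothetical runs: one solution found every time, another 85 times.
runs = [[(1.0, 3, 4), (2.0, 1, 1)]] * 85 + [[(1.0, 3, 4)]] * 15
kept = frequent_solutions(runs)
```

Raising `min_runs` shrinks the displayed front, which is one way of limiting the elitist front for visualization.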

Open issues: crowding function
Adequate crowding distance calculation is crucial for finding an interpretable and compact Pareto front. Example results for different concepts of crowding calculation:
- crowding disabled
- updated NSGA-II
- based on e.g. the weight fitness function: incorporate weights of objectives into the crowding distance calculation
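
As a point of reference for these variants, the standard NSGA-II crowding distance can be sketched with an optional per-objective weight. The `weights` argument is a hypothetical illustration of the last concept above, and the sketch does not include the authors' special handling of individuals with identical values.

```python
def crowding_distance(front, weights=None):
    """Crowding distance for a list of objective tuples (larger = less crowded)."""
    n = len(front)
    if n == 0:
        return []
    m = len(front[0])
    weights = weights or [1.0] * m
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        # Boundary solutions are always kept.
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # objective is constant on this front
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][k] - front[order[pos - 1]][k]
            dist[order[pos]] += weights[k] * gap / (hi - lo)
    return dist

front = [(1, 5), (2, 3), (4, 1)]
plain = crowding_distance(front)
weighted = crowding_distance(front, weights=[2.0, 1.0])
```

Passing unequal weights makes gaps along one objective count more, so the retained solutions spread preferentially along that objective.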

Open issues: archive population
The impact and role of the archive population that contains the Pareto front should also be analyzed in the context of EA performance:
- size of the elitist archive:
  - store all: all non-dominated solutions are archived
  - store only top: only the top solutions (usually a population size) are stored, selected by the crowding function
- selection of individuals from the archive with e.g. random binary tournament
- percentage of individuals from the archive that constitute the new population
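
The "store all" option corresponds to a simple archive update rule, sketched here for minimized objective tuples. The function names are illustrative, and duplicates are skipped as a simplifying assumption.

```python
def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def update_archive(archive, candidate):
    """Return the archive after offering it one candidate solution."""
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return archive  # candidate adds nothing new
    # Drop members the candidate dominates, then add the candidate.
    return [a for a in archive if not dominates(candidate, a)] + [candidate]

archive = []
for solution in [(3, 3), (1, 5), (2, 2), (4, 4), (2, 2)]:
    archive = update_archive(archive, solution)
```

A "store only top" variant would additionally truncate the result to a fixed size, for example by keeping the members with the largest crowding distance.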

Conclusion
- Our proposition extends the multi-objective framework of GMT to work with Pareto optimal trees.
- The performed study covers 2- and 3-objective optimization.
- The traditional NSGA-II solution was specialized in order to exploit the full potential of evolutionary induction of decision trees, including: ENS sorting, archive population, updated crowding distance.
But again, why do we do all of this?
- Globally induced trees outperform traditional greedy tree counterparts.
- Presenting the Pareto front allows users to find the specific prediction models they were looking for.

THANK YOU