A Multi-Objective Evolutionary Approach to Pareto Optimal Model Trees. A preliminary study Marcin Czajkowski and Marek Kretowski TPNC 2016 12-13.12.2016 Sendai, Japan Faculty of Computer Science Bialystok University of Technology email: m.czajkowski@pb.edu.pl
Bialystok University of Technology A university since 1950 Over 15,000 students in 7 departments Our department: Faculty of Computer Science
The Blind Men and the Elephant The Blind Men and the Elephant by John Godfrey Saxe
Greedy Induction of Decision Trees
Evolutionary Induction of Decision Trees Application of EAs allows a global induction of DTs. We can search simultaneously for: the best tree structure tests in the internal nodes models in the leaves General framework of an evolutionary algorithm
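The general evolutionary loop behind such global induction can be sketched as below. This is a minimal, generic sketch only: the operator functions (`init_population`, `crossover`, `mutate`, `select`) are hypothetical placeholders, not the specialized GDT operators described in this talk.

```python
import random

def evolve(init_population, fitness, crossover, mutate, select,
           pop_size=8, generations=20, p_cross=0.8, p_mut=0.2):
    """Generic EA loop: evaluate, select parents, recombine, mutate."""
    population = init_population(pop_size)
    for _ in range(generations):
        # Score every individual, then let the selection operator pick parents.
        scored = [(fitness(ind), ind) for ind in population]
        parents = select(scored, pop_size)
        offspring = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[(i + 1) % pop_size]
            if random.random() < p_cross:
                a, b = crossover(a, b)
            offspring.extend(mutate(x) if random.random() < p_mut else x
                             for x in (a, b))
        population = offspring[:pop_size]
    return max(population, key=fitness)
```

In the GDT framework the individuals would be whole decision trees and the operators would modify node tests, subtrees and leaf models; the loop structure itself stays the same.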
Evolutionary Induction of Decision Trees
Global Decision Tree System Global Decision Tree (GDT) system is a continuously developed framework for evolutionary induction of all kinds of decision trees, including: EA framework with specialized genetic operators and memetic extensions trees with various representations: univariate, oblique, mixed classification, regression and model trees cost-sensitive decision trees for real-life problems trees for large-scale data parallelization of EA with master-slave, cellular and island strategies MPI, OpenMP, GPGPU and hybrid parallelization approaches Current goal: extend the multi-objective function of the GDT system the present fitness functions in the GDT system cover the weight formula and lexicographic analysis discussed solution: Pareto-based multi-objective optimization of GDT for model trees, denoted as Global Model Trees (GMT)
Global Model Trees (GMT) completed steps 2004-2013 Multiple papers on Global Classification Trees 2014 Czajkowski M., Kretowski M.: Evolutionary Induction of Global Model Trees with Specialized Operators and Memetic Extensions, Information Sciences, Elsevier, vol. 288: 153-173 2015 Czajkowski M., Czerwonka M., Kretowski M.: Cost-sensitive global model trees applied in loan charge-off forecasting, Decision Support Systems, Elsevier, vol. 74: 57-66 2016 Czajkowski M., Kretowski M.: The Role of Decision Tree Representation in Regression Problems - an Evolutionary Perspective, Applied Soft Computing, Elsevier, vol. 48: 458-475 Jurczuk K., Czajkowski M., Kretowski M.: Evolutionary Induction of a Decision Tree for Large Scale Data. A GPU-based Approach, Soft Computing, Springer (in press) 2017+ study of multi-objective functions in globally induced decision trees comparison of different approaches and techniques for GDT parallelization
GMT multi-objective optimization strategies In the context of model trees, two objectives need to be considered: minimization of the prediction error calculated on the training set minimization of the tree size and the complexity of the nodes Most popular multi-objective strategies: weight formula, which transforms a multi-objective problem into a single-objective one lexicographic analysis: each pair of individuals is evaluated by analyzing, in order of priority, one of three measures: the residual sum of squares (RSS); the number of nodes; the number of attributes in the multiple linear models in the leaves Pareto-dominance approach searches not for one best solution, but rather for a group of solutions in such a way that selecting any one of them in place of another will always sacrifice quality for at least one objective, while improving it for at least one other.
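The Pareto-dominance relation described above can be stated directly in code. The sketch below assumes minimization of all objectives and represents each solution as a tuple of objective values (e.g. error and tree size); the helper names are illustrative, not part of the GMT system.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives
    minimized): a is no worse on every objective and strictly better
    on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```

For instance, with objectives (error, size), the solution (3, 3) is dominated by (2, 2), while (1, 3), (2, 2) and (3, 1) are mutually non-dominated and together form the front.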
GMT system with Pareto approach
Fitness calculation Efficient non-dominated sorting strategy (ENS) an alternative to the Non-dominated Sorting Genetic Algorithm II (NSGA-II) a fast sorting algorithm for optimization problems with a small number of objectives proposed in 2015 in IEEE Transactions on Evolutionary Computation In contrast to most existing non-dominated sorting methods, ENS determines the fronts one by one instead of treating all solutions as a whole In contrast to the regular NSGA-II approach (which maintains a population-size set of non-dominated solutions), we store all non-dominated solutions investigated so far during the search. Although the elitist set is quite large, we do not lose any possible non-dominated individual. We use an updated NSGA-II crowding distance procedure, which involves a unique fitness calculation when two individuals share an identical value
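The crowding distance referred to here can be sketched as follows. This is the standard NSGA-II procedure for a single front of objective vectors (all objectives treated symmetrically), not the GMT-specific update mentioned on the slide.

```python
def crowding_distance(front):
    """NSGA-II crowding distance for one front of objective vectors.
    Boundary solutions on each objective get infinity; interior
    solutions accumulate the normalized gap between their two
    nearest neighbours along every objective."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float('inf')
        if hi == lo:
            continue  # all solutions identical on this objective
        for j in range(1, n - 1):
            dist[order[j]] += (front[order[j + 1]][k]
                               - front[order[j - 1]][k]) / (hi - lo)
    return dist
```

Solutions with a larger crowding distance lie in less populated regions of the front and are therefore preferred when the front must be thinned.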
GMT system with Pareto approach
Selection mechanism Binary tournament selection is applied as the selection mechanism NSGA-II merges the archive and the current population into a new one, using binary tournament as the selection method Due to storing the full list of non-dominated solutions in the archive, we have: room reserved for P elitist solutions in the next population (by default half of the population size) For both sets (archive and current population) the binary tournament is performed elitist solutions are scored with the crowding distance solutions from the current population are scored as in the NSGA-II algorithm Both selected sets constitute the new population
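The binary tournament comparison used in NSGA-II-style selection can be sketched as below. The `(rank, crowding, individual)` triples are an assumed representation for illustration: lower rank (earlier front) wins, and ties are broken by larger crowding distance.

```python
import random

def binary_tournament(scored, k):
    """Pick k individuals. Each pick compares two random candidates and
    keeps the one with the lower front rank, breaking ties in favour of
    the larger crowding distance. `scored` holds (rank, crowding,
    individual) triples."""
    winners = []
    for _ in range(k):
        a, b = random.choice(scored), random.choice(scored)
        # Tuple comparison: smaller rank first, then larger crowding
        # (negated so that "smaller" means "more spread out").
        winners.append(a[2] if (a[0], -a[1]) <= (b[0], -b[1]) else b[2])
    return winners
```

In the scheme on this slide, the same tournament would be run once over the elitist archive (scored by crowding distance) and once over the current population (scored as in NSGA-II), and the two winner sets together would form the new population.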
Experiments: datasets & settings Validation was performed on three real-life, publicly available datasets: Abalone (4177, 7, 1), Kinematics (8192, 8, 0) and Stock (950, 9, 0) Each dataset was divided into a training (66.6%) and a testing (33.4%) set. Multi-objective optimization of Pareto GMT (denoted as pgmt): 3 objectives: prediction error measured with the Root Mean Squared Error (RMSE), the number of nodes and the number of attributes in the regression models located in the leaves 2 objectives that consider RMSE and the tree comprehensibility Comparison analysis between pgmt and GMT with the weight (wgmt) and lexicographic (lgmt) fitness popular greedy counterparts of GMT: REP Tree (RT) and the state-of-the-art model tree M5
Experimental results Performance results for GMT with different fitness functions as well as for popular greedy counterparts. Results for three solutions from the Pareto front (denoted as pgmt) are presented pgmt is capable of inducing significantly better trees (in terms of prediction error or tree size)
Pareto front for GMT with 2 objectives a) Abalone b) Kinematics c) Stock
Pareto front for GMT with 3 objectives a) Abalone b) Kinematics c) Stock
How many times the EA found the solution Open issues: limiting the elitist front Visualization of the GMT Pareto front with 3 objectives: 3D visualization (as presented), restricted to those solutions that the EA found in at least 80 out of 100 runs [Plots: 3 objectives vs. 3 objectives transformed to 2 objectives; x-axis: tree complexity (number of nodes + attributes)]
Open issues crowding function Adequate crowding distance calculation is crucial for finding an interpretable and compact Pareto front. Example results for different concepts of crowding calculation: crowding disabled updated NSGA-II crowding based on e.g. a weight fitness function incorporating the weights of objectives into the crowding distance calculation
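One way to incorporate objective weights into the crowding calculation, as the last bullet suggests, could look like the sketch below. The weight vector and its scaling role are assumptions for illustration, not the concrete GMT formula.

```python
def weighted_crowding(front, weights):
    """Crowding distance in which each objective's neighbour gap is
    scaled by a user-chosen weight before being accumulated, so that
    spread along a preferred objective counts for more. `front` is a
    list of objective vectors; `weights` has one entry per objective."""
    n = len(front)
    dist = [0.0] * n
    for k, w in enumerate(weights):
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float('inf')
        if hi == lo:
            continue
        for j in range(1, n - 1):
            dist[order[j]] += w * (front[order[j + 1]][k]
                                   - front[order[j - 1]][k]) / (hi - lo)
    return dist
```

With all weights equal to 1 this reduces to the standard NSGA-II crowding distance; raising the weight of, say, the error objective biases the preserved front toward diversity in prediction error rather than tree size.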
Open issues archive population The impact and role of the archive population that contains the Pareto front should also be analyzed in the context of EA performance: size of the elitist archive all non-dominated solutions are archived only the top solutions (usually a population size) are stored, selected by the crowding function selection of individuals from the archive with e.g. random or binary tournament percent of individuals from the archive that constitute the new population
Conclusion Our proposition extends the multi-objective framework of GMT to work with Pareto-optimal trees The performed study covers 2- and 3-objective optimization The traditional NSGA-II solution was specialized in order to exploit the full potential of evolutionary induction of decision trees, including: ENS sorting archive population updated crowding distance But again - why do we do all of this? Globally induced trees outperform traditional greedy tree counterparts Presenting the Pareto front allows users to find the specific prediction models they were looking for
THANK YOU