7. Metalearning for Automated Workflow Design

AutoML at ECML PKDD 2017, Skopje: Automatic Selection, Configuration & Composition of ML Algorithms
7. Metalearning for Automated Workflow Design
by Pavel Brazdil, Frank Hutter, Holger Hoos, Joaquin Vanschoren

Acknowledgments
Thanks to the following researchers who worked with me on these topics: Salisu Abdulrahman, Miguel Cachada.

Summary
1. Introduction
   - What are workflows?
   - Providing support for workflow design
   - Workflows for classification tasks
2. Extending metalearning approaches to workflows
3. Extending the average ranking method to workflows
   - Gathering performance metadata
   - Metalearning approach
   - Experiments & results of alternative hyperparameter settings
   - Comparison to Auto-WEKA
4. Challenges for current & future research
   - Diversify the metadata (datasets, workflows)
   - Devise methods to prune portfolios of workflows (off-line)
   - Explore approaches that focus on useful alternatives on-line
   - Extend comparisons to other systems

1. Introduction: What are Workflows?
A workflow is a (partially) ordered sequence of operators or algorithms; it can also be seen as a plan to be executed.
DM workflows have been incorporated into many DM systems: Weka, KNIME, RapidMiner, SAS, etc.
Designing complex workflows manually is time consuming, and the resulting workflows can have suboptimal performance (accuracy, AUC, training time, etc.).

1. Introduction: Providing Support for Workflow Design
Consequently, users need support on how to obtain good workflows!
Some systems already provide some support (Auto-WEKA, RapidMiner, etc.), but current systems often require a relatively long time to come up with good solutions, while users want to obtain good recommendations fast.
Our aim is to describe the principles involved, so that better systems can be (re-)designed in the future.

1. Introduction: Workflows for Classification Tasks
Some previous studies focus on workflow recommendation for classification tasks.
[Figure: phases of a typical workflow: data extraction, data transformation (cleansing, pre-processing), model configuration (algorithm selection, hyperparameters), model evaluation, model deployment. Many studies focus on the model configuration phases.]

1. Introduction: Workflows for Classification Tasks
Many different operations can be chosen at any step:
- pre-processing operations (feature selection, discretization, etc.),
- classification algorithms (DT, NB, NN, SVM, kNN, ...),
- parameter settings for each,
- ensembles (bagging, boosting, etc.).
People normally use ontologies of operators to specify all the constituents.

Ontologies of operators can be described:
- in a graphical form,
- using grammars, e.g. ClassAlg --> DT | NB | NN | ...
Expansion of a given ontology into workflows:
- Many systems use a hierarchical planner.
- Non-terminal nodes represent tasks / methods / abstract operators (e.g. attribute selection).
- Terminal nodes represent simple (concrete) operators (e.g. CFS).
- The expansion can be represented as a hierarchical DAG (Hilario et al., 2011). A toy sketch of such an expansion appears below.
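As an illustration only, the following is a minimal sketch of grammar-based expansion of a toy ontology into workflows; the rule names (Workflow, FeatSel, ClassAlg) and operators are assumed for the example and are not the ontology of Hilario et al. (2011).

```python
from itertools import product

# Toy ontology as a grammar: non-terminals (abstract operators) expand into
# alternative sequences; symbols without rules are concrete operators.
GRAMMAR = {
    "Workflow": [["FeatSel", "ClassAlg"], ["ClassAlg"]],
    "FeatSel":  [["CFS"], ["InfoGain"]],
    "ClassAlg": [["DT"], ["NB"], ["SVM"]],
}

def expand(symbol):
    """Recursively expand a symbol into all concrete operator sequences."""
    if symbol not in GRAMMAR:                 # terminal: a concrete operator
        return [[symbol]]
    workflows = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):         # cross product of expansions
            workflows.append([op for part in combo for op in part])
    return workflows

if __name__ == "__main__":
    for wf in expand("Workflow"):
        print(" -> ".join(wf))
```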

2. Extending Metalearning Approaches to Workflows
Naïve approach:
- Generate all possible workflows for a new dataset, exploiting a given ontology of abstract/concrete operators.
- Use meta-knowledge associated with past problems/datasets to:
  - retrieve past workflows associated with similar problems;
  - rank these workflows according to the expected performance (a minimal sketch of this retrieve-and-rank step is given after this slide).
- Carry out tests to identify the best workflow.

The naïve approach is not practical:
- The number of possible workflows is normally too large.
- Performance meta-knowledge concerning the different workflows may not be available.
Some solutions: preferably expand only the most promising nodes / branches, with the help of meta-knowledge in the form of:
- association rules (Kietz et al., 2012),
- conditional probabilities,
- collaborative filtering (Misir & Sebag, 2013).
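The sketch below shows one hedged way to instantiate the retrieve-and-rank step: find the past datasets most similar to the new one (by simple metafeatures) and rank workflows by their mean accuracy on those neighbours. The metafeatures, workflow names and accuracies are illustrative assumptions.

```python
import numpy as np

# Hypothetical metadata: metafeature vectors (instances, attributes, classes)
# of past datasets and the accuracy of each candidate workflow on them.
meta_features = {
    "iris":     np.array([150, 4, 3]),
    "diabetes": np.array([768, 8, 2]),
    "sonar":    np.array([208, 60, 2]),
}
performance = {
    "iris":     {"CFS->SVM": 0.96, "NB": 0.94, "DT": 0.93},
    "diabetes": {"CFS->SVM": 0.77, "NB": 0.75, "DT": 0.72},
    "sonar":    {"CFS->SVM": 0.84, "NB": 0.70, "DT": 0.73},
}

def rank_workflows(new_mf, k=2):
    """Rank workflows by mean accuracy on the k most similar past datasets."""
    neighbours = sorted(meta_features,
                        key=lambda d: np.linalg.norm(meta_features[d] - new_mf))[:k]
    workflows = performance[neighbours[0]].keys()
    scores = {w: np.mean([performance[d][w] for d in neighbours])
              for w in workflows}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_workflows(np.array([300, 10, 2])))
```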

3. Extending the Average Ranking Method to Workflows
This work was done in collaboration with:
- Miguel V. Cachada, M.Sc. student, awaiting his defense soon
- Salisu M. Abdulrahman, completed his PhD in May at LIAAD, INESC TEC / Univ. of Porto; now works at Kano University of Science and Technology, Nigeria
- Pavel Brazdil, LIAAD, INESC TEC / Univ. of Porto

Gathering Performance Metadata
Build a collection of performance results (accuracy, runtime) obtained by running the workflow configurations on the training datasets.
Our aim is to identify workflows with good performance while minimising the runtime. The A3R metric, which trades accuracy off against runtime (essentially the success rate divided by the runtime raised to a small power P), provides a good solution.
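The sketch below illustrates, under the assumption that A3R is computed in its simplified form SR / T^P, how the measure combines accuracy and runtime and how an average ranking can be built from it. The exponent value and toy performance numbers are placeholders, not the tutorial's actual metadata.

```python
import numpy as np

def a3r(success_rate, runtime, p=1/64):
    """Simplified A3R: higher accuracy is rewarded, longer runtime (raised to a
    small power P) is penalised. Larger values are better."""
    return success_rate / runtime ** p

# Toy metadata: per dataset, (accuracy, runtime in seconds) of each workflow.
metadata = {
    "d1": {"CFS->SVM": (0.91, 120.0), "NB": (0.85, 2.0), "DT": (0.88, 10.0)},
    "d2": {"CFS->SVM": (0.81, 300.0), "NB": (0.83, 3.0), "DT": (0.79, 15.0)},
}

def average_ranking(metadata, p=1/64):
    """AR*: rank workflows by A3R on each dataset, then order by average rank."""
    ranks = {}
    for perf in metadata.values():
        ordered = sorted(perf, key=lambda w: -a3r(*perf[w], p=p))
        for r, w in enumerate(ordered, start=1):
            ranks.setdefault(w, []).append(r)
    return sorted(ranks, key=lambda w: np.mean(ranks[w]))

print(average_ranking(metadata))   # workflows from best to worst average rank
```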

Metalearning Approach
We use a very simple metalearning approach: A3R-based Average Ranking (AR*).
- AR* uses an optimized setting of the parameter P, which controls the weight given to runtime.
- AR* generates a ranked list of workflows based on the A3R measure.
How far can this simple approach go?

Experiments
Performance metadata: 184 workflows, run on 37 datasets.
Portfolios of workflows:
- 62 classification algorithms from WEKA with default configurations (AR*+A)
- 62 variants: combinations of CFS + algorithms (AR*+FS+A)
- 30 variants: hyperparameter configurations of some algorithms (AR*+Hyp+A)
- 30 variants: CFS + hyperparameter configurations of some algorithms (AR*+FS+Hyp+A)
Evaluation using leave-one-out: 36 datasets are used to propose a ranking of workflows for the dataset left out. The ranking is followed to identify the best workflow and calculate the loss; the loss curves are aggregated into a mean loss curve. A sketch of this protocol is given below.
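Here is a minimal, self-contained sketch of the leave-one-out loss-curve protocol; the three toy datasets and three workflows are placeholders for the 37 datasets and 184 workflows actually used.

```python
import numpy as np

# Toy metadata: per dataset, (accuracy, runtime in seconds) of each workflow.
metadata = {
    "d1": {"wfA": (0.91, 120.0), "wfB": (0.85, 2.0), "wfC": (0.88, 10.0)},
    "d2": {"wfA": (0.81, 300.0), "wfB": (0.83, 3.0), "wfC": (0.79, 15.0)},
    "d3": {"wfA": (0.70, 200.0), "wfB": (0.75, 2.5), "wfC": (0.78, 12.0)},
}

def a3r(acc, runtime, p=1/64):
    return acc / runtime ** p

def average_ranking(datasets, p=1/64):
    ranks = {}
    for d in datasets:
        ordered = sorted(metadata[d], key=lambda w: -a3r(*metadata[d][w], p=p))
        for r, w in enumerate(ordered, start=1):
            ranks.setdefault(w, []).append(r)
    return sorted(ranks, key=lambda w: np.mean(ranks[w]))

def loss_curve(left_out, ranking):
    """Follow the ranking on the left-out dataset; after each test record the
    elapsed time and the loss (best achievable accuracy minus best found)."""
    best_possible = max(acc for acc, _ in metadata[left_out].values())
    best_so_far, elapsed, curve = 0.0, 0.0, []
    for w in ranking:
        acc, runtime = metadata[left_out][w]
        elapsed += runtime
        best_so_far = max(best_so_far, acc)
        curve.append((elapsed, best_possible - best_so_far))
    return curve

for d in metadata:                                   # leave-one-out
    others = [x for x in metadata if x != d]
    print(d, loss_curve(d, average_ranking(others)))
```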

Results of Alternative Hyperparameter Settings
Both AR*+FS+Hyp+A and AR*+Hyp+A achieved good results. It is important to consider alternative hyperparameter settings!

Comparison to Auto-WEKA
Auto-WEKA (AW) was given varied time budgets. AW's total runtime resulted from adding the search runtime to the runtime of the recommended model. The accuracy of AR*+FS+Hyp+A (AR) was obtained by following the ranking up to a cumulative runtime equal to the total runtime of AW.
[Table: for each time budget (min), the number of datasets on which AR wins, loses or ties against AW; a win means that AR > AW in terms of accuracy.]
AR wins or competes well with Auto-WEKA, especially for smaller time budgets. A sketch of this matched-budget comparison is shown below.
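The following is a hedged sketch of the matched-budget comparison protocol for a single dataset; the Auto-WEKA accuracy/runtime and the per-workflow values are invented placeholders, not the numbers behind the table above.

```python
def ar_accuracy_within_budget(ranking, perf, budget_seconds):
    """Follow AR's ranking until the cumulative runtime would exceed
    Auto-WEKA's total runtime; return the best accuracy found so far."""
    best, elapsed = 0.0, 0.0
    for workflow in ranking:
        acc, runtime = perf[workflow]
        if elapsed + runtime > budget_seconds:
            break
        elapsed += runtime
        best = max(best, acc)
    return best

# Placeholder inputs for one dataset (illustrative values only).
ranking = ["wfB", "wfC", "wfA"]
perf = {"wfA": (0.91, 120.0), "wfB": (0.85, 2.0), "wfC": (0.88, 10.0)}
aw_accuracy, aw_total_runtime = 0.87, 60.0    # search + recommended model

ar_acc = ar_accuracy_within_budget(ranking, perf, aw_total_runtime)
outcome = "win" if ar_acc > aw_accuracy else "loss" if ar_acc < aw_accuracy else "tie"
print(ar_acc, outcome)
```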

4. Challenges for Current / Future Research
1. Diversify the metadata (datasets): be prepared for new challenges!
2. Diversify the metadata (workflows): include top performers (configurations, combinations, etc.)
3. Devise methods to prune portfolios of workflows (off-line)
4. Explore approaches that focus on useful alternatives on-line: active testing, SMAC
5. Extend comparisons to other systems (e.g. auto-sklearn, GA-based approaches)

Diversify the Metadata (datasets)
Include diverse datasets to train the meta-level system:
- unbalanced data
- many-class problems
- multi-label problems
- problems with missing data
- etc.
Be prepared for new challenges!

Diversify the Metadata (workflows)
Include top performers (configurations, combinations, etc.), similar to the strategy of football coaches (e.g. Mourinho at MU): search for good players to strengthen the team.
This could be done by:
- searching the literature (e.g. which ranges of hyperparameter settings are useful, which settings were used, etc.)
- searching repositories like OpenML, etc.

Devise Methods to Prune Portfolios of Workflows
Two distinct goals:
- eliminate sub-standard workflows
- eliminate redundant workflows
In general one could use:
- filter-like approaches
- closed-loop approaches (too costly!)
- backward elimination / forward selection (expensive!)
One early work uses a filter-like approach oriented towards the accuracy-based AR:
P. Brazdil, C. Soares, R. Pereira: Reducing rankings of classifiers by eliminating redundant classifiers, Progress in Artificial Intelligence, 14-21, 2001.
Currently we are working on a solution oriented towards AR* (the combined measure of accuracy and runtime). A sketch of filter-like pruning appears after this slide.
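A minimal, hedged sketch of filter-like pruning: drop workflows with low mean accuracy (sub-standard) and workflows whose per-dataset performance is almost perfectly rank-correlated with an already kept, better one (redundant). The thresholds and the toy performance matrix are assumptions; this is only one way to instantiate the idea of Brazdil et al. (2001).

```python
import numpy as np
from scipy.stats import spearmanr

# Toy performance matrix: rows = workflows, columns = datasets (accuracies).
workflows = ["wfA", "wfB", "wfC", "wfD"]
acc = np.array([
    [0.91, 0.81, 0.70],
    [0.75, 0.83, 0.85],
    [0.90, 0.80, 0.71],   # behaves almost exactly like wfA -> redundant
    [0.55, 0.52, 0.50],   # consistently weak               -> sub-standard
])

def prune(workflows, acc, min_mean_acc=0.6, max_corr=0.95):
    # 1. Eliminate sub-standard workflows (low mean accuracy across datasets).
    keep = [i for i in range(len(workflows)) if acc[i].mean() >= min_mean_acc]
    # 2. Eliminate redundant workflows: keep the stronger of any pair whose
    #    per-dataset accuracies are highly rank-correlated.
    keep.sort(key=lambda i: -acc[i].mean())
    selected = []
    for i in keep:
        corrs = [spearmanr(acc[i], acc[j])[0] for j in selected]
        if all(c < max_corr for c in corrs):
            selected.append(i)
    return [workflows[i] for i in selected]

print(prune(workflows, acc))   # e.g. ['wfB', 'wfA'] with these toy values
```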

Explore Approaches that Focus on Useful Alternatives
We could explore:
1. Active testing: good for selecting among discrete options (see the sketch after this slide).
2. Regression models: good for modeling the effects of hyperparameter settings and suggesting good settings on the target dataset; surrogate models such as RFs (as in SMAC), Gaussian processes, etc.
3. A combination of 1 and 2.

Extend Comparisons to Other Systems
Extend comparisons to: auto-sklearn, GA-based approaches, etc.
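A rough, hedged sketch of the active-testing idea: start from the globally best workflow and repeatedly test the candidate with the largest estimated gain over the current best, estimated from past datasets. The toy metadata and the gain estimate are illustrative simplifications, not the exact procedure of the cited work.

```python
import numpy as np

# Toy metadata: accuracies of each workflow on past datasets, plus accuracies
# on the new target dataset (revealed only when a workflow is actually tested).
past = {
    "wfA": np.array([0.91, 0.81, 0.70]),
    "wfB": np.array([0.85, 0.83, 0.75]),
    "wfC": np.array([0.88, 0.79, 0.78]),
}
target = {"wfA": 0.74, "wfB": 0.80, "wfC": 0.82}   # ground truth for the demo

def active_testing(budget=2):
    """Iteratively test the workflow most likely to beat the current best."""
    best = max(past, key=lambda w: past[w].mean())   # global best as start
    tested, best_acc = {best}, target[best]
    for _ in range(budget):
        # Estimated gain: mean positive improvement over the current best,
        # computed on the past datasets.
        gain = {w: np.maximum(past[w] - past[best], 0).mean()
                for w in past if w not in tested}
        if not gain:
            break
        candidate = max(gain, key=gain.get)
        tested.add(candidate)                        # "run" the candidate
        if target[candidate] > best_acc:
            best, best_acc = candidate, target[candidate]
    return best, best_acc

print(active_testing())
```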
