7. Metalearning for Automated Workflow Design
AutoML at ECML PKDD 2017, Skopje
Automatic Selection, Configuration & Composition of ML Algorithms
7. Metalearning for Automated Workflow Design
by Pavel Brazdil, Frank Hutter, Holger Hoos, Joaquin Vanschoren

Acknowledgments
Thanks to the following researchers, who worked with me on these topics:
- Salisu Abdulrahman
- Miguel Cachada

P. Brazdil - ECML/PKDD Tutorial T3: Meta-learning and Algorithm Selection

Summary
1. Introduction
   - What are workflows?
   - Providing support for workflow design
   - Workflows for classification tasks
2. Extending Metalearning Approaches to Workflows
3. Extending the Average Ranking Method to Workflows
   - Gathering performance metadata
   - Metalearning approach
   - Experiments & results of alternative hyperparameter settings
   - Comparison to Auto-WEKA
4. Challenges for Current & Future Research
   - Diversify the metadata (datasets, workflows)
   - Devise methods to prune portfolios of workflows (off-line)
   - Explore approaches that focus on useful alternatives on-line
   - Extend comparisons to other systems

1. Introduction: What are Workflows?
- A workflow is a (partially) ordered sequence of operators or algorithms.
- A workflow can also be seen as a plan to be executed.
- DM workflows have been incorporated into many DM systems: Weka, Knime, RapidMiner, SAS, etc.
- Designing complex workflows manually is time-consuming.
- The resulting workflow(s) can have suboptimal performance (accuracy, AUC, training time, etc.).

1. Introduction: Providing Support for Workflow Design
- Consequently, users need support in obtaining good workflows.
- Some systems already provide some support: AutoWeka, RapidMiner, etc.
- Current systems often need a relatively long time to come up with good solutions.
- Users want to obtain good recommendations fast.
- Our aim is to describe the principles involved, so that better systems can be (re-)designed in future.

1. Introduction: Workflows for Classification Tasks
- Some previous studies focus on workflow recommendation for classification tasks.
[Figure: phases of a classification workflow, labelled data extraction, selection, cleansing, data transformation, pre-processing, algorithm selection, hyperparameters, model configuration, model evaluation and model deployment; an annotation reads "Many focus on these phases".]

1. Introduction: Workflows for Classification Tasks
Many different operations can be chosen at any step:
- Pre-processing operations (feature selection, discretization, etc.)
- Classification algorithms (DT, NB, NN, SVM, kNN, ...)
- Parameter settings for each
- Ensembles (bagging, boosting, etc.)
People normally use ontologies of operators to specify all the constituents.

1. Introduction: Workflows for Classification Tasks
Ontologies of operators can be described:
- in a graphical form;
- using grammars, e.g. ClassAlg --> DT | NB | NN etc.
Expansion of a given ontology into workflows:
- Many systems use a hierarchical planner.
- Non-terminal nodes represent tasks / methods / abstract operators (e.g. attribute selection).
- Terminal nodes represent simple (concrete) operators (e.g. CFS).
- The expansion can be represented as a hierarchical DAG (Hilario et al., 2011).
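
The expansion of an operator grammar into concrete workflows can be illustrated with a minimal Python sketch. The toy grammar, its symbol names (`Workflow`, `PreProc`, `Discretize`) and the `expand` helper are assumptions made for illustration only; they are not taken from any of the systems named in the slides, and a real hierarchical planner would prune the expansion rather than enumerate it exhaustively.

```python
from itertools import product

# Hypothetical toy grammar: each non-terminal maps to a list of alternatives,
# and each alternative is a sequence of symbols. Any symbol that is not a key
# of the dictionary is treated as a terminal (a concrete operator).
GRAMMAR = {
    "Workflow": [["PreProc", "ClassAlg"], ["ClassAlg"]],
    "PreProc": [["CFS"], ["Discretize"]],
    "ClassAlg": [["DT"], ["NB"], ["NN"]],
}


def expand(symbol, grammar):
    """Return every sequence of terminals that can be derived from `symbol`."""
    if symbol not in grammar:
        return [(symbol,)]
    results = []
    for alternative in grammar[symbol]:
        # Expand each symbol of the alternative, then combine the expansions.
        parts = [expand(s, grammar) for s in alternative]
        for combo in product(*parts):
            results.append(tuple(op for part in combo for op in part))
    return results


workflows = expand("Workflow", GRAMMAR)
for wf in workflows:
    print(" -> ".join(wf))
```

Even this tiny grammar yields nine workflows, which hints at why exhaustive expansion stops being practical once hyperparameter settings and ensembles are added.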

2. Extending Metalearning Approaches to Workflows
Naïve approach:
- Generate all possible workflows for a new dataset:
  - exploit a given ontology of abstract/concrete operators.
- Use meta-knowledge associated with past problems/datasets to:
  - retrieve past workflows associated with similar problems;
  - rank these workflows according to their expected performance.
- Carry out tests to identify the best workflow.

2. Extending Metalearning Approaches to Workflows
The naïve approach is not practical:
- The number of possible workflows is normally too large.
- Performance meta-knowledge concerning different workflows may not be available.
Some solutions:
- Preferably expand only the most promising nodes / branches, with the help of meta-knowledge in the form of:
  - association rules (Kietz et al., 2012);
  - conditional probabilities;
  - collaborative filtering (Misir & Sebag, 2013).

3. Extending the Average Ranking Method to Workflows
This work was done in collaboration with:
- Miguel V. Cachada: M.Sc. student, awaiting defense soon
- Salisu M. Abdulrahman: completed his PhD in May at LIAAD Inesc Tec / Univ. of Porto; now works at Univ. of Kano, Nigeria
- Pavel Brazdil: LIAAD Inesc Tec / Univ. of Porto

Gathering Performance Metadata
Build a collection of performance results obtained from training datasets.
[Diagram: workflow configurations and training datasets yield performance metadata (accuracy, runtime).]
Our aim is to identify workflows with good performance, while minimising the runtime.
The A3R metric, which combines accuracy and runtime, provides a good solution.
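
The A3R formula itself did not survive in this transcription. The sketch below assumes the form published by Abdulrahman, Brazdil and co-authors in their work on average ranking with runtime: the accuracy ratio of a workflow to a reference workflow, divided by the runtime ratio raised to a power P. The function and argument names are illustrative choices, and the slide may have shown a different notation.

```python
def a3r(sr_j, sr_ref, t_j, t_ref, p):
    """A3R of workflow j relative to a reference workflow on one dataset.

    sr_j, sr_ref: success rates (accuracies) of workflow j and the reference.
    t_j, t_ref:   runtimes of workflow j and the reference.
    p:            exponent that controls how strongly runtime is penalised
                  (assumed form; p = 0 ignores runtime altogether).
    """
    return (sr_j / sr_ref) / ((t_j / t_ref) ** p)


# Same accuracy as the reference, but 100 times slower: with p = 0.5 the
# runtime ratio contributes a divisor of 10.
print(a3r(0.8, 0.8, 100.0, 1.0, p=0.5))
# With p = 0 only the accuracy ratio remains.
print(a3r(0.9, 0.8, 100.0, 1.0, p=0.0))
```

A small P makes A3R behave almost like plain accuracy, while a larger P favours fast workflows; the value used in the experiments is not given in the slides.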

Metalearning Approach
We use a very simple metalearning approach: A3R-based Average Ranking (AR*).
- AR* uses the optimized setting of the parameter P, the exponent applied to runtime in A3R.
- AR* generates a ranked list of workflows, based on the A3R measure.
How far can this simple approach go?

Experiments
Performance metadata: 184 workflows, run on 37 datasets.
Portfolios of workflows:
- 62 classification algorithms from WEKA with default configurations (AR*+A)
- 62 variants: combinations of CFS + algorithms (AR*+FS+A)
- 30 variants: hyperparameter configurations of some algorithms (AR*+Hyp+A)
- 30 variants: CFS + hyperparameter configurations of some algorithms (AR*+FS+Hyp+A)
Evaluation using leave-one-out:
- 36 datasets are used to propose a ranking of workflows for the dataset left out.
- The ranking is followed to identify the best workflow and calculate the loss.
- The loss curves are aggregated into a mean loss curve.
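
The evaluation protocol above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the toy scores and accuracies are invented, ties in mean rank are broken by workflow name, and the loss is recorded per test rather than against elapsed time.

```python
def average_ranking(scores, datasets):
    """Order workflows by mean rank over `datasets` (rank 1 = highest score)."""
    workflows = sorted(next(iter(scores.values())))
    rank_sums = {w: 0.0 for w in workflows}
    for d in datasets:
        ordered = sorted(workflows, key=lambda w: -scores[d][w])
        for rank, w in enumerate(ordered, start=1):
            rank_sums[w] += rank
    return sorted(workflows, key=lambda w: rank_sums[w] / len(datasets))


def loss_curve(ranking, accuracy_on_target):
    """Loss after each test: best achievable accuracy minus best found so far."""
    best_possible = max(accuracy_on_target.values())
    best_so_far, curve = 0.0, []
    for w in ranking:
        best_so_far = max(best_so_far, accuracy_on_target[w])
        curve.append(best_possible - best_so_far)
    return curve


def leave_one_out(scores, accuracy):
    """Mean loss curve over all datasets, each one left out in turn."""
    curves = []
    for target in scores:
        train = [d for d in scores if d != target]
        ranking = average_ranking(scores, train)
        curves.append(loss_curve(ranking, accuracy[target]))
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(curves[0]))]


# Hypothetical metadata: scores[d][w] is an A3R-like score used for ranking,
# accuracy[d][w] is the accuracy used to measure the loss.
scores = {
    "d1": {"w1": 1.0, "w2": 0.9, "w3": 0.5},
    "d2": {"w1": 0.8, "w2": 1.0, "w3": 0.6},
    "d3": {"w1": 1.0, "w2": 0.7, "w3": 0.9},
}
accuracy = {
    "d1": {"w1": 0.90, "w2": 0.85, "w3": 0.60},
    "d2": {"w1": 0.70, "w2": 0.80, "w3": 0.65},
    "d3": {"w1": 0.75, "w2": 0.70, "w3": 0.95},
}
mean_curve = leave_one_out(scores, accuracy)
print(mean_curve)
```

The mean loss curve never increases, and it reaches zero once every workflow in the ranking has been tested.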

Results of alternative hyperparameter settings
- Both AR*±FS+Hyp+A and AR*+Hyp+A achieved good results.
- It is important to consider alternative hyperparameter settings!

Comparison to Auto-WEKA
- Auto-WEKA (AW) was given varied time budgets.
- The total runtime of AW was obtained by adding the search runtime to the runtime of the recommended model.
- The accuracy of AR*±FS+Hyp+A (AR) was obtained by following the ranking up to a cumulative runtime equal to the total runtime of AW.
[Table: number of datasets per time budget, with columns Budget (min), Win, Loss, Ties; the values are missing from the transcription.]
- A win means that AR achieved higher accuracy than AW.
- AR wins or competes well with Auto-WEKA, especially for smaller time budgets.
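
The budget-matched comparison described above can be sketched as follows. The helper names, the toy numbers, and the decision to stop at the first workflow that no longer fits into the budget are assumptions made for illustration; the slides do not say how a workflow that overruns the budget was handled.

```python
def best_accuracy_within_budget(ranking, accuracy, runtime, budget):
    """Follow `ranking` until the next workflow would exceed `budget`.

    Returns the best accuracy among the workflows tested, or None if not even
    the first workflow fits into the budget.
    """
    spent, best = 0.0, None
    for w in ranking:
        if spent + runtime[w] > budget:
            break  # assumption: stop here instead of skipping to cheaper workflows
        spent += runtime[w]
        if best is None or accuracy[w] > best:
            best = accuracy[w]
    return best


def outcome(ar_accuracy, aw_accuracy, tol=1e-9):
    """Classify one dataset as a win, loss or tie for AR against AW."""
    if ar_accuracy is None or ar_accuracy < aw_accuracy - tol:
        return "loss"
    if ar_accuracy > aw_accuracy + tol:
        return "win"
    return "tie"


# Hypothetical numbers for a single dataset (runtimes in minutes).
ranking = ["a", "b", "c"]
accuracy = {"a": 0.80, "b": 0.90, "c": 0.95}
runtime = {"a": 10.0, "b": 30.0, "c": 100.0}

aw_search_runtime, aw_model_runtime, aw_accuracy = 30.0, 15.0, 0.88
aw_total_runtime = aw_search_runtime + aw_model_runtime  # search + recommended model

ar_accuracy = best_accuracy_within_budget(ranking, accuracy, runtime, aw_total_runtime)
print(ar_accuracy, outcome(ar_accuracy, aw_accuracy))
```

Counting the outcomes over all datasets for each budget would give the Win / Loss / Ties columns of the comparison table.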

4. Challenges for Current / Future Research
1. Diversify the metadata (datasets): be prepared for new challenges!
2. Diversify the metadata (workflows): include top performers (configurations, combinations, etc.)
3. Devise methods to prune portfolios of workflows (off-line)
4. Explore approaches that focus on useful alternatives on-line: active testing, SMAC
5. Extend comparisons to other systems (e.g. auto-sklearn, GA-based approaches)

Diversify the metadata (datasets)
Include diverse datasets to train the meta-level system:
- Unbalanced data
- Many-class problems
- Multi-label problems
- Problems with missing data
- Etc.
Be prepared for new challenges!

Diversify the metadata (workflows)
Include top performers (configurations, combinations, etc.).
- This is similar to the strategies used by football coaches (e.g. Mourinho at MU): search for good players to strengthen the team.
- This could be done by:
  - searching the literature (e.g. which ranges of hyperparameter settings are useful, which settings were used, etc.);
  - searching repositories like OpenML.

Devise methods to prune portfolios of workflows
Two distinct goals:
- Eliminate sub-standard workflows
- Eliminate redundant workflows
In general one could use:
- Filter-like approaches
- Closed-loop approaches (too costly!)
- Backward elimination / forward selection (expensive!)
One early work that uses a filter-like approach oriented towards the accuracy-based AR: P. Brazdil, C. Soares, R. Pereira: Reducing rankings of classifiers by eliminating redundant classifiers, Progress in Artificial Intelligence, 14-21, 2001.
Currently we are working on a solution oriented towards AR* (combined measure of accuracy and runtime).
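
To make the two pruning goals concrete, here is a minimal sketch of one possible filter-like procedure. It is not the method of Brazdil, Soares and Pereira (2001), nor the AR*-oriented solution mentioned above: the criteria (`min_ratio`, `eps`), the function name and the toy scores are all hypothetical choices made for illustration.

```python
def prune_portfolio(scores, min_ratio=0.9, eps=0.01):
    """Filter-like pruning of a portfolio given scores[dataset][workflow].

    Step 1 drops sub-standard workflows: those that never come within
    `min_ratio` of the best score on any dataset.
    Step 2 drops redundant workflows: those whose scores differ by at most
    `eps` on every dataset from a workflow that has already been kept.
    """
    datasets = sorted(scores)
    workflows = sorted(scores[datasets[0]])
    best = {d: max(scores[d].values()) for d in datasets}

    competitive = [
        w for w in workflows
        if any(scores[d][w] >= min_ratio * best[d] for d in datasets)
    ]

    # Consider stronger workflows first, so that the weaker member of a
    # near-duplicate pair is the one that gets eliminated.
    competitive.sort(key=lambda w: -sum(scores[d][w] for d in datasets))
    kept = []
    for w in competitive:
        redundant = any(
            all(abs(scores[d][w] - scores[d][k]) <= eps for d in datasets)
            for k in kept
        )
        if not redundant:
            kept.append(w)
    return kept


# Hypothetical scores: w2 nearly duplicates w1, w3 is weak everywhere,
# w4 is the specialist for dataset d2.
scores = {
    "d1": {"w1": 1.0, "w2": 0.995, "w3": 0.5, "w4": 0.7},
    "d2": {"w1": 0.8, "w2": 0.795, "w3": 0.6, "w4": 1.0},
}
pruned = prune_portfolio(scores)
print(pruned)
```

Because the filter only inspects stored metadata, it needs no additional training runs, which is what separates it from the closed-loop and backward/forward selection alternatives listed above.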

Explore approaches that focus on useful alternatives
We could explore:
1. Active testing: good for selecting discrete options
2. Regression models: good for modeling the effects of hyperparameter settings and suggesting good settings on the target dataset (RFs in SMAC, surrogate versions, Gaussian processes, etc.)
3. A combination of 1 and 2

Extend comparisons to other systems
Extend comparisons to:
- auto-sklearn
- GA-based approaches
- Etc.
More informationLearning a classification of Mixed-Integer Quadratic Programming problems
Learning a classification of Mixed-Integer Quadratic Programming problems CERMICS 2018 June 29, 2018, Fréjus Pierre Bonami 1, Andrea Lodi 2, Giulia Zarpellon 2 1 CPLEX Optimization, IBM Spain 2 Polytechnique
More informationData mining workflow templates for intelligent discovery assistance and auto-experimentation
Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2010 mining workflow templates for intelligent discovery assistance and auto-experimentation
More information3 Virtual attribute subsetting
3 Virtual attribute subsetting Portions of this chapter were previously presented at the 19 th Australian Joint Conference on Artificial Intelligence (Horton et al., 2006). Virtual attribute subsetting
More informationData Mining: STATISTICA
Outline Data Mining: STATISTICA Prepare the data Classification and regression (C & R, ANN) Clustering Association rules Graphic user interface Prepare the Data Statistica can read from Excel,.txt and
More informationEvent: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect
Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationCOMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES
COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES USING DIFFERENT DATASETS V. Vaithiyanathan 1, K. Rajeswari 2, Kapil Tajane 3, Rahul Pitale 3 1 Associate Dean Research, CTS Chair Professor, SASTRA University,
More informationMachine Learning Software ROOT/TMVA
Machine Learning Software ROOT/TMVA LIP Data Science School / 12-14 March 2018 ROOT ROOT is a software toolkit which provides building blocks for: Data processing Data analysis Data visualisation Data
More informationTour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers
Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background
More informationOpportunities and challenges in personalization of online hotel search
Opportunities and challenges in personalization of online hotel search David Zibriczky Data Science & Analytics Lead, User Profiling Introduction 2 Introduction About Mission: Helping the travelers to
More information