Context Change and Versatile Models in Machine Learning

Context Change and Versatile Models in Machine Learning
José Hernández-Orallo, Universitat Politècnica de València, jorallo@dsic.upv.es
ECML Workshop on Learning over Multiple Contexts, Nancy, 19 September 2014

Spot the difference: CONTEXT is everything.

Outline
- Context change: occasional or systematic?
- Contexts: types and representation
- Adaptation procedures
- Versatile models
- Kinds of reframing
- Evaluation with context changes
- Related areas
- Conclusions

Context change: occasional or systematic?
Contexts (domains, data, tasks, etc.) change. [Diagram: a model trained in context A is applied in context B to produce an output.]
- Has the model been prepared to be adapted to other contexts?
- Did we sufficiently generalise from context A?
- Is the adaptation process ad-hoc?
- Should we throw the model away and learn a new one?

Context change: occasional or systematic?
Contexts change repeatedly. [Diagram: a model from context A faces successive new contexts B, C and D, each requiring an output.]

Context change: occasional or systematic?
How can we treat context change in a more systematic way?
1. Determine which kinds of contexts we will deal with.
2. Describe and parameterise the context space.
3. Use versatile models that are better prepared for changes.
4. Define appropriate adaptation procedures to deal with the changes.
5. Overhaul evaluation tools for a range of contexts.

Context change: occasional or systematic?
Example of an area that does this: ROC analysis.
1. The kinds of contexts dealt with are known as operating conditions.
2. Contexts are parameterised as skews (class and cost proportions).
3. Ranking models provide more versatility than crisp classifiers.
4. Models are adapted to contexts by changing the threshold.
5. ROC curves and other plots and metrics evaluate model behaviour for a range of contexts, assuming a given threshold choice method will be used.
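
To make the ROC example concrete, here is a minimal sketch (not from the talk) of output reframing by threshold choice. It assumes the classifier produces calibrated probabilities of the positive class and that the operating condition is given by false-positive and false-negative costs.

```python
import numpy as np

def reframe_by_threshold(scores, c_fp, c_fn):
    """Output reframing for a binary scoring classifier.

    Assumes `scores` are calibrated probabilities of the positive class.
    For costs c_fp (false positive) and c_fn (false negative), the
    Bayes-optimal rule predicts positive when p >= c_fp / (c_fp + c_fn)."""
    c = c_fp / (c_fp + c_fn)           # cost proportion: the operating condition
    return (scores >= c).astype(int)

# The same scores, reframed for two different operating contexts.
scores = np.array([0.1, 0.4, 0.6, 0.9])
print(reframe_by_threshold(scores, c_fp=1, c_fn=1))  # symmetric costs -> threshold 0.5
print(reframe_by_threshold(scores, c_fp=1, c_fn=4))  # FNs 4x costlier -> threshold 0.2
```

The same scores are reused in both contexts; only the decision rule changes.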

Contexts: types and representation
- Dataset shift (covariate shift, prior probability shift, concept drift, ...): changes in p(x), p(y), p(x|y), p(y|x), p(x,y).
- Costs and utility functions: cost matrices, loss functions, reject costs, attribute costs, error tolerance, ...
- Uncertain, missing or noisy information: noise or uncertainty degree, % of missing values, missing attribute set, ...
- Representation change, constraints, background knowledge: granularity level, complex aggregates, attribute set, etc.
- Task change: regression cut-offs, bins, number of classes or clusters, quantification, ...

Contexts: types and representation
Is the context absolute or relative to the original context?
- Absolute: e.g., in context B the positive class is three times more likely than the negative class.
- Relative: e.g., the positive class in context B is three times more likely than in the original context A.
Is the context given or inferred?
- Given: e.g., cost information, cut-off, attribute set, ...
- Inferred (from the deployment data or a small labelled dataset): e.g., p(x), % of missing data, class proportion, ...
Does the context change once for each dataset or once for each example?
- If the context changes for each example, a non-systematic approach becomes very problematic, and context inference is more difficult.

Contexts: types and representation
A context θ is a tuple of one or more values, discrete or numerical, that represent or summarise contextual information. Examples:
- Contexts are cities and temperatures: θ_A = (Nancy, 20) is one context, while θ_B = (Valencia, 30) is another.
- Contexts are cost proportions: θ = c, where c is a cost proportion, a skew or a class prior.
- Contexts are attribute granularity: θ = (week, city, women, category) specifies granularities for the dimensions time, store, customer and product, respectively.
- Contexts are error tolerance: θ = 20% specifies that up to 20% of regression error is acceptable.

Adaptation procedures
Retraining: train another model using the available (old and possibly new) data, taking the new context into account. [Diagram: model A from context A is discarded; a new model B is trained for context B, described by θ_B, and produces the output.]
- The original model is discarded (no knowledge reuse).
- If there is plenty of new data, this is a reasonable approach.
- Not very efficient if the context changes again and again (e.g., for each example).
- The training data may have been lost or may not exist (the models may have been created or integrated by human experts).
- May lead to context overfitting.

Adaptation procedures
Retraining with knowledge transfer: train another model using (or transferring) part of the knowledge from the original context. [Diagram: knowledge from model A (context A) is transferred into the training of model B for context B, described by θ_B.]
- Parts of the original model or other kinds of knowledge are still reused.
- Instance transfer (Pan & Yang 2010).
- Feature-representation transfer (Pan & Yang 2010).
- Parameter transfer (Pan & Yang 2010).
- Relational-knowledge transfer (Pan & Yang 2010).
- Prediction transfer: the original model is used to label examples (mimetism).

Adaptation procedures
Context-as-feature: the parameters of the context are added as features to the training data. [Diagram: data from contexts A, B and C, each tagged with its parameter θ_A, θ_B, θ_C, trains a single model that is then applied directly to a new context D with parameter θ_D.]
- The model is reused directly; the training data can be discarded.
- Requires several contexts during training in order to generalise the feature.
- Makes more sense when there is a different context per example.
- The context works as a second-order feature, regulating how the other features should be used.
- Not many machine learning techniques are able to deal with this kind of pattern.
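
As an illustration (not from the talk), the following sketch appends a hypothetical per-example context parameter theta as an extra column of the training data, so a single model can be applied directly to a new context; the data, feature values and model choice are all made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical data: the relation between x and y depends on a per-example
# context theta (e.g., a shift that differs between deployment sites).
n = 500
x = rng.uniform(0, 10, size=(n, 1))
theta = rng.choice([0.0, 2.0, 5.0], size=(n, 1))   # context value per example
y = np.sin(x[:, 0]) + theta[:, 0] + rng.normal(0, 0.1, n)

# Context-as-feature: append theta as an extra column of the training data,
# so one model learns how the context modulates the other features.
X_ctx = np.hstack([x, theta])
model = DecisionTreeRegressor(max_depth=6).fit(X_ctx, y)

# At deployment, the (known or inferred) context is supplied with each example.
x_new, theta_new = np.array([[3.0]]), np.array([[5.0]])
print(model.predict(np.hstack([x_new, theta_new])))
```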

Adaptation procedures
Reframing: the process of applying an existing model to the new operating context by a proper transformation of inputs, outputs and/or patterns. [Diagram: the model built in context A is reframed with θ_B and produces the output for context B.]
- The model is reused; the training data can be discarded.
- The reframing process is designed to be systematic (and automated), using θ.
- Only one original context is needed.

Versatile models
A versatile model is a model that captures more information than needed and/or generalises further than strictly necessary for the original context, in order to be prepared to be reframed for a new context. Examples:
- Generative models over discriminative models.
- Scoring classifiers over crisp classifiers.
- Models gathering statistics (means, covariances, etc.) about the inputs/output.
- Unpruned trees over pruned trees.
- Models that take different kinds of features.
- Hierarchical clustering over clustering methods with a fixed number of clusters.

Versatile models
How can we generate more versatile models?
- Redefine learning algorithms and models so that they include more information. E.g., keep some of the information used during learning (densities, clusters, alternative rules, etc.).
- Annotate models as a post-process. E.g., include statistics at each split of a decision tree.
- Enrich them using the training or a validation dataset. E.g., calibration (see the sketch below).
Unlike knowledge transfer, this knowledge is not gathered separately from the model; it is embedded in the model so that its adaptation can be automated.
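
As an illustration of the calibration route, here is a minimal sketch (not the specific method used in the talk) that enriches a scoring model with a simple binning calibration map built from a validation set; the number of bins and the validation data are made up.

```python
import numpy as np

def binning_calibration_map(scores_val, y_val, n_bins=10):
    """Build a simple binning calibration map from a validation set:
    the calibrated score of a bin is the empirical positive rate in that bin.
    This 'enriches' a scoring model so that its outputs can be reliably
    thresholded for many different operating contexts."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(scores_val, edges) - 1, 0, n_bins - 1)
    bin_rate = np.array([y_val[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
                         for b in range(n_bins)])
    centres = (edges[:-1] + edges[1:]) / 2
    bin_rate = np.where(np.isnan(bin_rate), centres, bin_rate)  # fall back for empty bins

    def calibrate(scores):
        idx = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
        return bin_rate[idx]
    return calibrate

# Usage with made-up validation scores/labels (an over-confident scorer):
rng = np.random.default_rng(1)
scores_val = rng.uniform(size=1000)
y_val = (rng.uniform(size=1000) < scores_val**2).astype(int)
calibrate = binning_calibration_map(scores_val, y_val)
print(calibrate(np.array([0.2, 0.5, 0.9])))
```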

Kinds of reframing
Output reframing: the outputs are reframed. [Diagram: the original model maps the input X to an intermediate output Z; an output transformation, driven by θ_B, converts Z into the final output Y for context B.]
Examples and other names:
- Use of threshold choice methods with scoring classifiers (as in ROC analysis).
- Binarised regression problem (cut-off from regression to classification).
- Shifting the output to minimise expected cost in regression, by tuning (Bansal et al. 2008) or reframing (Hernández-Orallo 2014).
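
A minimal sketch of the third example: shifting a regression model's outputs to minimise expected asymmetric cost. The cost function here is a common lin-lin formulation and the shift grid is arbitrary, so treat it as an assumption rather than the exact procedure of Bansal et al. or Hernández-Orallo.

```python
import numpy as np

def asymmetric_abs_cost(y_true, y_pred, alpha):
    """Lin-lin style asymmetric absolute cost: over-predictions weighted by
    alpha, under-predictions by (1 - alpha). (A common formulation.)"""
    diff = y_pred - y_true
    return np.mean(np.where(diff > 0, alpha * diff, (1 - alpha) * (-diff)))

def best_shift(y_val, pred_val, alpha, shifts=np.linspace(-5, 5, 201)):
    """Output reframing: choose a constant shift of the original model's
    predictions that minimises the expected cost for the context alpha."""
    costs = [asymmetric_abs_cost(y_val, pred_val + s, alpha) for s in shifts]
    return shifts[int(np.argmin(costs))]

# Usage with a hypothetical validation set and base predictions:
rng = np.random.default_rng(2)
y_val = rng.normal(10, 2, 500)
pred_val = y_val + rng.normal(0, 1, 500)       # the original (context A) model
for alpha in (0.5, 0.1, 0.9):                  # three deployment contexts
    print(alpha, best_shift(y_val, pred_val, alpha))
```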

Kinds of reframing
Input reframing: the inputs are reframed. [Diagram: the deployment input X from context B is passed through an input transformation, driven by θ_A and θ_B, before being fed to the original model from context A, which then produces the output Y.]
Examples and other names:
- Use of quantiles (El Jelali et al. 2013).
- Feature shift (Ahmed et al. 2014).
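
A minimal sketch of input reframing for a constant feature shift, in the spirit of the second example: the deployment inputs are mapped back to the training representation using the context parameter β. The data and the model are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Context A: training data.
X_a = rng.normal(0, 1, size=(300, 1))
y_a = 3 * X_a[:, 0] + rng.normal(0, 0.1, 300)
model = LinearRegression().fit(X_a, y_a)

# Context B: the same attribute arrives with a constant shift beta
# (e.g., a recalibrated sensor). Input reframing maps deployment inputs
# back to the representation the model was trained on.
beta = 2.5                      # the context parameter (given or inferred)
X_b = rng.normal(beta, 1, size=(5, 1))
X_reframed = X_b - beta         # input transformation driven by the context
print(model.predict(X_reframed))
```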

Kinds of reframing
Structural reframing: the model itself is reframed. [Diagram: a model transformation, driven by θ_B, turns the model learnt in context A into a model for context B, which maps X to the output Y.]
Examples and other names:
- Relabelling (e.g., using a small labelled dataset).
- Post-pruning (during deployment).

Evaluation with context changes
The performance of a model m on a dataset D can be evaluated for a single context θ using a reframing procedure R. If contexts change systematically, we want to see model performance, using a reframing procedure, over a range of operating contexts:
- With a context plot: the context on one or more axes and the performance measure Q on another axis. Dominance regions can be visualised.
- How can we summarise a curve? A range of contexts is given by a set of contexts and a distribution w over them.

Evaluation with context changes
Example: classical cost curves are context plots. Many other curves are possible if the reframing procedure is different; in this case, several threshold choice methods are shown on the right of the original slide. [Plot annotations: F_0(t) is the TPR or sensitivity if the threshold is set at t; 1 − F_1(t) is the TNR or specificity if the threshold is set at t; c is the context.]
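
The plot itself is not reproduced in the transcription. As a hedged sketch of the data behind such a context plot, the following computes a simplified (unnormalised) expected cost across a range of cost proportions c for two threshold choice methods, using synthetic scores.

```python
import numpy as np

def expected_cost(y, scores, threshold, c):
    """Per-example expected cost at cost proportion c:
    false positives cost c, false negatives cost (1 - c) (costs normalised)."""
    pred = (scores >= threshold).astype(int)
    fp = np.mean((pred == 1) & (y == 0))
    fn = np.mean((pred == 0) & (y == 1))
    return c * fp + (1 - c) * fn

rng = np.random.default_rng(4)
y = (rng.uniform(size=2000) < 0.5).astype(int)
scores = np.clip(y * 0.3 + rng.uniform(size=2000) * 0.7, 0, 1)  # an imperfect scorer

# A context plot in tabular form: expected cost across operating contexts c,
# for two threshold choice methods (fixed threshold vs score-driven threshold).
for c in np.linspace(0.1, 0.9, 9):
    fixed = expected_cost(y, scores, threshold=0.5, c=c)
    score_driven = expected_cost(y, scores, threshold=c, c=c)
    print(f"c={c:.1f}  fixed={fixed:.3f}  score-driven={score_driven:.3f}")
```

Plotting these two columns against c would reproduce the shape of a cost-curve-style context plot, with dominance regions where one threshold choice method beats the other.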

Evaluation with context changes
Example: regression with asymmetric costs. For instance, using the asymmetric absolute cost (lin-lin) for regression, α is the context. [Plot: regression cost curves, with the context α on one axis and the expected cost on the other.]
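
The cost formula is not reproduced in the transcription. A common formulation of the asymmetric absolute (lin-lin) loss with asymmetry parameter α, which reduces to the absolute error at α = 0.5, is:

```latex
\ell_\alpha(y, \hat{y}) =
  \begin{cases}
    2\alpha\,(\hat{y} - y)      & \text{if } \hat{y} \ge y,\\[2pt]
    2(1-\alpha)\,(y - \hat{y})  & \text{if } \hat{y} < y.
  \end{cases}
```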

Evaluation with context changes
Example: REC curves (tolerance level). Acc = 1 if |ŷ − y| ≤ tolerance (and 0 otherwise); the tolerance is the context.
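
A minimal sketch (with synthetic data) of computing the points of such a REC curve, where the tolerance plays the role of the context:

```python
import numpy as np

def rec_curve(y_true, y_pred, tolerances):
    """Regression Error Characteristic curve: for each tolerance (the context),
    the fraction of examples whose absolute error is within that tolerance."""
    abs_err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    return np.array([(abs_err <= t).mean() for t in tolerances])

# Usage with hypothetical predictions:
rng = np.random.default_rng(5)
y = rng.normal(0, 1, 1000)
pred = y + rng.normal(0, 0.5, 1000)
for t, acc in zip([0.1, 0.5, 1.0], rec_curve(y, pred, [0.1, 0.5, 1.0])):
    print(f"tolerance={t}  accuracy={acc:.2f}")
```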

Evaluation with context changes
Example: attribute shift. One or more attributes have a constant shift (Ahmed et al. 2014): x → x + β, where β is the context. In this context plot, we compare retraining with a reframing approach.

Evaluation with context changes
Example: noise levels (Ferri et al. 2014). The data may have different levels of noise; the level of noise is the context.

Evaluation with context changes
Example: misclassification cost (MC) vs attribute test cost (TC); α is the context. Different attribute subsets lead to different cost lines.

Evaluation with context changes
Example: multidimensional data (attributes are hierarchical dimensions); the tuple of aggregation levels is the context. To make the plot simpler, we use a Reduction Coefficient (RC), which expresses the level of aggregation of the data (from 0 to 1); RC is a simplification of the context.

Evaluation with context changes
Example: multilabel classification (Al-Otaibi 2014). Costs for each label are introduced; the tuple of costs for all labels is the context, and the average cost is a simplification of the context. [Plot: different colours represent different threshold choice methods; curves are for cases where the costs are equal for all labels, and clouds are for cases where costs differ per label (but the average is on the x-axis).]

Evaluation with context changes
Example: clustering algorithms depending on the number of clusters. The number of clusters is the context; the cost can be any clustering performance metric, such as the Davies-Bouldin index, the Dunn index or the Silhouette coefficient. K-means is rerun (retrained) with different values of K, whereas hierarchical methods are versatile models working for several contexts.
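
A minimal sketch (with synthetic data) contrasting the two strategies: K-means is retrained for each value of K, while a single hierarchical clustering is built once and reframed to each K by cutting the dendrogram; the Silhouette coefficient is used as the performance metric.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(loc, 0.5, size=(100, 2)) for loc in ([0, 0], [4, 0], [0, 4])])

# Versatile model: one hierarchical clustering (the dendrogram) is built once
# and then reframed to any number of clusters k by cutting it at another level.
Z = linkage(X, method="ward")

for k in (2, 3, 4, 5):                                   # k is the context
    hier_labels = fcluster(Z, t=k, criterion="maxclust")                          # reframed
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)    # retrained
    print(k,
          round(silhouette_score(X, hier_labels), 3),
          round(silhouette_score(X, km_labels), 3))
```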

Related areas
- Dataset shift.
- Domain adaptation.
- Cost-sensitive learning.
- Learning with noisy data.
- Transfer learning.
- Multi-task learning.
- Transportability.
- Context-aware computing.
- Mimetic models.
- Theory revision.
- ROC analysis and cost plots.

Related areas
A reframing perspective is distinctive:
- Contexts are clearly identified and parameterised.
- It is not a one-to-one, occasional transfer but a systematic application.
- There can be several reframing methods for the same model and data, leading to different results.
- Models are learnt in one context and task but kept for many contexts.
- Performance is analysed over a range of contexts.
- Models are reused.

Conclusions
- Disposing of validated models again and again is not cost-efficient; reusing models seems more appealing.
- Versatile models should be as general as possible to cope with a range of contexts.
- Validation has to take this range of contexts into account.
- Model deployment is crucial: models become good or bad for a context depending on the deployment procedure we are using.
- But don't be blinded by reframing: we should always consider the trade-off between retraining and reframing (and other possible options).