Non-Linearity of Scorecard Log-Odds


Non-Linearity of Scorecard Log-Odds
Ross McDonald, Keith Smith, Matthew Sturgess, Edward Huang
Retail Decision Science, Lloyds Banking Group
Edinburgh Credit Scoring Conference, 6th August 9

Lloyds Banking Group
Lloyds Banking Group was formed by the acquisition of HBOS by Lloyds TSB. Three key relationship brands. The retail bank currently has 3 million customers and 3 million current accounts, with number one positions in Mortgages and Savings.
Retail Decision Science employs ~5 specialists and modellers in Acquisition, Account Management, Collections and Recoveries, Basel and Impairment, Infrastructure, Portfolio Modelling, Fraud and Marketing.

Outline
The Problem: the non-linear score to log-odds relationship in scorecard models
Why does it matter? Uses of scorecard models, and where we need pr(event)
Why does it happen? Uneven data distributions, correlation across bins
How can we fix it? Some remedies, pros and cons

The Problem
[Chart: actual log-odds plotted against inferred log-odds, divergence maximisation, all bins, with a trend line]

The Problem
[Chart: actual log-odds plotted against inferred log-odds, divergence maximisation, with the points for each class and their polynomial fits]

The Problem
In scorecard models we frequently notice a curvature in the relationship between actual (data-estimated) log-odds and the log-odds inferred from the model predictions.
It is seen in many different modelling scenarios (default, marketing propensity, fraud, etc.).
The curvature is approximately quadratic.
It is common to many linear modelling techniques.
Gini measures ranking performance, and these models achieve very good Ginis.
The deviation occurs at the lower and upper ends of the range: low odds and high odds.
It is associated with irregular score distributions.

Does it Matter? NO
Not if we only care about the rank order of accounts and the bad rates above and below the cut-off.
[Diagram: accounts ordered from low propensity to high propensity, divided by a cut-off]
The traditional ways of assessing scorecard performance: Gini, K-S statistic, ROC curve, concordance, log-odds to score plot, log-likelihood.
Only the log-odds plot reveals an incorrect p(event).

Does it Matter?
Traditional scorecard development: fit the model to training data; validate the ranking performance of the model on a test set; select a cut-off; deploy the scorecard, using the cut-off to make automated decisions (e.g. grant credit, send mail, flag as fraud), varying it as required to control volumes.
As long as the model's ranking performance is good, we do not really care how accurate the model's predictions p(event) are.
Any monotonic transformation of the model's predictions will preserve ranking performance and Gini.
We make use of this when we scale scorecard coefficients to a standard form (e.g. a fixed score corresponding to a reference odds, with a fixed number of additional points doubling the odds).
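The invariance of Gini under a monotonic rescaling is easy to check numerically. A minimal sketch on hypothetical simulated data, using the rank-based Mann-Whitney formula for AUC (the function name `gini` is ours, not from the talk):

```python
import numpy as np

def gini(y, score):
    """Gini = 2*AUC - 1, with AUC from the rank-based Mann-Whitney statistic."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n1 = (y == 1).sum()
    n0 = len(y) - n1
    auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)
    return 2 * auc - 1

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
score = y + rng.normal(0, 1.5, 1000)          # noisy continuous scores

g_raw = gini(y, score)
g_scaled = gini(y, 50 + 20 * np.exp(score))   # strictly increasing rescaling
assert abs(g_raw - g_scaled) < 1e-12          # ranking, and hence Gini, is unchanged
```

Because the rescaling is strictly increasing, the sort order of the scores is identical, so any rank-based measure is untouched even though the predicted p(event) values change completely.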

Does it Matter? YES
In some scenarios the accuracy of the model's p(event) does matter. Examples:
An application scorecard built on historic data, where predicted bad rates for today's applicants are used to monitor business strategy and vary the cut-off.
Basel: estimated point-in-time PDs by segment, based on scorecard default rate estimates.
Capital allocation (based on the profitability of a campaign / action).
Targeting marketing campaigns on multiple dimensions.
Account-level strategies that use an estimated default rate derived from a scorecard, e.g. risk-based repricing, account closure, credit line increase or decrease, collections processes.
Anywhere we use estimated PD as one component to calculate NPV, expected loss, or expected value.

Does it Matter?
E.g. we predict propensity to respond to a marketing campaign, divide the scores into segments, and target based on predicted response rates.
[Table: predicted response rate by score segment, compared against the acceptable expected response rate; segments above the threshold are targeted]

Why Does it Happen?
Scorecards are usually linear or generalised linear (i.e. logistic) models.
Advantages: one coefficient per variable (or bin); interpretability of both model coefficients and output; robustness; easy to implement; output can be linearly scaled to log-odds; no technical understanding is needed to use scorecards once built.
Disadvantages: may not capture complex interactions easily; possible instability of the fitting method; non-linearity of the log-odds.

Why Does it Happen?
[Diagram: two classes of points in feature space; the direction of optimal separation is defined by the fitted model coefficients, the optimal separating hyperplane is fixed by the cut-off, and contours of equal score lie parallel to the hyperplane]

Why Does it Happen?
Model-building methods: maximisation of the Fisher distance / divergence (Ronald Fisher):

$$D(\beta) = \frac{\left(\beta^\top(\mu_1 - \mu_0)\right)^2}{\beta^\top(\Sigma_0 + \Sigma_1)\,\beta}$$

Maximise the separation of the means of the score distributions while minimising the total variance.

Why Does it Happen?
Maximisation of the Fisher distance / divergence, options:
Linear Discriminant Analysis: assume an equal covariance matrix $\Sigma$ for both classes, then differentiate and set equal to zero. The optimal solution is then given by

$$\beta = \Sigma^{-1}\left(\mu_{y=1} - \mu_{y=0}\right)$$

Direct optimisation: use an optimisation method to solve directly.
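The closed-form LDA direction can be checked numerically against the Fisher criterion. A small sketch with made-up moments (the function name `fisher_distance` and the parameter values are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])               # shared covariance (the LDA assumption)

beta = np.linalg.solve(Sigma, mu1 - mu0)     # beta = Sigma^{-1} (mu1 - mu0)

def fisher_distance(b):
    """(b'(mu1 - mu0))^2 / (b'(Sigma0 + Sigma1) b), with Sigma0 = Sigma1 = Sigma."""
    sep = b @ (mu1 - mu0)
    return sep ** 2 / (b @ (2 * Sigma) @ b)

# no random direction should beat the closed-form solution
best = fisher_distance(beta)
for _ in range(200):
    assert fisher_distance(rng.normal(size=2)) <= best + 1e-9
```

The criterion is a generalised Rayleigh quotient, so its maximiser is unique up to scale; any positive multiple of `beta` achieves the same Fisher distance.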

Why Does it Happen?
[Chart: actual log-odds plotted against inferred log-odds, linear discriminant analysis, with a polynomial fit to the log-odds]

Why Does it Happen?
[Chart: actual log-odds plotted against inferred log-odds, linear discriminant analysis, with the points for each class and their polynomial fits]

Why Does it Happen?
[Chart: actual log-odds plotted against inferred log-odds, divergence maximisation, all bins, with a trend line]

Why Does it Happen?
[Chart: actual log-odds plotted against inferred log-odds, divergence maximisation, with the points for each class and their polynomial fits]

Why Does it Happen?
Logistic Regression:

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$$

Solved by maximising the likelihood, usually using iterative Newton-Raphson to solve the resulting non-linear equations (or it can be solved directly). When there is only one binned variable in the model (so there is no overlap of dummy variables), it can be shown that

$$\beta_j = c + \log\frac{\sum_{i:\,x_{ij}=1} y_i}{\sum_{i:\,x_{ij}=1} (1 - y_i)}$$

i.e. up to a constant, the coefficients are the bin log-odds: the weights of evidence.
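The single-binned-variable case can be verified on simulated data: with one non-overlapping dummy per bin and no intercept, the logistic MLE coefficients equal the empirical bin log-odds exactly (the constant c is zero in this coding). A sketch with a hand-rolled Newton-Raphson fit; the data and bin probabilities are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
bins = rng.integers(0, 3, 2000)              # one variable with 3 bins
p_by_bin = np.array([0.1, 0.3, 0.6])         # event rate per bin (illustrative)
y = rng.random(2000) < p_by_bin[bins]

X = np.eye(3)[bins]                          # one dummy per bin, no intercept

# Newton-Raphson for the logistic maximum likelihood estimate
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

# closed form: the empirical log-odds per bin (weights of evidence up to a constant)
woe = np.array([np.log(y[bins == j].sum() / (~y[bins == j]).sum())
                for j in range(3)])
assert np.allclose(beta, woe, atol=1e-6)
```

Because the dummy columns are orthogonal, the Hessian is diagonal and each coefficient is fitted independently, which is why the closed form holds.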

Why Does it Happen?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression, all bins, with a polynomial fit to the log-odds]

Why Does it Happen?
[Chart: actual log-odds plotted against inferred log-odds, linear discriminant analysis, with the points for each class and their polynomial fits]

Why Does it Happen?
It can be shown that if the score distributions are normal with means $\mu_0$ and $\mu_1$ and variances $\sigma_0^2$ and $\sigma_1^2$, then the log-odds to score relationship is quadratic, $as^2 + bs + c$, where

$$a = \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}, \qquad b = \frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2},$$

$$c = \log\frac{P(\text{Class}=1)}{P(\text{Class}=0)} + \log\frac{\sigma_0}{\sigma_1} + \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_1^2}$$

i.e. unequal reciprocal variances in the score distributions lead to curvature. Note that this assumes normally distributed scores. Why do we get unequal variances?
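The quadratic relationship is straightforward to verify numerically by comparing the exact log-odds of two normal class-conditional densities with the coefficients above (the parameter values here are illustrative, not from the presentation):

```python
import numpy as np

mu0, s0 = 2.0, 1.0        # class-0 score distribution
mu1, s1 = 4.0, 1.5        # class 1: unequal variance, so curvature appears
prior1 = 0.3              # P(Class = 1)

def log_odds(s):
    """log P(1|s) - log P(0|s) for normal class-conditional score densities."""
    # the 1/sqrt(2*pi) normalising constant cancels in the ratio, so it is omitted
    logpdf = lambda x, m, sd: -0.5 * ((x - m) / sd) ** 2 - np.log(sd)
    return np.log(prior1 / (1 - prior1)) + logpdf(s, mu1, s1) - logpdf(s, mu0, s0)

a = 1 / (2 * s0**2) - 1 / (2 * s1**2)
b = mu1 / s1**2 - mu0 / s0**2
c = (np.log(prior1 / (1 - prior1)) + np.log(s0 / s1)
     + mu0**2 / (2 * s0**2) - mu1**2 / (2 * s1**2))

s = np.linspace(-2.0, 8.0, 50)
assert np.allclose(log_odds(s), a * s**2 + b * s + c)   # exactly quadratic
```

Setting the two variances equal makes the quadratic term a vanish, recovering the straight-line log-odds to score relationship that scorecard scaling assumes.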

Why Does it Happen?
A simple model. Suppose we have a linear model

$$\text{score} = \beta_0 + \beta_{11} x_{11} + \dots + \beta_{1m} x_{1m} + \dots + \beta_{Kn} x_{Kn}$$

i.e. we have binned all our variables into 0/1 dummies and included all bins in the model. Assume each $X$ is the outcome of a Bernoulli trial with parameter $p$ dependent on class, i.e. for a given class we have probability $p$ of observing a 1 and $(1-p)$ of observing a 0, but $p$ is conditional on class. Across the set of bins for a single binned variable the $x$'s sum to exactly 1: this is a special case of the multinomial distribution with sum parameter $n = 1$. Within a variable, $\mathrm{Var}(X_i) = p_i(1-p_i)$ and $\mathrm{Cov}(X_i, X_j) = -p_i p_j$.

Why Does it Happen?
A simple model with 2 variables of 2 bins each:

$$\text{score} = \beta_0 + \beta_{11} x_{11} + \beta_{12} x_{12} + \beta_{21} x_{21} + \beta_{22} x_{22}$$

$$\begin{aligned}
\mathrm{Var}(\text{score} \mid \text{class}) ={}& \beta_{11}^2\,\mathrm{Var}(X_{11}) + \beta_{12}^2\,\mathrm{Var}(X_{12}) + 2\beta_{11}\beta_{12}\,\mathrm{Cov}(X_{11}, X_{12}) \\
&+ \beta_{21}^2\,\mathrm{Var}(X_{21}) + \beta_{22}^2\,\mathrm{Var}(X_{22}) + 2\beta_{21}\beta_{22}\,\mathrm{Cov}(X_{21}, X_{22}) \\
&+ 2\beta_{11}\beta_{21}\,\mathrm{Cov}(X_{11}, X_{21}) + 2\beta_{11}\beta_{22}\,\mathrm{Cov}(X_{11}, X_{22}) \\
&+ 2\beta_{12}\beta_{21}\,\mathrm{Cov}(X_{12}, X_{21}) + 2\beta_{12}\beta_{22}\,\mathrm{Cov}(X_{12}, X_{22})
\end{aligned}$$

The first two rows contain the within-variable terms; the last two contain covariances for bins across different variables. For curvature, the reciprocals of these variance terms have to differ across classes. Two components contribute: variance and covariance terms within a variable, e.g. $p_{11}(1-p_{11})$, which won't be equal across classes except in very rare cases (e.g. perfect separation, where $p = 0$ or $1$); and large covariance terms for bins across different variables, e.g. $\mathrm{Cov}(X_{11}, X_{21})$. Can we reduce curvature by eliminating correlated bins?
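The multinomial(n = 1) moments used above can be sanity-checked by simulation; in matrix form the exact covariance of the one-hot bin indicators is diag(p) - p p' (illustrative bin probabilities):

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.2, 0.5, 0.3])                 # bin probabilities for one variable
X = rng.multinomial(1, p, size=200_000)       # one-hot bin indicators, n = 1

sample_cov = np.cov(X.T)
# exact covariance: Var(X_i) = p_i(1 - p_i) on the diagonal, Cov(X_i, X_j) = -p_i p_j off it
exact_cov = np.diag(p) - np.outer(p, p)
assert np.allclose(sample_cov, exact_cov, atol=5e-3)
```

The off-diagonal terms are always negative: observing one bin rules out the others, which is the source of the within-variable covariance contribution to the score variance.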

How Can We Fix It?
Can we improve the curvature by eliminating correlated bins? High bin correlation occurs for several reasons:
High correlation of the missing category. E.g. scorecard models often incorporate bureau data; if one field is missing, others are often missing for the same reason (no relevant data, failure to match the address, etc.).
Correlation of values that have an underlying connection, e.g. a high credit balance usually means high interest charges.
Correlation of values where there is no qualifying data, e.g. variables summarising arrears behaviour for someone who has never been in arrears.
It is common practice when building a scorecard to monitor variable correlation. Correlation can lead to instability, since we may end up with multiple optimal solutions for the coefficients, and it reduces interpretability: coefficients may change sign.
Risk: even highly correlated variables can add useful information.
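One way to operationalise the threshold experiments on the following slides is a greedy filter that keeps a dummy column only if its absolute correlation with every previously kept column stays at or below the threshold. A rough sketch (the function name and the greedy rule are our own illustration, not necessarily the talk's exact procedure):

```python
import numpy as np

def drop_correlated_bins(X, threshold):
    """Greedily keep columns whose |correlation| with all kept columns <= threshold."""
    corr = np.abs(np.corrcoef(X.T))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(4)
a = rng.integers(0, 2, 500).astype(float)
b = rng.integers(0, 2, 500).astype(float)
X = np.column_stack([a, b, a])                 # third bin duplicates the first
assert drop_correlated_bins(X, 0.9) == [0, 1]  # the perfectly correlated bin is dropped
```

Lowering the threshold prunes more bins, which is the trade-off the later Gini chart quantifies: less curvature, but potentially lost information.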

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.5 correlation threshold, with a polynomial fit to the log-odds]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.45 correlation threshold, with a polynomial fit to the log-odds]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.4 correlation threshold, with a polynomial fit to the log-odds]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.3 correlation threshold, with a polynomial fit to the log-odds]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.5 correlation threshold, with the points for each class and their polynomial fits]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.45 correlation threshold, with the points for each class and their polynomial fits]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.4 correlation threshold, with the points for each class and their polynomial fits]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, logistic regression with a 0.3 correlation threshold, with the points for each class and their polynomial fits]

How Can We Fix It?
[Chart: impact on Gini of eliminating correlated variables; Gini as a percentage of the original Gini plotted against the threshold on (absolute) correlation, from 0.9 downwards]

How Can We Fix It?
Other ideas for fixing the non-linearity:
Adopt a different binning method: rather than using dummy variables, each variable takes a value equal to the weights of evidence value for its current bin. This is equivalent to using dummy variables and enforcing the condition that all model coefficients β must be proportional to the weights of evidence. In this case it leads to a reduction of % in Gini.
Apply a retrospective, non-linear scaling to the model scores to align them to the log-odds (e.g. apply a quadratic transformation). This has no impact on Gini, but has the disadvantage that an extra transformation must be applied: we can no longer derive the score by summing values.
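The second remedy, a retrospective quadratic correction, amounts to regressing the observed log-odds on the raw score and applying the fitted polynomial at use time. A minimal sketch with made-up per-band numbers (the values are illustrative, not from the presentation):

```python
import numpy as np

# hypothetical calibration data: mean model score and observed log-odds per score band
band_score = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
actual_lo  = np.array([0.6, 1.8, 3.0, 4.1, 5.0, 5.7, 6.2])  # flattens at the top

coef = np.polyfit(band_score, actual_lo, 2)   # fit the quadratic correction
calibrated = np.polyval(coef, band_score)     # apply it to scores at use time

# the quadratic tracks the curvature better than a straight-line alignment
linear = np.polyval(np.polyfit(band_score, actual_lo, 1), band_score)
assert np.sum((calibrated - actual_lo) ** 2) < np.sum((linear - actual_lo) ** 2)
```

As long as the fitted quadratic is monotonic over the observed score range, the account ranking, and hence Gini, is unchanged; only the implied p(event) values move.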

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, weights of evidence binning, with a polynomial fit to the log-odds]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, weights of evidence binning, with the points for each class and their polynomial fits]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds after the polynomial correction, with a polynomial fit to the log-odds]

How Can We Fix It?
[Chart: actual log-odds plotted against inferred log-odds, weights of evidence binning, with the points for each class and their polynomial fits]

Conclusion
Causes: the non-linear (quadratic) curvature is caused by uneven score distributions. These in turn are caused by underlying (non-normal) qualities of the data between classes, exacerbated by collinearity across variable bins. The curvature has no adverse effect on ranking performance or Gini, but is a problem wherever we rely on accurate estimation of p(event).
Solutions: eliminate collinear bins when building the model; enforce weights of evidence proportionality in binning; try different modelling techniques; apply a non-linear rescaling to the output model scores.

Questions?