Robust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson


1 Robust Regression Robust Data Mining Techniques By Boonyakorn Jantaranuson

2 Outline. Introduction; OLS and important terminology; Least Median of Squares (LMedS); M-estimator; Penalized least squares.

3 What is Regression? Fit a model to observed data. Minimize the error between the real data and the predicted data.

4 Outliers. Noise: transmission error, measurement error. Outliers cause problems for the resulting regression model.

5 Robust Regression. More robust to outliers than ordinary regression. Outliers are not removed, but they do not strongly affect the model.

6 Problem formulation. y_i is called the response variable. x_i is called the explanatory variable, with p dimensions. e_i is the error term. Goal: find the estimate of each parameter with minimum error.

7 Problem formulation (contd.). Estimates of the parameters are called regression coefficients. The residual r_i is the difference between the real and the predicted value. Formally, our goal is to find a model that fits the data with the smallest residuals.

8 Ordinary Least Squares (OLS). Most common regression model. Also called least sum of squares or least squares (LS). Goal: find the regression coefficients that minimize the sum of squared residuals.
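
The formulas for the two problem-formulation slides and the OLS slide are not reproduced above. A standard way to write them is sketched here; the symbol \theta for the parameters and n for the number of data points are assumptions, since the slides' own notation is not visible.

```latex
\begin{align}
y_i &= x_{i1}\theta_1 + \dots + x_{ip}\theta_p + e_i, \qquad i = 1, \dots, n \\
r_i(\hat{\theta}) &= y_i - \hat{y}_i
      = y_i - \left(x_{i1}\hat{\theta}_1 + \dots + x_{ip}\hat{\theta}_p\right) \\
\hat{\theta}_{\mathrm{OLS}} &= \arg\min_{\theta} \sum_{i=1}^{n} r_i(\theta)^2
\end{align}
```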

9 Problem with OLS. The regression model is sensitive to outliers.

10 Breakdown point. A measure of the robustness of a regression method. The ratio of the smallest number of outliers that causes the regression model to break down to the total number of data points. E.g. a single outlier can already corrupt the OLS result, so its breakdown point is 1/n, which tends to 0% as n grows. The highest possible breakdown point is 50%.
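
The 1/n breakdown point of OLS can be seen on a toy example. The sketch below assumes NumPy; the data, the helper name `ols_fit`, and the size of the corruption are illustrative choices, not taken from the slides.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients (intercept first) that minimize the sum of squared residuals."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    return coef

# Clean data lying exactly on the line y = 1 + 2x (illustrative values).
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
clean_coef = ols_fit(x, y)

# Corrupt a single response value and refit.
y_bad = y.copy()
y_bad[-1] = 1000.0
corrupted_coef = ols_fit(x, y_bad)

print("clean fit:    ", clean_coef)
print("one outlier:  ", corrupted_coef)
```

Making the corrupted value larger moves the fitted line arbitrarily far, which is what "breaking down" means here.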

11 Leverage points. Outliers can occur in both the x- and y-directions. An outlier in the x-direction is called a leverage point. Normally yields a larger residual than an outlier in the y-direction.

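12 Least Median of Squares (LMedS). Based on an idea of Hampel (1975) and introduced as a regression estimator by Rousseeuw (1984). Replaces the sum in OLS with the median. More robust because the median is insensitive to extreme values.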

13 LMedS (contd.). Can achieve a 50% breakdown point. Computationally expensive for an exact solution: O(n^{p+1} log n) in p dimensions. Needs an approximation algorithm.
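
One common way to approximate LMedS is to fit exact models through random subsets of p points and keep the one with the smallest median squared residual. The sketch below shows that generic random-sampling idea, assuming NumPy; it is not the interval-based randomized algorithm described on the following slides, and the function name `lmeds_random`, its parameters, and the test data are hypothetical.

```python
import numpy as np

def lmeds_random(X, y, n_trials=500, seed=0):
    """Approximate LMedS by exact fits through random p-point subsets."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    n, p = A.shape
    best_coef, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            coef = np.linalg.solve(A[idx], y[idx])   # exact fit through p points
        except np.linalg.LinAlgError:
            continue                                  # degenerate subset, skip it
        med = np.median((y - A @ coef) ** 2)          # LMedS criterion
        if med < best_med:
            best_coef, best_med = coef, med
    return best_coef, best_med

# Illustrative data: 14 clean points on y = 1 + 2x and 6 gross outliers (30%).
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x
y[:6] = 100.0
lmeds_coef, lmeds_med = lmeds_random(x, y)
print(lmeds_coef, lmeds_med)
```

Because more than half of the points are clean, any subset made only of clean points gives a median squared residual of essentially zero, so the outliers do not influence the selected line.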

14 LMedS with randomization. Computes an approximation of LMedS. Runs in O(n log^2 n) time in 2-D with high probability and O(n^{p-1} log n) in p dimensions in the worst case.

15 LMedS with randomization (contd.). Goal: maintain the interval of slopes of the lines that yield the minimum residual. The set of lines is defined by [formula missing]. The interval of slopes (w.r.t. 2 points) is [formula missing].

16 LMedS with randomization (contd.). In each iteration, n cones are sampled at random from all possible (n-1)(n-2)/2 cones. The median of the residuals is tested and the interval is shrunk. Repeat until the residual is small enough, then find the optimal solution from the intersections in the remaining interval.

17 Reweighted Least Squares (RLS). One variant of LMedS. Combines OLS with estimates from LMedS. S is the scale estimate corresponding to LMedS.

18 RLS (contd.). From Robust Regression and Outlier Detection by Rousseeuw and Leroy.
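
The reweighting step can be sketched as follows, assuming NumPy. The scale formula 1.4826 (1 + 5/(n - p)) sqrt(med r_i^2) and the 2.5 cutoff are the values commonly quoted from Rousseeuw and Leroy; treat them as assumptions here, since the slide's formula is not visible. The function name `reweighted_ls` and the test data are hypothetical, and the true coefficients stand in for an LMedS estimate.

```python
import numpy as np

def reweighted_ls(X, y, init_coef, cutoff=2.5):
    """Weighted OLS with 0/1 weights derived from an initial robust fit."""
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    r = y - A @ init_coef                                   # residuals of the initial fit
    # Scale estimate from the median squared residual (assumed form, see lead-in).
    s = 1.4826 * (1.0 + 5.0 / (n - p)) * np.sqrt(np.median(r ** 2))
    w = (np.abs(r) <= cutoff * s).astype(float)             # weight 0 for flagged outliers
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef, w

# Illustrative data: noisy line y = 1 + 2x with 6 shifted responses.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=20)
y[:6] += 50.0
rls_coef, rls_w = reweighted_ls(x, y, np.array([1.0, 2.0]))  # stand-in for an LMedS fit
print(rls_coef, rls_w)
```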

19 M-estimator. The M stands for "maximum likelihood-type". Replaces the squared residual in OLS with a symmetric, positive semi-definite function ρ.

20 M-estimator (contd.). To find the regression coefficients that minimize the objective function, we take the derivative of that function with respect to the coefficients and set it to zero.

21 M-estimator (contd.). The M-estimator reduces to other types of regression for particular choices of ρ. OLS: ρ(r_i) = r_i^2. Least absolute deviations (LAD): ρ(r_i) = |r_i|. LAD yields smaller residuals than OLS, but on high-dimensional data OLS can perform slightly better. But the breakdown point is still 0%! Challenge: the right ρ function must be chosen to get a good result.
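
A common way to solve the M-estimator's zero-derivative condition is iteratively reweighted least squares (IRLS). The sketch below uses the Huber ρ as one possible choice; the slides do not name a specific ρ, so Huber, the tuning constant 1.345, the MAD-based scale, and the function name `huber_irls` are all assumptions made for illustration.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of a linear model via iteratively reweighted least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)       # start from the OLS fit
    for _ in range(n_iter):
        r = y - A @ coef
        # Robust residual scale from the median absolute deviation.
        scale = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, delta/|u| outside.
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef

# Illustrative data: noisy line y = 1 + 2x with one large outlier in y.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=20)
y[-1] += 100.0
A = np.column_stack([np.ones(20), x])
ols_coef, *_ = np.linalg.lstsq(A, y, rcond=None)
huber_coef = huber_irls(x, y)
print("OLS:  ", ols_coef)
print("Huber:", huber_coef)
```

The outlier here is in the y-direction only. A bounded ρ limits the influence of large residuals, but it does not by itself protect against leverage points.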

22 Penalized Least Squares. OLS is equivalent to finding the maximum likelihood estimate (MLE) for the data. MLE considers only the training data, not prior knowledge => overfitting. Solution: use maximum a posteriori (MAP) estimation.

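23 Penalized Least Squares. With a Gaussian (normal) likelihood for the data and a Gaussian prior on the coefficients, calculating the MAP estimate is equivalent to minimizing the sum of squared residuals plus a penalty on the squared l2 norm of the coefficients. Intuitively, it is OLS with a penalty term. This is called ridge regression or l2 regularization.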
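
Ridge regression has a closed-form solution, sketched below assuming NumPy. The name `ridge_fit`, the penalty weight `lam`, and the synthetic data are illustrative; the sketch also assumes centered data, so no intercept is fitted.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||y - X theta||^2 + lam * ||theta||^2 in closed form."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    # Normal equations with the penalty added on the diagonal.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Illustrative data (assumed, not from the slides).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, size=50)

theta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 recovers OLS
theta_ridge = ridge_fit(X, y, lam=10.0)  # a larger lam shrinks the coefficients
print(theta_ols, theta_ridge)
```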

24 Penalized Least Squares (contd.). Different assumptions on the data and the prior give different types of regularization. From Machine Learning: A Probabilistic Perspective by Murphy.

25 Hard Thresholding (TORRENT). TORRENT = Thresholding Operator-based Robust RegrEssioN meThod. Based on l1 regularized regression. Iteratively maintains the active set S_t using a hard thresholding operator. The active set is a set of clean points (not outliers). Keeps updating the weights (regression coefficients) until the residual is less than some pre-specified error tolerance.

26 TORRENT (contd.). From the paper Robust Regression via Hard Thresholding by Bhatia, Jain and Kar.

27 TORRENT (contd.). Offers several variants suited to different situations. TORRENT-FC: fully corrective LS; converges faster but each step is expensive. TORRENT-GD: uses gradient descent; suitable for high-dimensional data. TORRENT-HYB: hybrid of the two variants above.
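
The fully corrective variant can be sketched roughly as below, assuming NumPy. This follows the loop described on the slides (fit on the active set, recompute residuals, keep the points with the smallest absolute residuals); the parameter names `beta` (assumed fraction of corrupted points), `tol`, and `max_iter`, and the synthetic data, are hypothetical and not taken from the paper's code.

```python
import numpy as np

def torrent_fc(X, y, beta, tol=1e-6, max_iter=100):
    """Sketch of TORRENT-FC: alternate least squares and hard thresholding."""
    n, p = X.shape
    keep = n - int(np.ceil(beta * n))       # size of the active set
    w = np.zeros(p)
    r = y.copy()                            # residuals for the initial w = 0
    active = np.arange(n)                   # start with all points active
    for _ in range(max_iter):
        if np.linalg.norm(r[active]) <= tol:
            break                           # residual on the active set is small enough
        # Fully corrective step: exact least squares on the active set.
        w, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        r = y - X @ w
        # Hard thresholding: keep the points with the smallest |residual|.
        active = np.argsort(np.abs(r))[:keep]
    return w, np.sort(active)

# Illustrative data: noiseless linear responses with 10% corrupted entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
corrupted = np.arange(20)
y[corrupted] += 20.0

w_hat, active_set = torrent_fc(X, y, beta=0.15)
print(w_hat)
```

Setting `beta` slightly above the true corruption rate, as in this example, leaves room for the thresholding step to discard every corrupted point.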

28 Self-Scaled Regularized Robust Regression. Also based on l1 regularized regression. Incorporates prior knowledge so that the penalty term scales automatically. Example of a prior: data occurrence.

29 Conclusion. OLS is sensitive to outliers. LMedS has a high breakdown point but is slow. The M-estimator is flexible, but it is hard to find the right function to make it robust. Penalized least squares is also robust but requires prior knowledge about the data. These methods sometimes need strong assumptions, which are not always correct.

30 Remarks. Older papers tend to focus on a high breakdown point, i.e. they try to reach 50%. More recent papers focus on computational speed instead. Effect of high-dimensional data.
