Controlling False Alarms with Support Vector Machines

Controlling False Alarms with Support Vector Machines
Mark Davenport, Clayton Scott, Richard Baraniuk
Rice University, dsp.rice.edu

The Classification Problem. Given some training data... find a classifier that generalizes.

Conventional Classification. Signal/pattern: x ∈ X. Label: y ∈ {−1, +1}. Classifier: f : X → {−1, +1}. Probability of error: P_E = Pr(f(x) ≠ y). Goal: minimize P_E.

A Practical Approach. Support Vector Machines (SVMs) offer a practical, nonparametric method for learning from data. General idea: hyperplane classifiers that maximize the margin, made nonlinear via the kernel trick [Cortes, Vapnik (1995)].

The Kernel Trick. Problem: linear classifiers perform poorly when the data is not linearly separable. Solution: map the data into a feature space where a hyperplane can separate the classes; the kernel evaluates inner products in that space without ever forming the map explicitly (a sketch follows below).
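
A toy illustration of the idea (our own sketch, not code from the talk; the function and variable names are ours): the Gaussian kernel k(x, z) = exp(−γ‖x − z‖²) equals an inner product in an implicit feature space, so a linear classifier in that space is nonlinear in the input space.

```python
import numpy as np

def rbf_kernel_matrix(X, Z, gamma=1.0):
    """Pairwise Gaussian kernel values between rows of X and rows of Z."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
K = rbf_kernel_matrix(X, X, gamma=0.5)  # 20 x 20 Gram matrix
```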

Maximum Margin Principle. Problem: many separating classifiers to choose from. Solution: pick the one that maximizes the margin, i.e., the distance from the decision boundary to the nearest training points.

ν-SVM [Schölkopf et al. (2000)]
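
For reference, the ν-SVM primal as introduced in the cited paper, where ν ∈ (0, 1] upper-bounds the fraction of margin errors and lower-bounds the fraction of support vectors:

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\rho}\quad & \tfrac{1}{2}\|w\|^{2} \;-\; \nu\rho \;+\; \tfrac{1}{n}\sum_{i=1}^{n}\xi_{i} \\
\text{s.t.}\quad & y_{i}\bigl(\langle w, \Phi(x_{i})\rangle + b\bigr) \;\ge\; \rho - \xi_{i},
\qquad \xi_{i} \ge 0, \quad \rho \ge 0,
\end{aligned}
```

where Φ is the (implicit) feature map and the margin ρ is optimized rather than fixed.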

What's Wrong? Sometimes false alarms are more (or less) important than misses. False alarm: object detected, but not present. Miss: object present, but not detected.

What Else? Class frequencies are often not represented in the training data, so minimizing P_E can ignore the smaller class, and prior probabilities are usually unknown. Example: 100 training samples of which 50 have leukemia and 50 do not would suggest that 50% of the population has leukemia.

Neyman-Pearson Classification. Solution: recast the problem. False alarm rate: P_F = Pr(f(x) = +1 | y = −1). Miss rate: P_M = Pr(f(x) = −1 | y = +1). Goal: minimize P_M subject to P_F ≤ α.

Simple Approach. Bias-shifting: train a standard SVM, then shift the hyperplane offset until the desired false alarm rate is reached. Ad hoc, but oft-used (a sketch follows below).
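
A minimal sketch of bias-shifting (our own illustration with hypothetical names, not the talk's code; that is at the URL in the conclusion): train an ordinary SVM, then pick the threshold on its decision function that meets the target false alarm rate on held-out data.

```python
import numpy as np
from sklearn.svm import SVC

def shift_bias(clf, X_val, y_val, alpha=0.1):
    """Return a threshold t so that predicting +1 when
    clf.decision_function(x) > t yields false alarm rate <= alpha
    on the validation set."""
    neg_scores = np.sort(clf.decision_function(X_val)[y_val == -1])[::-1]
    k = int(alpha * len(neg_scores))  # number of allowed false alarms
    return neg_scores[k] if k < len(neg_scores) else -np.inf

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] + 0.5 * rng.standard_normal(200) > 0, 1, -1)

clf = SVC(kernel="rbf", gamma=0.5).fit(X[:100], y[:100])
t = shift_bias(clf, X[100:], y[100:], alpha=0.1)
y_hat = np.where(clf.decision_function(X[100:]) > t, 1, -1)
```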

Cost-Sensitive SVMs. We need to explicitly treat the classes differently during training. The 2ν-SVM [Chew, Bogner (2001)] introduces separate parameters ν₊ and ν₋ for the two classes; it is equivalent to the 2C-SVM [Davenport (2005)]. How do we pick ν₊ and ν₋?
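
Since the 2ν-SVM is equivalent to the 2C-SVM, one quick way to experiment with per-class costs is scikit-learn's class_weight option, which scales the penalty C separately for each class; this is a sketch of the idea, not the talk's implementation.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = np.where(X[:, 0] > 0, 1, -1)

# Penalize slack on class -1 five times more than on class +1, pushing
# the boundary toward fewer false alarms at the cost of more misses.
clf = SVC(kernel="rbf", C=1.0, class_weight={-1: 5.0, 1: 1.0}).fit(X, y)
```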

Controlling False Alarms: 2ν-SVM. Perform a grid search over the parameters (ν₋, ν₊). Estimate the false alarm and miss rates at each grid point. Set the final classifier to the candidate with the smallest estimated miss rate among those whose estimated false alarm rate is at most α (a selection sketch follows below).
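
The selection step can be sketched as follows (our own illustration; pf_grid and pm_grid stand for the estimated error rates at each (ν₋, ν₊) candidate):

```python
import numpy as np

def select_operating_point(pf_grid, pm_grid, alpha=0.1):
    """Return the index of the candidate with smallest estimated miss
    rate among those with estimated false alarm rate <= alpha."""
    pf = np.asarray(pf_grid).ravel()
    pm = np.asarray(pm_grid).ravel()
    feasible = np.flatnonzero(pf <= alpha)
    if feasible.size:
        return int(feasible[np.argmin(pm[feasible])])
    return int(np.argmin(pf))  # nothing feasible: fall back to smallest P_F

pf = np.array([[0.02, 0.08], [0.15, 0.30]])
pm = np.array([[0.40, 0.10], [0.05, 0.01]])
print(select_operating_point(pf, pm, alpha=0.1))  # -> 1 (P_F=.08, P_M=.10)
```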

Performance Evaluation. We need a scalar measure of performance: we want to evaluate our ability to achieve a specific point on the ROC curve.

Performance Evaluation. Theorem [Scott (2005)]: the scalar measure reported as E in the tables below penalizes both the excess false alarm rate beyond α and the miss rate; a standard form is E = (1/α)·max(P_F − α, 0) + P_M.

Experimental Results. Gaussian kernel; performance averaged over 100 permutations of 4 benchmark datasets; α = 0.1. We report mean P_F, mean P_M, and median E. BS: bias-shifting; 2ν-SVM: our approach.

Dataset   Method    P_F    P_M     E
thyroid   BS        .06    .46    .637
          2ν-SVM    .09    .04    .051
heart     BS        .09    .55   1.000
          2ν-SVM    .11    .23    .326
cancer    BS        .00   1.00   1.000
          2ν-SVM    .11    .69    .821
banana    BS        .11    .33    .628
          2ν-SVM    .10    .12    .160

The 2ν-SVM is the clear winner.

Error Estimation. [Figure: the true false alarm rate P_F, its cross-validation estimate, and its filtered cross-validation estimate, each plotted over the (ν₋, ν₊) grid.] Cross-validation has high variance; filtering reduces the variance and yields a better error estimate (a sketch follows below).
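
A minimal sketch of the filtering step, assuming the raw cross-validation estimates are arranged on the parameter grid (the names, grid size, and window size here are ours):

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
# Stand-in for noisy CV estimates of P_F over a 20 x 20 (nu-, nu+) grid.
pf_cv = np.clip(0.1 + 0.05 * rng.standard_normal((20, 20)), 0.0, 1.0)

# Boxcar smoothing across neighboring grid points; scipy.ndimage also
# provides gaussian_filter and median_filter for the other window shapes.
pf_filtered = uniform_filter(pf_cv, size=3, mode="nearest")
```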

Filtering Results. Filtering provides strong performance gains, and the shape of the filter doesn't seem to matter: a Gaussian window, a uniform (boxcar) filter, and a median filter all behave similarly. α = 0.1. GS: grid search; FGS: filtered grid search.

Dataset   Method   P_F    P_M     E
thyroid   GS       .10    .06    .127
          FGS      .09    .04    .051
heart     GS       .12    .22    .375
          FGS      .11    .23    .326
cancer    GS       .16    .67   1.122
          FGS      .11    .69    .821
banana    GS       .11    .12    .255
          FGS      .10    .12    .160

Coordinate Descent. Technique for reducing training time: eliminates the full grid search over (ν₋, ν₊) ∈ [0, 1] × [0, 1] by alternately searching along one parameter while the other is held fixed (a sketch follows below).
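
A sketch of the search (our own illustration): alternately minimize along one parameter axis with the other fixed. In practice each grid entry costs an SVM training run, so the error values would be computed on demand rather than precomputed as here.

```python
import numpy as np

def coordinate_descent(err, max_sweeps=10):
    """err: 2D array of estimated errors over the (nu-, nu+) grid.
    Returns the (i, j) index reached by alternating 1-D searches."""
    i, j = 0, 0
    for _ in range(max_sweeps):
        i_new = int(np.argmin(err[:, j]))   # search nu-, with nu+ fixed
        j_new = int(np.argmin(err[i_new]))  # search nu+, with nu- fixed
        if (i_new, j_new) == (i, j):
            break
        i, j = i_new, j_new
    return i, j

err = np.array([[0.9, 0.8, 0.7],
                [0.6, 0.2, 0.4],
                [0.5, 0.3, 0.9]])
print(coordinate_descent(err))  # -> (1, 1)
```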

Coordinate Descent Results. Training time is almost as fast as bias-shifting, and performance is comparable to the full grid search; many more techniques for fast search are possible. α = 0.1. CD: coordinate descent; FGS: filtered grid search.

Dataset   Method   P_F    P_M     E
thyroid   CD       .08    .04    .066
          FGS      .09    .04    .051
heart     CD       .11    .23    .318
          FGS      .11    .23    .326
cancer    CD       .11    .68    .871
          FGS      .11    .69    .821
banana    CD       .10    .13    .179
          FGS      .10    .12    .160

Conclusion. The 2ν-SVM consistently outperforms bias-shifting at controlling false alarms. Simple techniques improve performance: more accurate error estimation through filtering, and faster training through coordinate descent. Applications: anomaly detection with minimum volume sets; minimax classification. Code available at www.dsp.rice.edu/software

References
C. Scott and R. Nowak, "A Neyman-Pearson approach to statistical learning," IEEE Transactions on Information Theory, 2005.
B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett, "New support vector algorithms," Neural Computation, 2000.
H. G. Chew, R. E. Bogner, and C. C. Lim, "Dual-ν support vector machine with error rate and training size biasing," ICASSP 2001.
C. C. Chang and C. J. Lin, "Training ν-support vector classifiers: Theory and algorithms," Neural Computation, 2001.
M. Davenport, "The 2ν-SVM: A cost-sensitive extension of the ν-SVM," http://www.ece.rice.edu/~md.
C. Scott, "Performance measures for Neyman-Pearson classification," http://www.stat.rice.edu/~cscott.