An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures

An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures José Ramón Pasillas-Díaz, Sylvie Ratté Presenter: Christoforos Leventis 1

Basic concepts: outlier detection; construction of ensembles; two unsupervised approaches based on a weighted combination of outlier detection algorithms. 2

Outlier detection: the discovery of observations that deviate from normal behaviour. The field is rapidly evolving, and new sets of algorithms are being designed to detect these rare but crucial events. 3

Approaches to outlier detection:
Supervised algorithms: use labels for both outliers and inliers. + High accuracy. - Labeled data are harder to obtain, and the model may be trained with mislabeled data.
Semi-supervised algorithms: use labels for inliers only. + Unlabeled data are easier to obtain; avoids the bias introduced by training the model with anomalous data.
Unsupervised algorithms: no use of labels at all. 4

Outlier detection algorithms Local Outlier Factor (LOF) K-means Hierarchical clustering Modified box plot 5

Local Outlier Factor: widely regarded as one of the best-performing outlier detection techniques; it is based on the density of a point's neighborhood. Example with number of nearest neighbours = 2, using the Manhattan distance: for X = (a, b) and Y = (c, d), d(X, Y) = |a - c| + |b - d|. 6

Local Outlier Factor: LOF(o) ≈ 1 means similar density to the neighbors; LOF(o) < 1 means higher density than the neighbors (inlier); LOF(o) > 1 means lower density than the neighbors (outlier). 7
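
A minimal sketch of how such LOF scores can be computed in practice, using scikit-learn's LocalOutlierFactor with the slide's parameters (the library choice is an assumption for illustration; the slides do not show an implementation):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 0.8], [5.0, 5.0]])

# n_neighbors=2 mirrors the slide's example; metric="manhattan" uses |a - c| + |b - d|
lof = LocalOutlierFactor(n_neighbors=2, metric="manhattan")
lof.fit(X)

# scikit-learn exposes the negated LOF, so flip the sign to match the slide's convention
scores = -lof.negative_outlier_factor_
print(scores)  # the isolated point (5.0, 5.0) scores well above 1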

K-means: a distance-based approach. The data are divided into groups depending on the closest centroid, and the outlierness of a point is its distance to the closest centroid. Example with k = 3 centroids: 1. K initial "means" are randomly generated within the data domain. 2. K clusters are created by associating every observation with the nearest mean. 3. The centroid of each of the K clusters becomes the new mean. 4. Steps 2 and 3 are repeated until convergence is reached. 8
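
A short sketch of the k-means outlier score described above, using scikit-learn's KMeans (an illustrative choice, not necessarily the authors' implementation):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1],
    [9.0, 9.0], [9.1, 8.9], [8.8, 9.2],
    [3.0, 8.0],  # a point far from every cluster
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# transform() gives the distance from each point to every centroid;
# the outlierness of a point is its distance to the closest centroid
outlierness = km.transform(X).min(axis=1)
print(outlierness)  # the far-away point (3, 8) receives the largest score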

Hierarchical clustering: a distance-based approach. The data are divided into clusters until they cannot be divided any further; a point is considered an outlier when it shows more resistance to being merged into a cluster. Example: 9
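
One way to turn that "resistance to being merged" into a score is sketched below with SciPy's agglomerative linkage, using the height at which each point is first absorbed into a cluster; this particular scoring rule is an assumption for illustration, not the authors' exact procedure:

import numpy as np
from scipy.cluster.hierarchy import linkage

def hclust_outlier_scores(X):
    # Score each point by the merge height at which its singleton cluster is first
    # absorbed: points that are merged only at large distances resist clustering.
    n = len(X)
    Z = linkage(X, method="single")  # each row: [cluster_i, cluster_j, height, size]
    scores = np.zeros(n)
    for a, b, height, _ in Z:
        for idx in (int(a), int(b)):
            if idx < n:              # an original observation, not a merged cluster
                scores[idx] = height
    return scores

X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [6.0, 6.0]])
print(hclust_outlier_scores(X))      # the isolated point (6, 6) scores highest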

Modified boxplot: a simple statistics-based approach. Example: data = {3, 12, 15, 16, 16, 17, 19, 34}; Min = 3, Max = 34, Q2 (median) = 16, Q1 = 13.5, Q3 = 18. 1. 1.5(IQR) rule: 1.5 * (Q3 - Q1) = 1.5 * (18 - 13.5) = 6.75. 2. Find the lower and upper fences: Lower = Q1 - 6.75 = 13.5 - 6.75 = 6.75; Upper = Q3 + 6.75 = 18 + 6.75 = 24.75. 3. Points outside these fences (here 3 and 34) are flagged as outliers. 10
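
The same computation in a small code sketch; note that NumPy's default quartile interpolation differs slightly from the hand calculation on the slide, but it flags the same outliers here:

import numpy as np

data = np.array([3, 12, 15, 16, 16, 17, 19, 34])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Everything outside the [lower, upper] fences is flagged as an outlier
outliers = data[(data < lower) | (data > upper)]
print(lower, upper, outliers)  # flags 3 and 34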

Categories of learning techniques: single learning; ensemble learning (boosting, stacking, bagging, Feature Bagging Breadth First, Feature Bagging cumulative sum). 11

Single learner: the output Y contains errors; each algorithm has its own bias; some algorithms tend to overfit. 12

Ensemble learning. Factors for ensembling: accuracy (quality of the output) and diversity (distinct, complementary results; a mix of algorithms whose errors are not identical). 13

Why are ensembles important? They can turn a weak learner into a strong learner; achieve lower error than any individual method by itself; increase the detection rate; overfit less than any individual method; and, by combining the scores, reduce the bias. 14

Boosting 15

Stacking Meta model 16

Bagging 17

Feature Bagging, Breadth First: sorts the outlier scores from all iterations of FB, then takes the index of the highest score and inserts it into a vector. Final output: IndFINAL contains the indices of the data records; ASFINAL contains their probabilities of being an outlier. The method is sensitive to the order of the outlier detection algorithms. 18
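
A minimal sketch of the breadth-first combination described above (the matrix layout, variable names and tie handling are assumptions):

import numpy as np

def fb_breadth_first(scores):
    # scores: (T, m) matrix, one row of outlier scores per detector/iteration
    T, m = scores.shape
    order = np.argsort(-scores, axis=1)   # per-detector indices, highest score first
    ind_final, as_final, seen = [], [], set()
    for rank in range(m):                 # breadth first: one rank level at a time...
        for t in range(T):                # ...cycling through the detectors in order
            idx = int(order[t, rank])
            if idx not in seen:           # each data record is inserted only once
                seen.add(idx)
                ind_final.append(idx)
                as_final.append(scores[t, idx])
    return np.array(ind_final), np.array(as_final)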

Feature Bagging, Cumulative Sum: create a vector NC in which each entry is the sum of all the scores that correspond to one observation, sort the score vectors of each algorithm, and finally use the ranking to identify the outliers. For example, sum(NC1) = AS(1,1) + AS(2,4) + ... + AS(t,2), meaning observation 1 is ranked the 1st outlier by algorithm 1, the 4th by algorithm 2, and the 2nd by algorithm t; NC2 is ranked the 4th outlier by algorithm 1, and so on. 19
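
A minimal sketch of the cumulative-sum combination (matrix layout and names are assumptions):

import numpy as np

def fb_cumulative_sum(scores):
    # scores: (T, m) matrix of outlier scores, one row per detector/iteration
    nc = scores.sum(axis=0)        # NC: the summed score of every observation
    ranking = np.argsort(-nc)      # observations ordered from most to least outlying
    return nc, ranking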

Comparing ensemble techniques (data partitioning / goal / fusion of models):
Bagging: random samples are drawn with replacement / minimize variance / (weighted) average.
Boosting: every new subset contains the samples that were misclassified by previous models / increase predictive power / (weighted) majority vote.
Stacking: various / both / a meta-model estimates the weights. 20

Hypothesis & Solution: better performance can be achieved by joining the outputs of different algorithms and setting specific weights, without prior knowledge of the output labels. Two unsupervised ensemble approaches based on a weighted combination of outlier detection algorithms: 1. give weights based on the performance of each algorithm; 2. increase the differentiation between inliers and outliers by building the ensemble from a varied set of algorithms. 21

Approach: Algorithm 22

Standardization of scores: the normalization step of the ensemble. Different outlier detection algorithms produce scores at different scales: LOF tends to produce values close to 1, while hierarchical clustering produces values on a much larger scale. Method: Z = (Xi - mean) / SD, where SD is the standard deviation. Without this step, large-scale scores would keep their large values after joining the ensemble. 23
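
A one-line sketch of that standardization, applied column-wise to a score matrix (the (m, T) layout is an assumption):

import numpy as np

def standardize(scores):
    # scores: (m, T) raw outlier scores, one column per detection algorithm
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)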

Determine votes: 1. take the standardized scores of each algorithm; 2. apply the modified box plot to find the scores that deviate more than the rest; 3. an observation receives a vote if its score lies beyond the 1.5*IQR fence; 4. output: a vote matrix of size m x T, where m is the number of observations and T the number of algorithms. 24
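
A minimal sketch of the voting step, under the assumption that a vote means exceeding the upper 1.5*IQR fence of an algorithm's standardized scores:

import numpy as np

def determine_votes(F):
    # F: (m, T) standardized scores; returns a binary (m, T) vote matrix V
    V = np.zeros_like(F)
    for t in range(F.shape[1]):
        q1, q3 = np.percentile(F[:, t], [25, 75])
        upper = q3 + 1.5 * (q3 - q1)   # upper fence of the modified box plot
        V[F[:, t] > upper, t] = 1      # one vote per flagged observation
    return V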

Determine weights W: each outlier detection algorithm already assigns a score to every observation, but that alone is not enough. The weight vector W increases the weight of outliers while maintaining the weight of inliers. W is calculated with one of two approaches: Ensemble of Detectors with Correlated Votes (EDCV) or Ensemble of Detectors with Variability Votes (EDVV). 25

Approach : EDCV 26

Approach: EDCV. A correlation matrix C of dim(m, n) is obtained by calculating the correlations between the standardized scores F. Each row is summed to give W = {w1, w2, ..., wn}, where w1, w2, ..., wn are the weights corresponding to each algorithm; to each row sum, apply (sum - 1) / (T - 1). 27
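
A minimal sketch of the EDCV weights as described, assuming the correlations are taken between the algorithms' standardized score columns:

import numpy as np

def edcv_weights(F):
    # F: (m, T) standardized scores, one column per algorithm
    T = F.shape[1]
    C = np.corrcoef(F, rowvar=False)   # T x T correlations between the algorithms
    row_sums = C.sum(axis=1)
    return (row_sums - 1) / (T - 1)    # drop the self-correlation, average the rest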

Approach : EDVV 28

Approach: EDVV. A matrix D of dim(m, n) is obtained by calculating the mean absolute deviation (MAD) between the standardized scores F and transforming it into a compatible form by taking the complement 1 - MAD. Each row is summed to give W = {w1, w2, ..., wn}, where w1, w2, ..., wn are the weights corresponding to each algorithm; to each row sum, apply sum / (T - 1). 29
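
A minimal sketch of the EDVV weights, assuming the MAD is the mean absolute difference between each pair of algorithms' standardized score columns:

import numpy as np

def edvv_weights(F):
    # F: (m, T) standardized scores, one column per algorithm
    T = F.shape[1]
    D = np.empty((T, T))
    for i in range(T):
        for j in range(T):
            mad = np.mean(np.abs(F[:, i] - F[:, j]))  # mean absolute deviation between columns
            D[i, j] = 1 - mad                         # complement, so agreement scores high
    return D.sum(axis=1) / (T - 1)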

Votes vs weights. Step 8: determine votes (V); Step 9: determine weights (W). 1. Votes increase the difference between outliers and inliers; 2. votes are produced individually for each observation; 3. weights maintain the actual weight of an inlier and increase the weight of an outlier. 30

Combining scores: 1. calculate the product of each of the standardized scores F and their corresponding votes in matrix V; 2. the resulting values are then updated by applying the weights W obtained with one of the two approaches. 31
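
The slide leaves the exact functional form open; the sketch below takes one literal reading, gating the standardized scores by the votes and rescaling each algorithm's column by its weight before summing. This combination rule is an assumption, not the authors' published formula.

import numpy as np

def combine_scores(F, V, W):
    # F: (m, T) standardized scores, V: (m, T) binary votes, W: (T,) algorithm weights
    return (F * V * W).sum(axis=1)   # one final outlier score per observation

# Putting the pipeline sketches together:
#   F = standardize(raw_scores); V = determine_votes(F)
#   W = edcv_weights(F)          # or edvv_weights(F)
#   final = combine_scores(F, V, W)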

Recap. Algorithms used in the ensemble: LOF, K-means clustering, hierarchical clustering, modified boxplot. T, the number of rounds, was set to 4 (one per algorithm). The weights W obtained from one of the approaches (EDCV, EDVV) are applied to update the values. 32

Experiments: the authors compared the results of their approaches (EDCV, EDVV) with simple averaging, FB Cumulative Sum and FB Breadth First. Both FB algorithms were set to 50 iterations; simple averaging, EDCV and EDVV were set to 4 iterations. Parameters: LOF with number of neighbours = 20; K-means with K = 11; hierarchical clustering and modified boxplot with default settings. 33

Datasets Info 34

Evaluation (ROC) 35

Evaluation (AUC) 36

Evaluation conclusions. AUC: EDCV & EDVV outperformed the rest on almost all datasets; on the Ann_thyroid dataset, FB Breadth First showed a strong dependence on the order of the algorithms' outputs. ROC: EDCV & EDVV show better results on all datasets except Ann_thyroid and Satimages, where only EDCV scores higher than the rest. 37

Evaluation conclusions. EDCV & EDVV do not assume an exceptionally good performance of the individual algorithms; instead, they assign weights to the algorithms based on their performance on each dataset. EDCV & EDVV showed consistent improvement on datasets that were originally designed for binary classification. 38

Conclusion & future work. Conclusion: two novel unsupervised ensemble approaches for combining the output scores of different outlier detection algorithms, EDCV & EDVV; both achieved better performance than similar methods in almost all cases, using only 4 iterations instead of the 50 iterations required by FB. Future work: use a Feature Bagging variation in order to achieve better results. 39

Discussion.
Motivation / Soundness: the authors proposed two novel, completely unsupervised approaches for combining the outputs of different outlier detection algorithms (EDCV & EDVV) that outperformed similar methods with a reasonable number of iterations.
Technical depth: the paper reports experimental results on a varied set of datasets; the experiments cover all of the proposed methods and compare them to similar methods.
Novelty: the approach tries to achieve better accuracy on outliers not by comparing the outputs of the algorithms but by assigning weights based on their performance; non-trivial but easy to follow.
Presentation: structured presentation with almost no figures and no examples. 40