Detection of Outliers
- Junior Webb
Transcription
1 Detection of Outliers TNM033 - Data Mining by Anton Auoja, Albert Backenhof & Mikael Dalkvist
2 Holy Outliers, Batman!! An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs. - Frank E. Grubbs
3
4 Holy Causes, Batman!! Apparatus malfunction. Fraudulent behavior. Human error. Natural deviations. Contamination.
5 Holy Applications, Batman!! Fraud Detection Medicine Public Health Sports statistics Detecting measurement errors
6 Holy WEKA, Batman!! Interquartile Range One Class Classifier DBScan
7 Holy Common Methods, Batman!! Statistical Distance Kernel High Dimensional
8 Holy Statistical Methods, Batman!! An outlier is an object with low probability with respect to the probability distribution model of the data. Model Based. Assume Gaussian distribution. Calculate the mean and standard deviation of the data. The probability of each object under the distribution can then be calculated.
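A minimal sketch of the model-based approach on this slide, assuming univariate data and a single Gaussian. The function name `gaussian_outliers`, the cutoff `z_threshold`, and the sample data are illustrative assumptions and do not come from the slides.

```python
import math
import statistics


def gaussian_outliers(values, z_threshold=2.5):
    """Fit a Gaussian to the data and flag low-probability points.

    Returns a list of (value, z_score, density) for every point whose
    absolute z-score exceeds z_threshold (an assumed cutoff).
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # constant data: no point deviates from the model
    flagged = []
    for x in values:
        z = (x - mu) / sigma
        # Density of x under the fitted normal distribution N(mu, sigma^2).
        density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        if abs(z) > z_threshold:
            flagged.append((x, z, density))
    return flagged


data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 25.0]
for value, z, density in gaussian_outliers(data):
    print(f"{value}: z = {z:.2f}, density = {density:.5f}")
```

Note that the outlier itself inflates the estimated mean and standard deviation. With n points the largest possible |z| under the sample standard deviation is (n - 1) / sqrt(n), about 2.85 for n = 10, which is why the sketch uses a cutoff below 3.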
9 Holy Examples, Batman!! Box Plots Trimmed Means Grubbs Test
10 Holy Box and Whisker Plots, Batman!! Interquartile Range: IQR = Q3 - Q1. Lower Inner Fence: Q1 - 1.5*IQR. Upper Inner Fence: Q3 + 1.5*IQR. Lower Outer Fence: Q1 - 3*IQR. Upper Outer Fence: Q3 + 3*IQR.
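The fences can be computed directly from the quartiles. The sketch below takes the upper inner fence to be Q3 + 1.5*IQR, by symmetry with the lower one. The function names, the "mild"/"extreme" labels, the quartile interpolation method, and the sample data are assumptions made for illustration.

```python
import statistics


def box_plot_fences(values):
    """Return the inner and outer fences derived from Q1, Q3 and the IQR."""
    # method="inclusive" is one of several quartile conventions (an assumption).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


def classify(values):
    """Split points into mild (beyond inner fence) and extreme (beyond outer fence)."""
    f = box_plot_fences(values)
    mild, extreme = [], []
    for x in values:
        if x < f["lower_outer"] or x > f["upper_outer"]:
            extreme.append(x)
        elif x < f["lower_inner"] or x > f["upper_inner"]:
            mild.append(x)
    return mild, extreme


data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 20, 40]
print(box_plot_fences(data))
print(classify(data))
```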
11 Holy Trimmed Means, Batman!! Delete percentage of extreme values. Calculate mean. Use new mean for comparison.
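A short sketch of the trimmed-mean steps listed on this slide, assuming the same fraction is cut from both ends of the sorted data. The name `trimmed_mean`, the parameter `trim_fraction`, and the sample values are hypothetical.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Mean of the data after dropping trim_fraction of points from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # points removed per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 25.0]
plain = sum(data) / len(data)
robust = trimmed_mean(data, trim_fraction=0.1)
print(f"plain mean = {plain:.2f}, 10% trimmed mean = {robust:.2f}")
```

Comparing the two means shows how strongly a single extreme value pulls the untrimmed estimate.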
12 Holy Test, Grubbs!! Calculate the natural logarithm. Sort data. Calculate Z. Compare Z to the critical Z value.
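The steps on this slide can be sketched as follows, assuming Z is the distance of the most extreme value from the mean in units of the sample standard deviation. The logarithm step is exposed as an optional flag because it only makes sense for positive, log-normally distributed data. The critical value is supplied by the caller; the 2.290 used below is the commonly tabulated two-sided value for n = 10 at the 5% level and should be checked against a published table. All names are illustrative.

```python
import math
import statistics


def grubbs_statistic(values, log_transform=False):
    """Return (suspect_value, Z) for the most extreme point in the sample."""
    data = [math.log(x) for x in values] if log_transform else list(values)
    data.sort()
    mu = statistics.mean(data)
    s = statistics.stdev(data)
    # After sorting, only the smallest or largest value can be the most extreme.
    suspect = max((data[0], data[-1]), key=lambda x: abs(x - mu))
    return suspect, abs(suspect - mu) / s


def is_grubbs_outlier(values, critical_z, log_transform=False):
    """Compare Z to a critical value looked up for the sample size and level."""
    _, z = grubbs_statistic(values, log_transform)
    return z > critical_z


measurements = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 25.0]
CRITICAL_Z = 2.290  # assumed table value for n = 10, alpha = 0.05, two-sided

suspect, z = grubbs_statistic(measurements)
print(f"suspect = {suspect}, Z = {z:.3f}, outlier = {z > CRITICAL_Z}")
```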
13
14 Holy Issues, Batman!! Identifying the distribution of the data set. The number of attributes. Mixtures of distributions.
15 Holy Distance Based Methods, Batman!! DB(p,D) k-nearest Neighbor Local Distance Based
16 Holy DB(p,D), Knorr & Ng, Batman!! An object o is an outlier if at least a fraction p of all objects in the database lie at a distance greater than D from o.
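A brute-force sketch of the DB(p,D) definition, assuming Euclidean distance and counting the fraction over all n objects in the database. The function name and the toy points are illustrative assumptions.

```python
import math


def db_outliers(points, p, d):
    """Return the points for which at least a fraction p of all objects
    lie at a distance greater than d (naive O(n^2) scan)."""
    n = len(points)
    outliers = []
    for i, o in enumerate(points):
        far = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(o, q) > d
        )
        if far >= p * n:
            outliers.append(o)
    return outliers


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(db_outliers(points, p=0.8, d=3.0))
```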
17 Holy Distance to k-nearest Neighbors, Batman!! Outlier score. Score each object in [0, ∞) depending on the distance to its k-nearest neighbors. Highly dependent on the choice of k. Can be modified to use the mean of the distances of a point to all its 1NN, 2NN, ..., kNN as an outlier score.
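Both variants of the score described on this slide fit in one small function. This is a sketch assuming Euclidean distance and a brute-force neighbour search; `knn_scores` and `use_mean` are hypothetical names.

```python
import math


def knn_scores(points, k, use_mean=False):
    """Outlier score per point: distance to the k-th nearest neighbour,
    or the mean distance to the 1st..k-th neighbours if use_mean is True."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / k if use_mean else nearest[-1])
    return scores


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = knn_scores(points, k=2)
ranked = sorted(zip(scores, points), reverse=True)
for score, point in ranked:
    print(f"{point}: {score:.3f}")
```

Running it with several values of k is a quick way to see the sensitivity to k that the slide warns about.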
18
19 Holy Local distance-based algorithms, Batman!! Determine the difference of an object from its nearest neighbors. A threshold value is set. All objects whose outlier factors exceed this value are considered to be outliers. Local Outlier Factor (LOF).
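A simplified sketch of the Local Outlier Factor, assuming Euclidean distance and exactly k neighbours per point (ties at the k-th distance are broken by index, which the full definition handles differently). The function name and the toy data are illustrative assumptions.

```python
import math


def local_outlier_factors(points, k):
    """Return one LOF value per point; values well above 1 suggest outliers."""
    n = len(points)
    # neighbours[i] holds (distance, index) for the k nearest other points.
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append(ranked[:k])
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist) for dist, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
lof = local_outlier_factors(points, k=3)
threshold = 2.0  # assumed cutoff; the slides do not give a value
print([p for p, score in zip(points, lof) if score > threshold])
```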
20 Holy Advantages, Batman!! More general and easier to apply than statistical approaches. No probabilistic model needed. Can find local outliers.
21 Unholy Disadvantages, Batman!! Methods are typically O(n^2). Sensitive to choice of parameters. Dependent on pre-defined parameters. Can't handle datasets with regions that have widely differing density.
22 Holy Kernel Based Methods, Batman!!
23
24 Original space Hilbert (Feature) space
25 X H
26
27 Holy Implicitly, Batman!! No additional memory or computation cost.
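One way to read "implicitly": a kernel function returns the inner product in the feature space H without ever building the mapped vectors. The sketch below assumes a degree-2 homogeneous polynomial kernel on 2-D inputs, chosen only because its feature map is small enough to write out; the slides do not say which kernel they use, and all names are illustrative.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def phi(x):
    """Explicit feature map X -> H for the kernel k(x, y) = (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Same inner product as dot(phi(x), phi(y)), computed in the original space."""
    return dot(x, y) ** 2


def feature_distance_sq(x, y, kernel):
    """Squared distance in H using only kernel evaluations."""
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)


x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))
implicit = poly_kernel(x, y)
print(explicit, implicit)
print(feature_distance_sq(x, y, poly_kernel))
```

Because distances in H can be expressed through kernel evaluations alone, distance-based outlier scores can in principle be computed in feature space without storing the mapped points.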
28 Holy High Dimensional, Batman!! Curse of Dimensionality
29 One way is to create subspaces of original space.
30 Another is Angle Based Outlier Degree.
31 Holy References, Batman!! Outlier Detection Techniques. Hans-Peter Kriegel, Peer Kröger and Arthur Zimek. Ludwig-Maximilians-Universität München, Munich, Germany. A Review of Statistical Outlier Methods. Steven Walfish. Pharmaceutical Technology. Outlier Detection Algorithms in Data Mining Systems. M. I. Petrovskiy. Department of Computational Mathematics and Cybernetics, Moscow State University, Vorob'evy gory, Moscow. Detection and Accommodation of Outliers in Normally Distributed Data Sets. Agata Fallon and Christine Spada. Outlier Detection with Kernel Density Functions. L. J. Latecki, A. Lazarevic, D. Pokrajac. Classification by Support Vector Machines. F. Markowetz. Max-Planck-Institute for Molecular Genetics. Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar.