What are anomalies and why do we care?
1 Anomaly Detection Based on V. Chandola, A. Banerjee, and V. Kumar, Anomaly detection: A survey, ACM Computing Surveys, 41 (2009), Article 15, 58 pages.
2 Outline
- What are anomalies and why do we care?
- Different aspects of anomaly detection
- Anomaly detection applications
- Classification based anomaly detection methods
- Nearest neighbor based anomaly detection methods
- Clustering based anomaly detection methods
- Statistical anomaly detection methods
- Information theoretic methods
- Spectral anomaly detection methods
- Handling contextual anomalies
3 What are anomalies and why do we care? Anomaly (outlier, discordant observation, exception, aberration, peculiarity, or contaminant) detection refers to finding patterns in data that do not conform to expected behavior. First studied in statistics in an 1887 paper by Edgeworth. Important since anomalies in data translate into significant and actionable information in applications. Useful in cyber-security, fraud or intrusion detection, health care, sensor or equipment failure, and surveillance. Relevance or interestingness of anomalies is a key feature. Related to noise accommodation/removal and novelty detection.
4 Major challenges in anomaly detection:
- Defining a normal region that covers every possible normal behavior pattern is difficult.
- Anomalies due to malicious behavior adapt over time so that they continue to appear normal.
- Normal behavior itself evolves over time.
- Different application domains have different notions of what an anomaly is, making a general theory difficult to establish.
- Labeled data for training and validating models is frequently unavailable.
- Noise resembles true anomalies and leads to false detections.
5 Diversity of techniques:
- Statistics
- Machine learning
- Information theory
- Spectral theory
- Graph theory and topology
These are used in different application areas and may be specific to a single one of them (i.e., not useful for developing an abstract theory).
6 Different aspects of anomaly detection A key aspect of any anomaly detection technique is the nature of the input data. Input is generally a collection of data instances (also called objects, records, points, vectors, patterns, events, cases, samples, observations, or entities). Each data instance consists of one attribute (univariate) or multiple attributes (multivariate). The attributes can be of different types, such as binary, categorical, or continuous, and mixed attribute types are common in the multivariate case.
7 Attributes are key to deciding which technique(s) to use in anomaly detection. Data instances can also be related to one another (e.g., sequence data, spatial data, and graph data). In sequence data, data instances are linearly ordered. In spatial data, data instances are related to their neighboring instances; similarly for spatio-temporal data. In graph data, data instances are vertices related through edges.
8 Anomaly classification:
1. Point anomalies: an individual data instance is anomalous with respect to the rest of the data.
2. Contextual anomalies: a data instance is anomalous only within a specific context, defined by two kinds of attributes:
a. Contextual attributes, which determine the context, e.g., the latitude and longitude of a location.
b. Behavioral attributes, which determine the non-contextual characteristics, e.g., the average rainfall at a given point on the planet.
3. Collective anomalies: a collection of related data instances is anomalous with respect to the entire data set, even if the individual instances are not. An example is an anomalous subsequence in a stream of events like ssh, buffer-overflow, http-web, http-web, ssh, buffer-overflow, ftp, http-web, ftp, ...
9 Data labels are associated with data instances and can indicate, among other things, whether an instance is normal or anomalous. Obtaining accurately labeled data is extremely expensive, since labeling is typically done manually by a human expert. Anomaly detection operates in one of three modes, depending on the labels available:
- Supervised mode: a training set with both normal and anomaly labeled data is available.
- Semi-supervised mode: a training set with only normal labeled data is available.
- Unsupervised mode: no training set is required; it is assumed that normal data is far more common than anomalies.
10 Anomaly detection output takes one of two forms:
1. Scores: each test instance receives a numeric anomaly score. Analysts usually examine only the highest-scored instances to verify anomalies; domain-specific scoring thresholds are common and useful.
2. Labels: each test instance is assigned a label, normal or anomaly.
11 Anomaly detection applications We consider a number of applications:
- Intrusion detection
- Fraud detection
- Medical and health anomaly detection
- Industrial damage detection
- Image processing
- Text processing
- Sensor networks
There are many more application fields that can be considered, each with specialized methodologies.
12 Intrusion detection Intrusion detection refers to detection of malicious activity (break-ins, hacking attempts, penetrations, and other forms of computer abuse) on computer systems. Key challenges:
- Huge volume of data: the anomaly detection techniques must be computationally efficient.
- Data is typically streamed and requires online analysis.
- The false alarm rate can be unacceptably high.
13 Host based intrusion detection systems must handle the sequential nature of the data, so point anomaly detection techniques are not applicable in this domain; the techniques have to either model the sequence data or compute similarity between sequences. Network based intrusion detection systems deal with network data. The intrusions typically occur as point anomalies, though certain techniques model the data in a sequential fashion and detect collective anomalies. The big challenge here is that the anomalies evolve over time as intruders refine their techniques.
14 Methods used in intrusion detection:
- Statistical profiling using histograms
- Parametric or nonparametric statistical modeling
- Bayesian networks
- Mixture of models
- Neural networks
- Support vector machines
- Rule-based systems
- Clustering based
- Nearest neighbor based
- Spectral
- Information theoretic
15 Fraud detection Fraud detection refers to detection of criminal activities occurring in commercial organizations such as banks, credit card companies, insurance agencies, cell phone companies, the stock market, etc., and may involve employees or customers. Immediate detection is wanted. Credit card fraud: the data is multidimensional (user id, amount spent, frequency of use, location, distance from last location, time since last use, history of items purchased in the past, ...).
16 Point anomaly techniques are typically used, profiling either by owner or by operation. Detection is wanted during the very first fraudulent transaction. At the same time, the cardholder should not be irritated with false alarms that freeze a card (this can be really irritating when overseas during a hotel checkout or conference registration). Methods used in credit card fraud detection:
- Neural networks
- Rule based systems
- Clustering based
17 Mobile phone fraud detection The task is to scan a large set of accounts, examine the calling behavior of each, and issue an alarm when an account appears to have been misused. Methods used in mobile phone fraud detection:
- Statistical profiling using histograms
- Parametric statistical modeling
- Neural networks
- Rule based systems
Insurance claim fraud detection is handled similarly.
18 Insider trading detection Insider trading refers to trading on non-public knowledge, such as a pending merger or acquisition, a terrorist attack affecting a particular industry, pending legislation affecting a particular industry, or any information that would affect the stock prices in a particular industry. Insider trading can be detected by identifying anomalous trading activities in the regular and options markets and in tax declarations. Methods used in insider trading detection:
- Statistical profiling using histograms
- Information theoretic
19 Medical and public health anomaly detection This domain works with patient records. The data can have anomalies due to several causes, e.g., an abnormal patient condition, instrumentation errors, or recording errors. Several techniques have also focused on detecting disease outbreaks in a specific area. Methods used in medical and public health anomaly detection:
- Parametric statistical modeling
- Neural networks
- Rule based systems
- Bayesian networks
- Nearest neighbor based
20 Industrial damage detection Industrial units suffer damage due to continuous usage and normal wear and tear. Such damage needs to be detected early to prevent further escalation and losses. The data in this domain is usually referred to as sensor data because it is recorded using different sensors and collected for analysis. There are two categories:
21 1. Fault detection in mechanical units 2. Structural defect detection Methods used in industrial damage detection:
- Statistical profiling using histograms
- Parametric or nonparametric statistical modeling
- Bayesian networks
- Mixture of models
- Neural networks
- Rule-based systems
- Spectral
22 Image processing Look for interesting features in images, e.g., video surveillance, satellite imagery, X-rays, CT scans, ... Methods used in image processing:
- Bayesian networks
- Mixture of models
- Regression
- Neural networks
- Support vector machines
- Clustering based
- Nearest neighbor based
23 Text data anomaly detection Detect novel topics, events, or news stories in a collection of documents or news articles. The anomalies are caused by a new interesting event or an anomalous topic. The data in this domain is typically high dimensional and very sparse. The data also has a temporal aspect, since the documents are collected over time.
24 Methods used in text data anomaly detection:
- Statistical profiling using histograms
- Mixture of models
- Neural networks
- Support vector machines
- Clustering based
Sensor networks Anomalies in data collected from a sensor network can mean either that one or more sensors are faulty or that the sensors are detecting anomalous events (e.g., intrusions).
25 Anomaly detection in sensor networks can therefore capture sensor fault detection, intrusion detection, or both. By definition, the data is collected online and in a distributed manner, so distributed data mining is used. Methods used in sensor networks:
- Parametric statistical modeling
- Bayesian networks
- Nearest neighbor based
- Rule-based systems
- Spectral
26 Classification based anomaly detection methods Classification-based anomaly detection techniques operate in a two-phase fashion: the training phase learns a classifier from the available labeled training data, and the testing phase classifies each test instance as normal or anomalous using that classifier. Assumption: a classifier that can distinguish between normal and anomalous classes can be learned in the given feature space.
27 One-class classification based anomaly detection techniques assume that all training instances have only one class label (normal). Multi-class classification based anomaly detection techniques assume that the training data contains labeled instances belonging to multiple normal classes. A test instance is declared anomalous if none of the classifiers accepts it as normal within its confidence level.
28 Neural networks based A basic multi-class anomaly detection technique using neural networks operates in two steps: 1. A neural network is trained on the normal training data to learn the different normal classes. 2. Each test instance is provided as an input to the neural network; if the network accepts the test input, it is normal, otherwise it is anomalous. Replicator neural networks have been used for one-class anomaly detection.
29 Some neural net classification methods:
- Multi-layered perceptrons
- Neural trees
- Auto-associative networks
- Adaptive resonance theory based
- Radial basis function based
- Hopfield networks
- Oscillatory networks
30 Bayesian networks based This is a basic technique for a univariate categorical data set in which the different attributes are assumed to be independent. Given a test data instance, a naïve Bayesian network estimates the posterior probability of observing each class label from the set of normal class labels plus the anomaly class label; the label with the largest posterior is chosen as the predicted class for the given test instance. The likelihood of observing the test instance given a class, and the prior class probabilities, are estimated from the training data set.
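As a minimal sketch of the naïve Bayes scheme just described (not the survey's exact formulation): the names `train_nb` and `classify`, the tiny categorical data set, and the add-one smoothing choice are all invented for illustration; `n_values[i]` is the number of distinct values attribute i can take.

```python
import numpy as np
from collections import defaultdict

def train_nb(instances, labels):
    """Estimate class priors and per-attribute value counts from
    labeled categorical training data."""
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: [defaultdict(int) for _ in instances[0]] for c in classes}
    for x, y in zip(instances, labels):
        for i, v in enumerate(x):
            counts[y][i][v] += 1
    return classes, priors, counts

def classify(x, classes, priors, counts, n_values):
    """Return the class with the largest posterior, using add-one
    (Laplace) smoothing so unseen attribute values get nonzero mass."""
    best, best_lp = None, -np.inf
    for c in classes:
        n_c = sum(counts[c][0].values())   # training instances in class c
        lp = np.log(priors[c])
        for i, v in enumerate(x):
            lp += np.log((counts[c][i][v] + 1) / (n_c + n_values[i]))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy data: three normal connections and one labeled anomaly.
X_train = [("low", "http"), ("low", "http"), ("low", "ftp"), ("high", "ftp")]
y_train = ["normal", "normal", "normal", "anomaly"]
model = train_nb(X_train, y_train)
pred = classify(("high", "ftp"), *model, n_values=[2, 2])
```

Here the "anomaly" class is learned from labeled examples, which corresponds to the supervised mode described earlier; with only normal labels one would instead threshold the normal-class likelihood.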
31 Support vector machines based This method is applied in a one-class setting and learns a region (boundary) that contains the training data instances. Kernels, such as the radial basis function kernel, can be used to learn complex regions. For each test instance, the basic technique determines whether the instance falls within the learned region: if it does, it is declared normal; otherwise it is an anomaly. Audio anomaly detection is one of the major uses of this method.
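A hedged sketch of the one-class SVM idea, assuming scikit-learn is available (the survey does not prescribe a library); the Gaussian training data and the parameter choices are illustrative only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))   # normal behavior only

# Learn a region around the training data with an RBF kernel; nu bounds
# the fraction of training points allowed to fall outside the boundary.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

X_test = np.array([[0.1, -0.2],    # close to the training mass
                   [6.0, 6.0]])    # far outside the learned region
pred = clf.predict(X_test)         # +1 = normal, -1 = anomaly
```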
32 Rule based This method learns rules that capture the normal behavior of a system; a test instance that is not covered by any such rule is considered an anomaly. Rule based techniques have been applied in both one-class and multi-class settings. A basic multi-class rule-based technique consists of two steps: 1. Learn rules (each with a confidence value) from the training data using a rule learning algorithm.
33 2. For each test instance, find the rule that best captures it. Complexity: depends on which classification algorithm is used; decision trees, for example, are faster than SVMs. +/- of classification based methods: + Classification-based techniques (especially the multi-class techniques) can use powerful algorithms that distinguish between instances belonging to different classes.
34 + The testing phase of classification based methods is fast, since each test instance needs to be compared against only a precomputed model. - Multi-class classification based methods rely on the availability of accurate labels for the various normal classes, which is often impossible. - Classification based methods assign a label to each test instance, which becomes a disadvantage when a meaningful anomaly score is desired for the test instances.
35 Nearest neighbor based anomaly detection methods Assumption. Normal data instances occur in dense neighborhoods, while anomalies occur far from their closest neighbors. Nearest neighbor based anomaly detection methods can be broadly grouped into two categories: 1. Methods that use the distance of a data instance to its kth nearest neighbor as the anomaly score; 2. Methods that compute the relative density of each data instance to compute its anomaly score. 35
36 Using distance to kth nearest neighbor The anomaly score of a data instance is defined as its distance to its kth nearest neighbor in a given data set. Three extensions: 1. Modify the definition used to obtain the anomaly score of a data instance. 2. Use different distance/similarity measures to handle different data types. 3. Improve the efficiency of the basic technique: its complexity is O(N²), where N is the data set size, so use faster methods.
37 For point 3, prune the search space by either ignoring instances that cannot be anomalous or by focusing on the instances most likely to be anomalous. Using relative density Density based anomaly detection techniques estimate the density of the neighborhood of each data instance: low density implies an anomaly; high density implies normal. Density based techniques perform poorly if the data has regions of varying densities; approaches that weigh the density of an instance's neighborhood relative to the densities of its neighbors' neighborhoods have been developed. Complexity: O(N²).
38 +/- of nearest neighbor based methods: + They are unsupervised in nature and make no assumptions about the generative distribution of the data; they are purely data driven. + Semi-supervised techniques perform better than unsupervised techniques in terms of missed anomalies, since the likelihood that an anomaly will form a close neighborhood in the training data set is very low. + Adapting these methods to a different data type is easy: just modify the distance measure.
39 - Missed anomalies for unsupervised methods: if the data has normal instances that do not have enough close neighbors, or anomalies that do have enough close neighbors, the technique fails to label them correctly. - Many false positives for semi-supervised methods: if the normal instances in the test data do not have enough similar normal instances in the training data.
40 The computational complexity of the testing phase is also a significant challenge, since it involves computing the distance of each test instance to the training instances. The performance of a nearest neighbor based technique relies greatly on the choice of distance measure.
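The basic kth-nearest-neighbor distance score can be sketched in a few lines; the name `knn_scores` and the synthetic data are invented for illustration, and the code deliberately uses the O(N²) pairwise-distance matrix the slides warn about.

```python
import numpy as np

def knn_scores(X, k):
    """Anomaly score of each instance = distance to its kth nearest
    neighbor, computed from the full O(N^2) pairwise-distance matrix."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dists = np.sort(dists, axis=1)   # column 0 is the self-distance (0)
    return dists[:, k]               # kth neighbor, skipping self

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),   # dense normal neighborhood
               [[5.0, 5.0]]])                   # one isolated instance
scores = knn_scores(X, k=3)          # the isolated point scores highest
```

Density-based variants replace the raw distance with a comparison of each instance's local density to that of its neighbors, which handles regions of varying density better.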
41 Clustering based anomaly detection methods Clustering is used to group similar data instances into clusters. Clustering is primarily an unsupervised technique though semi-supervised clustering has also been explored lately. Three formulations based on different assumptions: 1. Normal data instances belong to a cluster in the data, while anomalies do not belong to any cluster. 2. Normal data instances lie close to their closest cluster centroid, while anomalies are far away from their closest cluster centroid. 41
42 3. Normal data instances belong to large and dense clusters, while anomalies belong to small or sparse clusters. Assumption 1 remarks: apply a known clustering algorithm to the data set and declare any data instance that does not belong to any cluster anomalous. A disadvantage of such techniques is that they are not optimized to find anomalies, since the main aim of the underlying clustering algorithm is to find clusters. Assumption 2 remarks:
43 Methods consist of two steps: (1) the data is clustered using a clustering algorithm, and (2) for each data instance, its distance to its closest cluster centroid is calculated as its anomaly score. Can operate in semi-supervised mode. If the anomalies in the data form clusters by themselves, these techniques will not be able to detect such anomalies. Assumption 3 remarks: Methods declare instances belonging to clusters whose size or density is below a threshold as anomalous. 43
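The two-step centroid-distance scheme of Assumption 2 above might look like this minimal sketch; it uses a plain hand-rolled Lloyd's-algorithm k-means, and the helper names and synthetic data are invented for illustration.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty.
        centroids = np.array([X[assign == j].mean(axis=0)
                              if np.any(assign == j) else centroids[j]
                              for j in range(k)])
    return centroids

def centroid_scores(X, centroids):
    """Anomaly score of each instance = distance to its closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.min(axis=1)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)),   # dense normal cluster A
               rng.normal(4.0, 0.3, (40, 2)),   # dense normal cluster B
               [[2.0, 10.0]]])                  # far from both centroids
scores = centroid_scores(X, kmeans(X, k=2))
```

Note the failure mode from the slides: if several anomalies formed their own tight cluster here, they would sit close to a centroid and score low.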
44 There are linear time algorithms. There are many similarities between clustering based and nearest neighbor based anomaly detection methods. Complexity: depends on the training and detection algorithms, but there are a few O(N) ones. +/- of clustering based methods: + Unsupervised mode is viable. + Complex data types are handled by using a clustering algorithm that can handle the particular data type.
45 + The testing phase is fast, since the number of clusters against which every test instance needs to be compared is a small constant. - Performance is highly dependent on the effectiveness of the clustering algorithm in capturing the cluster structure of the normal instances. - Many techniques detect anomalies as a byproduct of clustering, and hence are not optimized for anomaly detection. - Missed anomalies: some clustering algorithms force every instance into some cluster, and anomalies that form clusters of their own look like normal clusters, so they are missed. - Sloooow: the clustering step costs at least O(dN) for N instances in d dimensions, and often much more.
46 Statistical anomaly detection methods Underlying principle: An anomaly is an observation that is suspected of being partially or wholly irrelevant because it is not generated by the stochastic model assumed. Assumption. Normal data instances occur in high probability regions of a stochastic model while anomalies occur in low probability regions of the stochastic model. Statistical techniques fit a statistical model (usually for normal behavior) to the given data and then apply a 46
47 statistical inference test to determine whether an unseen instance belongs to this model or not. Parametric methods Parametric methods assume that the normal data is generated by a parametric distribution with parameters θ and probability density function f(x, θ), where x is an observation. The anomaly score of a test instance (or observation) x is the inverse of the probability density function f(x, θ). The parameters θ are estimated from the given data.
48 Gaussian modeling based Such methods assume that the data is generated from a Gaussian distribution. The parameters are estimated using Maximum Likelihood Estimates (MLE). The distance of a data instance to the estimated mean is the anomaly score for that instance. A threshold is applied to the anomaly scores to determine the anomalies. Different techniques in this category calculate the distance to the mean and the threshold in different ways. Statistical rules commonly used: 48
49
- Box plot rule
- Grubbs' test
- Student's t-test
- χ² test
Regression model based Two steps: 1. A regression model is fitted to the data. 2. For each test instance, the residual is used to determine the anomaly score.
50 The residual is the part of the instance left unexplained by the regression model. The magnitude of the residual can be used as the anomaly score for the test instance, though statistical tests have also been proposed to determine anomalies with a certain confidence. Another variant detects anomalies in multivariate time-series data generated by an Autoregressive Moving Average (ARMA) model.
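The Gaussian modeling approach from the earlier slides (MLE fit, distance from the mean as the score, a 3-sigma threshold) can be sketched as follows; the function name, data, and threshold value are illustrative.

```python
import numpy as np

def gaussian_scores(train, test):
    """MLE-fit a univariate Gaussian to the (assumed normal) training
    data, then score each test value by its distance from the mean in
    standard deviations; the classic 3-sigma rule thresholds this at 3."""
    mu, sigma = train.mean(), train.std()
    return np.abs(test - mu) / sigma

train = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0])   # sensor readings
scores = gaussian_scores(train, np.array([10.05, 14.0]))
anomalous = scores > 3.0   # 3-sigma threshold
```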
51 Mixture of parametric distributions based Such methods use a mixture of parametric statistical distributions to model the data. Two subcategories: 1. Those that model the normal instances and anomalies as separate parametric distributions. 2. Those that model only the normal instances as a mixture of parametric distributions. Subcategory remarks: 1. The testing phase involves determining which distribution, normal or anomalous, the test instance belongs to. 51
52 2. Model the normal instances as a mixture of parametric distributions; a test instance that does not belong to any of the learned models is declared an anomaly. Nonparametric methods These methods use nonparametric statistical models, in which the model structure is not defined a priori but is instead determined dynamically from the data. These methods make fewer assumptions about the data (e.g., smoothness of the density) than parametric techniques.
53 Histogram based The simplest nonparametric statistical method is to use histograms to maintain a profile of the normal data. The size of the bin used when building the histogram is key for anomaly detection: Too small: many normal test instances will fall in empty or rare bins, resulting in a high false alarm rate. Too large: many anomalous test instances will fall in frequent bins, resulting in a high false negative rate. 53
54 For univariate data there are two steps: 1. Build a histogram based on the different values taken by that feature in the training data. 2. Check if a test instance falls in any one of the bins of the histogram. If it does, then the test instance is normal. Otherwise it is anomalous. For multivariate data, a basic technique is to construct attributewise histograms. During testing, for each test instance the anomaly score for each attribute value of the test instance is calculated as the height of the bin that contains the attribute value. 54
55 The per attribute anomaly scores are aggregated to obtain an overall anomaly score for the test instance. Complexity: completely dependent on the method(s) used. Good luck. +/- of statistical based methods: + If the distributional assumptions hold true, the statistical conclusions are justified. + Confidence levels provide, well, confidence. + Unsupervised mode works if the distribution estimation step is robust to anomalies in the data.
56 - These methods rely on the assumption that the data is generated from a particular distribution, an assumption that often does not hold true, especially for high dimensional real data sets. What was that famous quote about statistics? - Histograms are simple to implement and easily lie about the results. You get what you pay for.
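The attribute-wise histogram technique described above can be sketched as follows. The scoring convention here (negated sum of relative bin heights, so higher means more anomalous) is one of several reasonable aggregation choices, and the names and data are invented.

```python
import numpy as np

def histogram_model(train, bins=10):
    """One histogram per attribute, built from normal training data."""
    return [np.histogram(train[:, j], bins=bins) for j in range(train.shape[1])]

def histogram_score(model, x):
    """Per-attribute score = relative height of the bin holding the
    attribute value (0 if the value falls outside every bin); the
    negated sum makes higher scores more anomalous."""
    total = 0.0
    for (counts, edges), v in zip(model, x):
        idx = np.searchsorted(edges, v, side="right") - 1
        height = counts[idx] if 0 <= idx < len(counts) else 0
        total += height / counts.sum()
    return -total

rng = np.random.default_rng(4)
model = histogram_model(rng.normal(0.0, 1.0, (500, 2)))
```

The bin-size trade-off from the slides shows up directly in the `bins` parameter: too few bins and anomalies land in frequent bins, too many and normal instances land in rare ones.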
57 Information theoretic methods Analyze the information content of a data set using information theoretic measures such as Kolmogorov complexity, entropy, relative entropy, etc. Assumption: anomalies in data induce irregularities in the information content of the data set. Let C(D) denote the complexity of a given data set D. A basic information theoretic technique can be described as follows: given a data set D, find the minimal subset of instances I such that C(D) − C(D − I) is maximal. All instances in the subset thus obtained are deemed anomalous.
58 This basic technique seeks a Pareto-optimal solution, which has no single optimum, since two objectives (a small subset I and a large drop in complexity) must be optimized at once. Complexity: exponential time. Never, ever use it unless you have no other choice. +/- of information theoretic based methods: + Unsupervised mode works like a charm. + No assumptions are made about the underlying data.
59 - The performance of such techniques is highly dependent on the choice of the information theoretic measure; often such measures detect anomalies only when the number of anomalies is large. - For information theoretic techniques applied to sequences and spatial data sets, the right size of the substructure to analyze is often nontrivial to obtain. - It is difficult to associate an anomaly score with a test instance using these methods.
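Since the exact minimal-subset search above is exponential, sketches usually resort to a greedy approximation. Here Shannon entropy plays the role of C(D), and the function names and toy categorical data are invented for illustration.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def greedy_anomalies(values, k):
    """Greedy approximation of the exponential-time search: repeatedly
    remove the single instance whose removal most reduces entropy."""
    data = list(values)
    removed = []
    for _ in range(k):
        best_i = min(range(len(data)),
                     key=lambda i: entropy(data[:i] + data[i + 1:]))
        removed.append(data.pop(best_i))
    return removed

data = ["a"] * 20 + ["b"] * 20 + ["z"]   # "z" is the rare, anomalous value
```

Note that the analyst must still choose k, which mirrors the slide's point that these methods do not naturally yield a per-instance anomaly score.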
60 Spectral anomaly detection methods Spectral techniques try to find an approximation of the data using a combination of attributes that capture the bulk of the variability in the data. Assumption. Data can be embedded into a lower dimensional subspace in which normal instances and anomalies appear significantly different. Determine such subspaces that the anomalous instances can be easily identified. Such techniques can work in an unsupervised as well as a semi-supervised setting. 60
61 Principal component analysis is the major algorithm used. Complexity: typically linear in the number of instances but quadratic in the dimensionality; the singular value decompositions frequently used are expensive on large data sets. +/- of spectral based methods: + Spectral techniques automatically perform dimensionality reduction and are suitable for handling high dimensional data sets. + They can be used as a preprocessing step, followed by application of any existing anomaly detection technique in the transformed space.
62 + They can be used in an unsupervised setting. - Spectral techniques are useful only if the normal and anomalous instances are separable in the lower dimensional embedding of the data. - They have high computational complexity.
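A minimal PCA-based sketch of the spectral idea: project onto the top principal components and use the reconstruction error as the anomaly score. The function name and synthetic data are illustrative; the normal data lies near a 1-D subspace of the 2-D space, and the anomaly violates that structure.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project onto the top principal components and use the
    reconstruction error as the anomaly score."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components]                  # top principal directions
    recon = (Xc @ V.T) @ V                 # project, then reconstruct
    return np.linalg.norm(Xc - recon, axis=1)

rng = np.random.default_rng(3)
t = rng.normal(0.0, 1.0, 100)
X = np.column_stack([t, 2 * t + rng.normal(0.0, 0.05, 100)])  # near a line
X = np.vstack([X, [[3.0, -6.0]]])   # violates the linear relationship
scores = pca_scores(X, n_components=1)
```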
63 Handling contextual anomalies Defining the contexts:
- Spatial
- Graphs
- Sequential
- Profile
There is very little literature in this area. It is ripe for Ph.D. dissertations.
64 Quick summary A general theory is still an open research problem that will reward numerous students with Ph.D.s in the future. That said, there are many areas that have been developed over a long time. 64
More informationJarek Szlichta
Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns
More informationBig Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1
Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that
More informationDetection and Deletion of Outliers from Large Datasets
Detection and Deletion of Outliers from Large Datasets Nithya.Jayaprakash 1, Ms. Caroline Mary 2 M. tech Student, Dept of Computer Science, Mohandas College of Engineering and Technology, India 1 Assistant
More informationContents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation
Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationOUTLIER MINING IN HIGH DIMENSIONAL DATASETS
OUTLIER MINING IN HIGH DIMENSIONAL DATASETS DATA MINING DISCUSSION GROUP OUTLINE MOTIVATION OUTLIERS IN MULTIVARIATE DATA OUTLIERS IN HIGH DIMENSIONAL DATA Distribution-based Distance-based NN-based Density-based
More informationCPSC 340: Machine Learning and Data Mining. Hierarchical Clustering Fall 2017
CPSC 340: Machine Learning and Data Mining Hierarchical Clustering Fall 2017 Assignment 1 is due Friday. Admin Follow the assignment guidelines naming convention (a1.zip/a1.pdf). Assignment 0 grades posted
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationThis tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.
About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationOverview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010
INFORMATICS SEMINAR SEPT. 27 & OCT. 4, 2010 Introduction to Semi-Supervised Learning Review 2 Overview Citation X. Zhu and A.B. Goldberg, Introduction to Semi- Supervised Learning, Morgan & Claypool Publishers,
More informationCS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,
More informationMachine Learning. Supervised Learning. Manfred Huber
Machine Learning Supervised Learning Manfred Huber 2015 1 Supervised Learning Supervised learning is learning where the training data contains the target output of the learning system. Training data D
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationContents. Foreword to Second Edition. Acknowledgments About the Authors
Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationLab 9. Julia Janicki. Introduction
Lab 9 Julia Janicki Introduction My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15
More informationChapter 9: Outlier Analysis
Chapter 9: Outlier Analysis Jilles Vreeken 8 Dec 2015 IRDM Chapter 9, overview 1. Basics & Motivation 2. Extreme Value Analysis 3. Probabilistic Methods 4. Cluster-based Methods 5. Distance-based Methods
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationMachine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017
Machine Learning in the Wild Dealing with Messy Data Rajmonda S. Caceres SDS 293 Smith College October 30, 2017 Analytical Chain: From Data to Actions Data Collection Data Cleaning/ Preparation Analysis
More informationCAMCOS Report Day. December 9 th, 2015 San Jose State University Project Theme: Classification
CAMCOS Report Day December 9 th, 2015 San Jose State University Project Theme: Classification On Classification: An Empirical Study of Existing Algorithms based on two Kaggle Competitions Team 1 Team 2
More informationCOMP 551 Applied Machine Learning Lecture 13: Unsupervised learning
COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551
More informationDATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane
DATA MINING AND MACHINE LEARNING Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Data preprocessing Feature normalization Missing
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationData Mining Classification - Part 1 -
Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model
More informationCPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2016
CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2016 Admin Assignment 1 solutions will be posted after class. Assignment 2 is out: Due next Friday, but start early! Calculus and linear
More informationCS 521 Data Mining Techniques Instructor: Abdullah Mueen
CS 521 Data Mining Techniques Instructor: Abdullah Mueen LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationData Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\
Data Preprocessing S - MAI AMLT - 2016/2017 (S - MAI) Data Preprocessing AMLT - 2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization
More informationData Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationOn Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions
On Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions CAMCOS Report Day December 9th, 2015 San Jose State University Project Theme: Classification The Kaggle Competition
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationPerformance Evaluation of Various Classification Algorithms
Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Hierarchical Clustering and Outlier Detection Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. Admin Assignment 2 is due
More informationInfluence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report
Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report Abstract The goal of influence maximization has led to research into different
More informationTopics in Machine Learning
Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur
More informationData Preprocessing. Data Mining 1
Data Preprocessing Today s real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size and their likely origin from multiple, heterogenous sources.
More informationComputational Statistics and Mathematics for Cyber Security
and Mathematics for Cyber Security David J. Marchette Sept, 0 Acknowledgment: This work funded in part by the NSWC In-House Laboratory Independent Research (ILIR) program. NSWCDD-PN--00 Topics NSWCDD-PN--00
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques.
. Non-Parameteric Techniques University of Cambridge Engineering Part IIB Paper 4F: Statistical Pattern Processing Handout : Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 23 Introduction
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationMachine Learning : Clustering, Self-Organizing Maps
Machine Learning Clustering, Self-Organizing Maps 12/12/2013 Machine Learning : Clustering, Self-Organizing Maps Clustering The task: partition a set of objects into meaningful subsets (clusters). The
More informationSTATISTICS (STAT) Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).
More information3 Feature Selection & Feature Extraction
3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy
More informationRobot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning
Robot Learning 1 General Pipeline 1. Data acquisition (e.g., from 3D sensors) 2. Feature extraction and representation construction 3. Robot learning: e.g., classification (recognition) or clustering (knowledge
More informationData Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\
Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationA REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING
A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING Abhinav Kathuria Email - abhinav.kathuria90@gmail.com Abstract: Data mining is the process of the extraction of the hidden pattern from the data
More informationInternational Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at
Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,
More informationData Preprocessing. Data Preprocessing
Data Preprocessing Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville ranka@cise.ufl.edu Data Preprocessing What preprocessing step can or should
More informationOne-class Problems and Outlier Detection. 陶卿 中国科学院自动化研究所
One-class Problems and Outlier Detection 陶卿 Qing.tao@mail.ia.ac.cn 中国科学院自动化研究所 Application-driven Various kinds of detection problems: unexpected conditions in engineering; abnormalities in medical data,
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationSupport vector machines
Support vector machines Cavan Reilly October 24, 2018 Table of contents K-nearest neighbor classification Support vector machines K-nearest neighbor classification Suppose we have a collection of measurements
More informationRandom Forest A. Fornaser
Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University
More informationMachine Learning - Clustering. CS102 Fall 2017
Machine Learning - Fall 2017 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Data Mining Looking for
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationMachine Learning Lecture 3
Machine Learning Lecture 3 Probability Density Estimation II 19.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Exam dates We re in the process
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationClassifiers and Detection. D.A. Forsyth
Classifiers and Detection D.A. Forsyth Classifiers Take a measurement x, predict a bit (yes/no; 1/-1; 1/0; etc) Detection with a classifier Search all windows at relevant scales Prepare features Classify
More informationArtificial Neural Networks (Feedforward Nets)
Artificial Neural Networks (Feedforward Nets) y w 03-1 w 13 y 1 w 23 y 2 w 01 w 21 w 22 w 02-1 w 11 w 12-1 x 1 x 2 6.034 - Spring 1 Single Perceptron Unit y w 0 w 1 w n w 2 w 3 x 0 =1 x 1 x 2 x 3... x
More information