Spatial Outlier Detection
- Rudolph Kennedy
1 Spatial Outlier Detection
Chang-Tien Lu, Department of Computer Science, Northern Virginia Center, Virginia Tech
Joint work with Dechang Chen, Yufeng Kou, Jiang Zhao
2 Spatial Outlier
A spatial data point that is extreme relative to its neighbors
3 Outline
- Single-Attribute Spatial Outlier Detection
  - Z-value approach
  - Iterative Approach & Median
- Multi-Attribute Spatial Outlier Detection
- Region Outlier Detection & Tracking
- Conclusion
4 An Example of Spatial Outlier
Spatial outlier: S; global outliers: G, L
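5 Spatial Outlier Detection: $Z_{S(x)}$ approach
Function: $S(x) = f(x) - \frac{1}{k} \sum_{y \in N(x)} f(y)$
If $Z_{S(x)} = \left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$, declare x as a spatial outlier,
where $\mu_s$ and $\sigma_s$ are the mean and standard deviation of S(x) over all points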
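The Z-value test on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighbor lists are supplied by the caller, the population standard deviation is used for σ_s, and the function name `z_value_outliers` and the default θ = 2.0 are hypothetical.

```python
from statistics import mean, pstdev


def z_value_outliers(values, neighbors, theta=2.0):
    """Return indices x with |(S(x) - mu_s) / sigma_s| > theta.

    values    -- attribute values f(x), one per point
    neighbors -- neighbors[i] lists the indices in N(x_i) (assumed given)
    theta     -- threshold; the default is an illustrative assumption
    """
    # S(x) = f(x) - average of f over the neighbors of x
    s = [values[i] - mean(values[j] for j in neighbors[i])
         for i in range(len(values))]
    mu_s, sigma_s = mean(s), pstdev(s)
    if sigma_s == 0:
        return []  # every point agrees with its neighborhood
    return [i for i, si in enumerate(s) if abs((si - mu_s) / sigma_s) > theta]
```

With ten points on a ring, all equal to 10 except one set to 100, only the modified point exceeds θ = 2; its two neighbors are pulled to roughly half that Z-value, which is the side effect the later slides address.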
6 Evaluation of Statistical Assumption
- The distribution of the traffic station attribute f(x) is normal
- The distribution of $S(x) = f(x) - \frac{1}{k} \sum_{y \in N(x)} f(y)$ is normal too
7 Outline
- Single-Attribute Spatial Outlier Detection
  - Z-value approach
  - Iterative & Median Approach
- Multi-Attribute Spatial Outlier Detection
- Region Outlier Detection & Tracking
- Conclusion
8 Motivation
- Number of neighbors: k = 3
- Expected outliers: S1, S2, S3
- Outliers detected by traditional approaches: S1, E1, E2
- Why the inconsistency? An outlier may have a negative impact on its nearby points
9 Motivation of Proposed Algorithms
Objectives:
- Eliminate the negative impact of a detected spatial outlier on its nearby points (for example, S1)
- Find spatial outliers that are ignored by traditional algorithms (for example, S2)
Solutions:
- Iterative algorithms
  - Each iteration detects only one spatial outlier
  - Before a new iteration, substitute the attribute value of the previously detected spatial outlier with the average attribute value of its neighbors
- Median algorithm
  - Use the median to represent the average attribute value of the neighbors
10 Iterative Z-value Algorithm
In each iteration:
- Compute the standardized difference (Z-value) for every point in the dataset: $z_i = \left| \frac{d_i - \mu}{\sigma} \right|$
- Identify the point with the largest Z-value as a spatial outlier
- Substitute the attribute value of the detected spatial outlier with the average attribute value of its neighbors
11 Iterative Ratio Algorithm
In each iteration:
- Compute the ratio of a point's attribute value to the average attribute value of its neighbors (r-value) for every point
- Identify the point with the largest r-value as an outlier
- Substitute the attribute value of the detected spatial outlier with the average attribute value of its neighbors
12 Iterative Z-value vs. Ratio
- Iterative Z-value: Z(s1) = 1.7, Z(s2) = [value missing]; S2 will be selected first
- Iterative Ratio: Ratio(s1) = 10/1 = 10, Ratio(s2) = 170/2 = 8.5; S1 will be selected first
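The two iterative algorithms differ only in how a point is scored, so they can share one loop. The sketch below is illustrative and makes several assumptions the slides do not state: the loop stops after a caller-supplied number of outliers `m`, σ is the population standard deviation, the ratio score assumes positive attribute values, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def neighbor_avg(values, nbrs):
    """Average attribute value over a list of neighbor indices."""
    return mean(values[j] for j in nbrs)


def z_scores(values, neighbors):
    """|(d_i - mu) / sigma| with d_i = value minus neighbor average."""
    d = [values[i] - neighbor_avg(values, neighbors[i])
         for i in range(len(values))]
    mu, sigma = mean(d), pstdev(d)
    return [abs((di - mu) / sigma) if sigma else 0.0 for di in d]


def ratio_scores(values, neighbors):
    """r-value: value divided by neighbor average (assumes positive values)."""
    return [values[i] / neighbor_avg(values, neighbors[i])
            for i in range(len(values))]


def iterative_outliers(values, neighbors, score_fn, m):
    """Detect m outliers, one per iteration, smoothing each one after detection."""
    values = list(values)  # work on a copy; the caller's data is untouched
    found = []
    for _ in range(m):
        scores = score_fn(values, neighbors)
        candidates = [k for k in range(len(values)) if k not in found]
        i = max(candidates, key=scores.__getitem__)
        found.append(i)
        # Replace the detected outlier with its neighbor average so it no
        # longer distorts the scores of nearby points in later iterations.
        values[i] = neighbor_avg(values, neighbors[i])
    return found
```

On a ring of twelve points valued 10, with adjacent points 3 and 4 set to 100 and 40, a single pass of `z_scores` ranks point 2 (a normal neighbor of point 3) second, whereas `iterative_outliers(..., z_scores, 2)` returns points 3 and 4.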
13 Median Algorithm
- Use the median to represent the average attribute value of the neighbors
- The median is a robust estimator for the center of a data set
- Compute the Z-value for each point: $z_i = \left| \frac{d_i - \mu}{\sigma} \right|$
- Select the points whose Z-value is greater than the threshold as spatial outliers
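A minimal sketch of the median variant follows, assuming d_i is the difference between a point's value and the median of its neighbors' values, and that σ is the population standard deviation. The function name `median_outliers` and the default threshold are hypothetical.

```python
from statistics import mean, median, pstdev


def median_outliers(values, neighbors, theta=2.0):
    """Return indices whose median-based Z-value exceeds theta."""
    # d_i = f(x_i) - median of f over the neighbors of x_i
    d = [values[i] - median(values[j] for j in neighbors[i])
         for i in range(len(values))]
    mu, sigma = mean(d), pstdev(d)
    if sigma == 0:
        return []
    return [i for i, di in enumerate(d) if abs((di - mu) / sigma) > theta]
```

Because a single extreme neighbor rarely moves the median, normal points next to an outlier keep a small d_i, so all outliers can be selected in one pass rather than one per iteration.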
14 Outline
- Single-Attribute Spatial Outlier Detection
- Multi-Attribute Spatial Outlier Detection
- Region Outlier Detection & Tracking
- Conclusions
15 Multivariate Spatial Outlier
- Transportation: abnormal traffic sensor stations (volume, occupancy, speed)
- Astronomy: a star whose constituents differ from those of neighboring stars
- Census: a county whose population by race is dissimilar to that of neighboring counties
- Multivariate spatial outliers are not necessarily univariate spatial outliers
- An unusual combination of normal values may cause a multivariate spatial outlier
16 Problem Formulation: Definitions
- A set of spatial points $X = \{x_1, x_2, \ldots, x_n\}$
- q measurements (attribute values) are made on each spatial object x; y denotes the vector $(y_1, y_2, \ldots, y_q)^T$
- $NN_k(x_i)$ denotes the k nearest spatial neighbors of $x_i$
- Attribute function f: a map from X to $R^q$ (the q-dimensional Euclidean space), $y_i = f(x_i) = (f_1(x_i), f_2(x_i), \ldots, f_q(x_i))^T = (y_{i1}, y_{i2}, \ldots, y_{iq})^T$
- Neighborhood function g: a map from X to $R^q$ such that the jth component of g(x), $g_j(x_i)$, returns a summary statistic of the attribute values $y_j$ of all the spatial points inside $NN_k(x_i)$, for example, the mean
- Comparison function h: for example, $h = f - g$ or $h = f / g$
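The definitions above map directly onto code. The following sketch assumes Euclidean distance on the point coordinates for the k nearest neighbors and the mean as the summary statistic in g; ties in distance are broken by index order. The names `k_nearest` and `comparison_vectors` are hypothetical.

```python
import math


def k_nearest(coords, i, k):
    """Indices of the k points closest to point i (Euclidean distance)."""
    others = (j for j in range(len(coords)) if j != i)
    return sorted(others, key=lambda j: math.dist(coords[i], coords[j]))[:k]


def comparison_vectors(coords, attrs, k, mode="difference"):
    """Compute h(x_i) = f(x_i) - g(x_i), or f(x_i) / g(x_i), for every point.

    coords -- spatial coordinates of each point
    attrs  -- attrs[i] is the q-dimensional attribute vector f(x_i)
    """
    h = []
    for i in range(len(coords)):
        nbrs = k_nearest(coords, i, k)
        q = len(attrs[i])
        # g_j(x_i): mean of attribute j over the k nearest neighbors
        g = [sum(attrs[j][c] for j in nbrs) / k for c in range(q)]
        if mode == "difference":
            h.append([attrs[i][c] - g[c] for c in range(q)])
        else:  # ratio; assumes no neighborhood mean is zero
            h.append([attrs[i][c] / g[c] for c in range(q)])
    return h
```

The resulting h vectors are what a multivariate test would then examine for unusual combinations of values.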
17 Mahalanobis Distance
- A distance measure based on correlations between the variables
- $D_t^2(x) = (X - m_t)^T S_t^{-1} (X - m_t)$
  - $D_t$ is the generalized squared distance of each point from the t group
  - $S_t$ represents the within-group covariance matrix
  - $m_t$ is the vector of the means of the variables of the t group
  - X is the vector containing the values of the variables at location x
- Superior to Euclidean distance because it considers the distribution of the points (correlations)
18 Mahalanobis Distance
- It takes into account not only the average value but also its variance and the covariance of the variables measured
- It accounts for ranges of acceptability (variance) between variables
- It compensates for interactions or dependencies (covariance) between variables
- If the variables are normally distributed, the distances can be converted to probabilities using the $\chi^2$ density function
- The unit of a variable has an influence on the distance, so each variable is standardized to a mean of zero and a variance of one
19 Multivariate Spatial Outlier Detection
- The q-dimensional vector h(x) follows a multivariate normal distribution with mean vector µ and variance-covariance matrix Σ
- The Mahalanobis distance $d^2(x) = (h(x) - \mu)^T \Sigma^{-1} (h(x) - \mu)$ is distributed as $\chi^2_q$, the chi-square distribution with q degrees of freedom
- The probability that h(x) satisfies $(h(x) - \mu)^T \Sigma^{-1} (h(x) - \mu) > \chi^2_q(\alpha)$ is α
- For a threshold θ, if $d^2(x) > \theta$, x is a spatial outlier
- Sample estimates: $\mu_s = \frac{1}{n} \sum_{i=1}^{n} h(x_i)$ and $\Sigma_s = \frac{1}{n-1} \sum_{i=1}^{n} [h(x_i) - \mu_s][h(x_i) - \mu_s]^T$
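A standard-library sketch of the Mahalanobis test is given below. It assumes the comparison vectors h(x_i) have already been computed, uses the sample mean and the (n − 1)-normalized sample covariance, and solves the linear system by Gaussian elimination instead of forming Σ⁻¹ explicitly; a singular covariance matrix raises `ZeroDivisionError`. All function names are hypothetical, and the threshold θ is passed in by the caller.

```python
def mean_vector(rows):
    """Sample mean of the h vectors."""
    n, q = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(q)]


def covariance(rows, mu):
    """Sample variance-covariance matrix with 1/(n-1) normalization."""
    n, q = len(rows), len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
             for b in range(q)] for a in range(q)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(h, mu, cov):
    """d^2 = (h - mu)^T cov^{-1} (h - mu)."""
    diff = [hi - mi for hi, mi in zip(h, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def multivariate_outliers(h_rows, theta):
    """Indices whose squared Mahalanobis distance exceeds theta."""
    mu = mean_vector(h_rows)
    cov = covariance(h_rows, mu)
    return [i for i, h in enumerate(h_rows) if mahalanobis_sq(h, mu, cov) > theta]
```

For the six vectors (2, 2), (−2, −2), (1, 1), (−1, −1), (1, −1), (−1, 1), the last two have the largest d² (2.5 versus 2.0 for (2, 2)) even though each coordinate is small on its own, which illustrates how an unusual combination of normal values can stand out.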
20 Experiment: Census Data Set
21 Experiment Result (Median Algorithm)
22 Experiment Result (Mean Algorithm)
23 Outline
- Single-Attribute Spatial Outlier Detection
- Multi-Attribute Spatial Outlier Detection
- Region Outlier Detection & Tracking
- Conclusions
24 Region Outlier
What is a region outlier?
- A group of adjoining spatial points whose features are inconsistent with those of their surrounding neighbors
Characteristics of meteorological data:
- Spatial region outliers are frequently associated with severe weather phenomena and climate patterns, e.g., hurricanes and tornadoes
- It is preferable to decompose the original observations into different scales and treat them separately
25 Proposed Approach
Three steps:
1) Transform the original data into the wavelet domain
2) Reconstruct from the wavelet domain with particular scales of interest
3) Apply image segmentation to identify region outliers
Then track the movement of the region outlier
26 Wavelet Analysis Method
Characteristics of wavelet analysis:
- Analyzes a signal at different frequencies with different resolutions
- Provides the frequency and location of a variation
- Data at different scales can be studied with different focus
- Effective for filtering a signal or splitting different scales of variation
- Linear time and space complexities
Applications of wavelet analysis:
- Signal processing, image processing, computer vision
- Data mining: clustering, classification, regression, and data visualization
27 Wavelet Analysis Method
Continuous wavelet transform:
$$W(n, s) = \sum_{i=0}^{N-1} x_i \, \psi^{*}\!\left[ \frac{(i - n)\,\delta t}{s} \right]$$
- n: localization of the wavelet transform
- s: scale
- ψ: wavelet function
- $x_i$ (i = 0, ..., N-1): a discrete signal
Inverse wavelet transform:
$$x_n = \frac{\delta j \, \delta t^{1/2}}{C_\delta \, \psi_0(0)} \sum_{j=0}^{J} \frac{\mathrm{Re}\{W(n, s_j)\}}{s_j^{1/2}}$$
- $C_\delta$: a constant for each wavelet function
- J: maximum scale index
- $\psi_0$: normalized wavelet function
28 Mexican Hat Wavelet with Locations and Scales
- The variation exists on all scales
- The power of variation changes at different locations
29 Wavelet Analysis Method
Two base functions for wavelet analysis:
- Mexican hat base: $\psi_0(\eta) = \frac{-1}{\sqrt{\Gamma(2 + \frac{1}{2})}} \frac{d^2}{d\eta^2}\left( e^{-\eta^2/2} \right)$
- Morlet base: $\psi_0(\eta) = \pi^{-1/4} e^{i\omega_0\eta} e^{-\eta^2/2}$
We choose the Mexican hat base:
- It captures both positive and negative variations as separate peaks in wavelet power
- It provides better localization (spatial resolution)
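To make the Mexican hat base concrete, the sketch below evaluates it in closed form, ψ₀(η) = (1 − η²) e^(−η²/2) / √Γ(5/2), and applies the transform W(n, s) = Σᵢ xᵢ ψ₀((i − n) δt / s) by direct summation. It is an illustration under assumptions: no extra scale normalization is applied, the direct sum costs O(N²) per scale rather than the linear time mentioned for wavelet analysis in general, and the function names are hypothetical. Because the Mexican hat is real-valued, the complex conjugate is omitted.

```python
import math


def mexican_hat(eta):
    """Mexican hat wavelet: negated, normalized second derivative of a Gaussian."""
    return (1.0 - eta * eta) * math.exp(-eta * eta / 2.0) / math.sqrt(math.gamma(2.5))


def cwt(x, scales, dt=1.0):
    """Return w[j][n] = W(n, scales[j]) for a 1-D signal x (direct summation)."""
    n_pts = len(x)
    return [[sum(x[i] * mexican_hat((i - n) * dt / s) for i in range(n_pts))
             for n in range(n_pts)]
            for s in scales]


def wavelet_power(w):
    """Wavelet power |W(n, s)|^2 for real-valued coefficients."""
    return [[v * v for v in row] for row in w]
```

For a signal that is zero except for a single spike, the coefficients at each scale peak at the spike location and turn negative a little further away, so the sign of W distinguishes positive from negative variations while the power shows their strength.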
30 Image Segmentation
Image segmentation:
- Partitions an image into connected components
- Points in a specific component have uniform attribute values
Segmentation methods:
- Discontinuity based
  - Segment according to abrupt changes of color intensity
  - Often used for edge linking and curve detection
- Similarity based
  - Segment the image into regions that have similar characteristics within the boundary
  - For example, region growing and split-and-merge
31 Segmentation Algorithm
Find the largest connected component:
- Find a connected component S' from the dataset
- Compare its size with the previously detected component S
- Use S to record the larger one
- Repeat the above steps until all points of the dataset have been processed
Steps to extract S' from data set Σ:
1) Pick a point p0 from Σ whose value is greater than θ and which has not been processed yet.
2) Label p0 as processed, and add p0 and its unprocessed neighbors into a queue.
3) Remove a point p from the queue and check whether its degree of connection C(p, p0) is greater than the variation level λ. If true, the neighbors of p are added into the queue and p is marked as processed.
4) Repeat the marking process until the queue is empty.
32 Segmentation Algorithm
Input:
    Σ: a set of data points
    θ: threshold for the clip level
    λ: variation level
Output:
    S: the largest connected component with value above θ

S = Ø;
while (Σ contains unlabeled points) {
    p0 = pickOneUnlabeledPoint(Σ, θ);
    L(p0) = '*';                         /* label p0 as processed */
    QUEUE = InsertQueue(QUEUE, p0);      /* insert p0 into a queue */
    while (not Empty(QUEUE)) {
        p0 = RemoveQueue(QUEUE);         /* get an element from the head of QUEUE */
        for each p that is adjacent to p0 {
            if (L(p) <> '*' and C(p, p0) ≥ 1-λ) {
                QUEUE = InsertQueue(QUEUE, p);
                L(p) = '0';
            }
        }
    }
    S' = { p : L(p) = '0' };             /* S' is a λ-connected component */
    if (S' has more points than S)
        S = S';                          /* save the largest component to S */
}
return (S);
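A runnable sketch of this region-growing search on a 2-D grid is shown below. Several details are assumptions rather than part of the slides: the degree of connection C is taken to be 1 − |a − b| / max(|a|, |b|) for non-negative values, adjacency is the 4-neighborhood, every enqueued point is marked once so it is never enqueued twice, and each component is collected separately before its size is compared. The names `connection` and `largest_component` are hypothetical.

```python
from collections import deque


def connection(a, b):
    """Hypothetical degree of connection; 1.0 means identical values."""
    scale = max(abs(a), abs(b))
    return 1.0 if scale == 0 else 1.0 - abs(a - b) / scale


def largest_component(grid, theta, lam):
    """Largest lambda-connected component grown from seeds with value > theta."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    best = []
    for r0 in range(rows):
        for c0 in range(cols):
            if (r0, c0) in seen or grid[r0][c0] <= theta:
                continue  # not a valid unprocessed seed
            seen.add((r0, c0))
            queue = deque([(r0, c0)])
            component = []
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if (inside and (nr, nc) not in seen
                            and connection(grid[nr][nc], grid[r][c]) >= 1.0 - lam):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            if len(component) > len(best):
                best = component  # keep the largest component found so far
    return best
```

On a small grid with a 2×2 block of values 8 to 9 and a separate pair of 7s on a background of 1s, θ = 5 and λ = 0.2 return the four cells of the block.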
33 Global Weather Data
- Global data of water vapor
- Multiple-parameter data with a resolution of 1 degree by 1 degree
- Covers the whole earth and is updated 4 times a day
34 Mexican Hat Wavelet with Locations and Scales
- The variation exists on all scales
- The power of variation changes at different locations
- The Mexican hat wavelet has a satisfactory localization resolution
35 Wavelet Transform
- A high value does not necessarily correspond to a high wavelet power
- Wavelet power mainly represents the variation of the signal at a particular scale
36 Perform Wavelet Transform along X Dimension (Latitude)
- Include only particular scales of interest (2 and 3)
- Two spatial outliers:
  - Over South America (centered at 27° S, 55° W): tropical storm
  - Over the Gulf of Mexico (centered at 27° N, 90° W): hurricane
37 The Problem of Transforming along the Y-axis (Longitude)
- Reveals more patterns than the data reconstructed from the wavelet transform along the X-axis (latitude)
- These patterns are caused by the normal variation along the longitude Y and are noise in most cases
38 Experiment: Image Segmentation
- Reconstruction of water vapor at 0 AM on 9/18/2003 with Hurricane Isabel identified
- Reconstruction of water vapor at 6 AM on 9/18/2003 with Hurricane Isabel identified
39 Experiment: Tracking Movement
- 12 consecutive detected Isabel regions in 3 days
- 6-hour interval between two adjacent regions
- Noisy data might exist due to other weather patterns or inappropriate segmentation parameters
- Isabel moves northwestward
Figures: trajectory of the moving region with noisy data; trajectory of the moving region with noisy data removed
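The slides do not say how the trajectory is built or how noisy detections are removed. One simple possibility, offered purely as an assumption, is to represent each detected region by its centroid and to drop a detection when its centroid lies farther than a chosen distance from the last accepted one. The names `centroid` and `track`, the `max_jump` parameter, and the flat (latitude, longitude) distance that ignores longitude wrap-around are all hypothetical; the sketch also trusts the first detection.

```python
import math


def centroid(region):
    """Mean (lat, lon) of the grid points in one detected region."""
    lat = sum(p[0] for p in region) / len(region)
    lon = sum(p[1] for p in region) / len(region)
    return (lat, lon)


def track(regions, max_jump):
    """Centroid trajectory over time, skipping implausibly distant detections."""
    trajectory = []
    for region in regions:
        c = centroid(region)
        if trajectory:
            prev = trajectory[-1]
            if math.hypot(c[0] - prev[0], c[1] - prev[1]) > max_jump:
                continue  # treat as noise: too far from the last accepted position
        trajectory.append(c)
    return trajectory
```

With detections every 6 hours, `max_jump` would be set from the largest displacement a storm can plausibly cover in that interval.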
40 Outline
- Single-Attribute Spatial Outlier Detection
- Multi-Attribute Spatial Outlier Detection
- Region Outlier Detection
- Conclusions
41 Summary
- Single-attribute spatial outliers: Z-value, iterative, and median approaches
- Multi-attribute spatial outliers
  - Two multivariate spatial outlier detection algorithms based on difference or ratio
  - Order the degree of spatial outlier-ness w.r.t. Mahalanobis distance
- Region outliers
  - Detection based on wavelet transform and image segmentation
  - On-line processing approach to tracking the movement of an outlier region in a data stream
42 Future Directions
- Multi-attribute spatial-temporal outliers
  - Region outliers in three-dimensional space with multiple attributes
  - Track multiple moving outlier regions
- Remove the limitation (assumption) of a multivariate normal distribution
  - Widely used informal method: box plot approach
- Investigate the issue of handling large disk-resident data sets
  - Minimize the number of disk page reads or passes
43 Related Publications
- C.T. Lu, D. Chen, Y. Kou, Algorithms for Spatial Outlier Detection, IEEE International Conference on Data Mining, 2003
- C.T. Lu, D. Chen, Y. Kou, Detecting Spatial Outliers with Multiple Attribute, IEEE International Conference on Tools with Artificial Intelligence, 2003
- J. Zhao, C.T. Lu, Y. Kou, Detecting Region Outliers in Meteorological Data, Proceedings of the 11th International Symposium on Advances in Geographic Information Systems, New Orleans, Louisiana, Nov. 7-8
44 Links
Mapview:
Mapcube:
45 Q & A
ctlu@vt.edu
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationImage Segmentation for Image Object Extraction
Image Segmentation for Image Object Extraction Rohit Kamble, Keshav Kaul # Computer Department, Vishwakarma Institute of Information Technology, Pune kamble.rohit@hotmail.com, kaul.keshav@gmail.com ABSTRACT
More informationNote Set 4: Finite Mixture Models and the EM Algorithm
Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationOutline. Advanced Digital Image Processing and Others. Importance of Segmentation (Cont.) Importance of Segmentation
Advanced Digital Image Processing and Others Xiaojun Qi -- REU Site Program in CVIP (7 Summer) Outline Segmentation Strategies and Data Structures Algorithms Overview K-Means Algorithm Hidden Markov Model
More information9.1. K-means Clustering
424 9. MIXTURE MODELS AND EM Section 9.2 Section 9.3 Section 9.4 view of mixture distributions in which the discrete latent variables can be interpreted as defining assignments of data points to specific
More informationData fusion and multi-cue data matching using diffusion maps
Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by
More informationClustering. Supervised vs. Unsupervised Learning
Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now
More informationBackground Subtraction based on Cooccurrence of Image Variations
Background Subtraction based on Cooccurrence of Image Variations Makito Seki Toshikazu Wada Hideto Fujiwara Kazuhiko Sumi Advanced Technology R&D Center Faculty of Systems Engineering Mitsubishi Electric
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More informationOnline Pattern Recognition in Multivariate Data Streams using Unsupervised Learning
Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning
More informationUlrik Söderström 16 Feb Image Processing. Segmentation
Ulrik Söderström ulrik.soderstrom@tfe.umu.se 16 Feb 2011 Image Processing Segmentation What is Image Segmentation? To be able to extract information from an image it is common to subdivide it into background
More informationProbabilistic and Statistical Models for Outlier Detection
Chapter 2 Probabilistic and Statistical Models for Outlier Detection With four parameters, I can fit an elephant, and with five, I can make him wiggle his trunk. John von Neumann 2.1 Introduction The earliest
More informationData Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationComputer Vision Grouping and Segmentation. Grouping and Segmentation
Computer Vision Grouping and Segmentation Professor Hager http://www.cs.jhu.edu/~hager Grouping and Segmentation G&S appear to be one of the early processes in human vision They are a way of *organizing*
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More informationIntroduction to Trajectory Clustering. By YONGLI ZHANG
Introduction to Trajectory Clustering By YONGLI ZHANG Outline 1. Problem Definition 2. Clustering Methods for Trajectory data 3. Model-based Trajectory Clustering 4. Applications 5. Conclusions 1 Problem
More informationProcessing of binary images
Binary Image Processing Tuesday, 14/02/2017 ntonis rgyros e-mail: argyros@csd.uoc.gr 1 Today From gray level to binary images Processing of binary images Mathematical morphology 2 Computer Vision, Spring
More informationA DATA DRIVEN METHOD FOR FLAT ROOF BUILDING RECONSTRUCTION FROM LiDAR POINT CLOUDS
A DATA DRIVEN METHOD FOR FLAT ROOF BUILDING RECONSTRUCTION FROM LiDAR POINT CLOUDS A. Mahphood, H. Arefi *, School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran,
More informationImage Segmentation. Selim Aksoy. Bilkent University
Image Segmentation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Examples of grouping in vision [http://poseidon.csd.auth.gr/lab_research/latest/imgs/s peakdepvidindex_img2.jpg]